Advanced Engineering Statistics - Section 5 - Jay Liu Dept. Chemical Engineering PKNU
2 Least squares regression. What we will cover. Reference: Box, G.E.P. (1966), Use and abuse of regression, Technometrics, 8(4).
3 [FYI] Least squares vs. interpolation. Given data points $(x_1, y_1)$ and $(x_2, y_2)$, there are two choices when we want to know the value of y at $x = (x_1 + x_2)/2$: least squares, or interpolation? Interpolation is recommended when the data are subject to negligible experimental error (or noise), e.g., when using steam tables. Otherwise, least squares is recommended.
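The choice above can be sketched numerically. A small Python/NumPy comparison (the data here are made up, not from the slides): at the midpoint between two data points, interpolation reproduces the neighboring points exactly, while the least-squares line smooths over the noise.

```python
import numpy as np

# Two ways to estimate y midway between two data points: interpolation
# (passes through the points; right when measurement noise is negligible)
# versus a least-squares line (smooths over the noise).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])        # illustrative noisy, roughly linear data

x_mid = (x[1] + x[2]) / 2.0

y_interp = np.interp(x_mid, x, y)         # linear interpolation
a1, a0 = np.polyfit(x, y, 1)              # least-squares straight line
y_lsq = a0 + a1 * x_mid                   # least-squares prediction
```

For noise-free tabulated data (like steam tables) the two estimates coincide; with noise they differ slightly, and the least-squares value averages over all points.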
4 Least squares - usage examples. Quantify the relationship between 2 variables (or 2 sets of variables). Manager: How does yield from the lactic acid batch fermentation relate to the purity of sucrose? Engineer: The yield can be predicted from sucrose purity with an error of plus/minus 8%. Manager: And how about the relationship between yield and glucose purity? Engineer: Over the range of our historical data, there is no discernible relationship.
5 Least squares - usage examples. Two general applications. Predictive modeling: usually when an exact model form is unknown; modeling data trends in order to predict future y values. Simulation: usually when parameters in the model are unknown; getting parameter values in the known model form (e.g., calculating activation energy from reaction data). Terminology: y: response variable, output variable, dependent variable; x: input variable, regressor variable, independent variable.
6 Review: covariance. Consider measurements from a gas cylinder: temperature (K) and pressure (kPa). The ideal gas law applies under moderate conditions: $pV = nRT$. Fixed volume, $V = 0.02\ \mathrm{m^3} = 20$ L. Moles of gas, $n = 14.1$ mol of chlorine gas (1 kg of gas). Gas constant, $R = 8.314$ J/(mol·K). Simplify the ideal gas law to $p = b_1 T$, where $b_1 \equiv nR/V$.
7 Review: covariance (cont.)
8 Review: covariance (cont.) Formal definition: $\mathrm{cov}(x, y) = \mathcal{E}\{(x - \bar{x})(y - \bar{y})\}$, where $\mathcal{E}(z)$ denotes the expected value of z. 1. Calculate the deviation variables $T - \bar{T}$ and $p - \bar{p}$; subtracting off the mean centers each vector at zero. 2. Multiply the centered values: $(T - \bar{T})(p - \bar{p})$. 3. Calculate the expected value (mean) of the products. Covariance has units, here [K·kPa]. c.f. the covariance between temperature and humidity is 202 [K·%]. Covariance of a variable with itself is the variance: $\mathrm{cov}(x, x) = V(x) = \mathcal{E}\{(x - \bar{x})(x - \bar{x})\}$.
9 Review: correlation. Q: Which one (pressure or humidity) has the stronger relationship with temperature? Covariance depends on units: e.g., a different covariance for grams vs. kilograms. Correlation removes the scaling effect: $\mathrm{corr}(x, y) = \dfrac{\mathcal{E}\{(x - \bar{x})(y - \bar{y})\}}{\sigma_x \sigma_y} = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$. Dividing by the units of x and y gives a dimensionless result: $-1 \le \mathrm{corr}(x, y) \le 1$. Gas cylinder example: corr(temperature, pressure) = ?; corr(temperature, humidity) = ?
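The covariance and correlation definitions above can be checked numerically. A minimal NumPy sketch, using made-up temperature/pressure readings in place of the slide's gas-cylinder data:

```python
import numpy as np

# Hypothetical gas-cylinder measurements (illustrative values,
# not the slide's actual data).
T = np.array([273.0, 285.0, 297.0, 309.0, 321.0, 333.0])        # K
p = np.array([1600.0, 1670.0, 1730.0, 1810.0, 1880.0, 1950.0])  # kPa

# Covariance: mean product of the centered (deviation) variables.
# Units are [K*kPa].
cov_Tp = np.mean((T - T.mean()) * (p - p.mean()))

# Correlation: covariance divided by both standard deviations,
# giving a dimensionless number between -1 and 1.
corr_Tp = cov_Tp / (T.std() * p.std())
```

Since p is (nearly) proportional to T here, the correlation comes out very close to 1.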
10 Review: correlation (cont.) Want to find a relationship y = f(x) other than the above?
11 Review: correlation (cont.) Remember!?
12 Least squares? Least squares regression? Regression is the act of choosing the best values for the unknown parameters in a model on the basis of a set of measured data. Linear regression is the special case where the model is linear in the parameters. A straight line has the form $y = a_0 + a_1 x\ (+\ e)$. There are many possible ways to define the best fit, but the most commonly used measure of goodness is the sum of squared residuals. Least sum of squares of errors: least squares, in short. Important: the error is in y, not in x.
13 [FYI] Why minimize the sum of squares? The least squares model: has the lowest possible variance for $a_0$ and $a_1$ when certain assumptions are met (more later); is computationally tractable by hand; makes it easy to prove various mathematical properties; is intuitive: it penalizes deviations quadratically. Other forms: multiple solutions, unstable, high-variance solutions, and mathematical proofs are difficult.
14 Least squares (regression). It is the basis for: DOE (Design of Experiments); latent variable methods. We consider only 2 (sets of) variables, x and y (or x's and y): simple least squares, multiple least squares, generalized least squares.
15 Simple least squares. Wind tunnel example: how can we find the best line that describes the following data? Data from wind tunnel experiments: drag force (F) at various wind velocities.
16 Wind tunnel example (cont.) From the plot, a straight line seems adequate: $y = a_0 + a_1 x$. At a data point $(x_i, y_i)$, the error between the line and the point is $e_i = y_i - \hat{y}_i = y_i - a_0 - a_1 x_i$. Earlier, least squares meant the least sum of squares of errors. For all data points, the sum of squares of errors is $S_r = \sum_i e_i^2 = \sum_i (y_i - a_0 - a_1 x_i)^2$. We need to find the model parameters $a_0$ and $a_1$ that minimize $S_r$: least squares.
17 Wind tunnel example (cont.) How do we find the model parameters? Take a look at $S_r$: it is a parabolic function w.r.t. $a_0$ and $a_1$, and the coefficients of $a_0^2$ and $a_1^2$ are positive, so $S_r$ attains its minimum where $\partial S_r/\partial a_0 = 0$ and $\partial S_r/\partial a_1 = 0$. Rearranging and solving for $a_0$ and $a_1$: $a_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$, $a_0 = \bar{y} - a_1 \bar{x}$.
18 Wind tunnel example (cont.) Calculations.
19 Wind tunnel example (cont.) Calculations. This is called simple least squares.
20 Wind tunnel example (cont.) Results. Is this OK with you?
21 General modeling procedure. 1. Define the modeling objective. 2. Variable selection: identify the response variables (i.e., y variables) and the regressor variables (i.e., x variables) that are to be considered. 3. Design of experiment: design an experiment and use it to generate the data that will be used to fit the model. 4. Define the model: choose an appropriate form for the model. 5. Fit the model: estimate values for the parameters in the model. 6. Does the model fit? If yes, use the model; if no, return to the earlier steps, guided by statistical tools plus prior knowledge.
22 Simple least squares. Summary. Model form: $y = a_0 + a_1 x + e$. The estimates minimize $S_r = \sum_i (y_i - a_0 - a_1 x_i)^2$, i.e., they satisfy $\partial S_r/\partial a_0 = 0$ and $\partial S_r/\partial a_1 = 0$; rearranging and solving gives $a_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$ and $a_0 = \bar{y} - a_1 \bar{x}$.
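The summary formulas can be sketched in code. This hypothetical example uses illustrative wind-tunnel-style data (velocity vs. drag force, not the slide's actual numbers) and cross-checks the closed-form estimates against np.polyfit:

```python
import numpy as np

# Illustrative wind-tunnel-style data: velocity x (m/s), drag force y (N).
x = np.array([10., 20., 30., 40., 50., 60., 70., 80.])
y = np.array([25., 70., 380., 550., 610., 1220., 830., 1450.])

n = len(x)
# Closed-form least-squares estimates for y = a0 + a1*x:
#   a1 = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
#   a0 = mean(y) - a1*mean(x)
a1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
a0 = y.mean() - a1 * x.mean()

# Cross-check against NumPy's degree-1 polynomial fit.
a1_np, a0_np = np.polyfit(x, y, 1)
```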
23 Simple least squares (cont.) Properties: the residuals sum to zero, i.e., $\sum_i (y_i - \hat{y}_i) = 0$.
24 Simple least squares (cont.) Questions: what if the model we want to find is non-linear? Ex. the activation energy in the rate constant $k = k_0 e^{-E/RT}$. Linearize!
25 Linearization. We want to model non-linear relationships between the independent (x) and dependent (y) variables. 1. Make a simple linear model through a suitable transformation: $y = f(x) + e \rightarrow y = a_0 + a_1 x + e$. 2. Use the previous results (simple least squares). Caution: a nonlinear transformation also changes the P.D.F. of the variables (and of the errors). We will discuss this in model assessment.
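As a sketch of the linearization idea, the Arrhenius model from the previous slide can be fitted by regressing ln(k) on 1/T. The parameter values below are illustrative assumptions, and the data are synthetic and noise-free, so the recovered parameters should match the true ones:

```python
import numpy as np

# Linearizing the Arrhenius model k = k0 * exp(-E/(R*T)):
# taking logs gives ln(k) = ln(k0) - (E/R)*(1/T), linear in 1/T.
R = 8.314                                 # J/(mol K)
E_true, k0_true = 50_000.0, 1.0e7         # illustrative "true" parameters

T = np.array([300., 320., 340., 360., 380.])   # K
k = k0_true * np.exp(-E_true / (R * T))        # noise-free synthetic data

# Fit the straight line ln(k) = a0 + a1*(1/T).
a1, a0 = np.polyfit(1.0 / T, np.log(k), 1)

E_est = -a1 * R            # recover the activation energy from the slope
k0_est = np.exp(a0)        # recover the pre-exponential factor from the intercept
```

With noisy data the log transform would also reweight the errors, which is exactly the caution raised above.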
26 Linearization (cont.)
27 Polynomial regression. For the quadratic form $y = a_0 + a_1 x + a_2 x^2 + e$, the sum of squares is $S_r = \sum_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$. Again, $S_r$ has a parabolic shape w.r.t. $a_0$, $a_1$, and $a_2$, with positive coefficients on $a_0^2$, $a_1^2$, and $a_2^2$. Setting the partial derivatives to zero: $\partial S_r/\partial a_0 = -2\sum_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$; $\partial S_r/\partial a_1 = -2\sum_i x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$; $\partial S_r/\partial a_2 = -2\sum_i x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$.
28 Polynomial regression (cont.) Rearranging the previous equations gives $\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$; these can be solved easily (three unknowns and three equations). For general polynomials $y = a_0 + a_1 x + \dots + a_m x^m$, as in the two cases above ($y = a_0 + a_1 x$ and $y = a_0 + a_1 x + a_2 x^2$), $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
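The 3x3 normal equations above can be assembled and solved directly. A small sketch with made-up data, cross-checked against np.polyfit (which solves the same least-squares problem):

```python
import numpy as np

# Quadratic fit y = a0 + a1*x + a2*x^2 by solving the 3x3 normal equations.
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])   # illustrative data

# Left-hand matrix of sums and right-hand vector, exactly as on the slide.
A = np.array([[len(x),       x.sum(),      (x**2).sum()],
              [x.sum(),      (x**2).sum(), (x**3).sum()],
              [(x**2).sum(), (x**3).sum(), (x**4).sum()]])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, b)

# np.polyfit solves the same problem (coefficients in reversed order).
a2_np, a1_np, a0_np = np.polyfit(x, y, 2)
```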
29 Multiple least squares. Consider the case with more than two independent variables $x_1, x_2, \dots, x_m$: a regression plane $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$. For the 2-D case, $y = a_0 + a_1 x_1 + a_2 x_2$. Again, $S_r = \sum_i (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i})^2$ has a parabolic shape w.r.t. $a_0$, $a_1$, $a_2$: $\partial S_r/\partial a_0 = -2\sum_i (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$; $\partial S_r/\partial a_1 = -2\sum_i x_{1,i}(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$; $\partial S_r/\partial a_2 = -2\sum_i x_{2,i}(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$.
30 Multiple least squares (cont.) Rearranging and solving for $a_0$, $a_1$, and $a_2$ gives the parameter estimates. For an m-dimensional plane, $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$. Same as with general polynomials, $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
31 General least squares. The following form includes all cases (simple least squares, polynomial regression, multiple regression): $y = a_0 z_0 + a_1 z_1 + \dots + a_m z_m + e$, where the $z_j$ are basis functions. Ex.: simple and multiple least squares ($z_0 = 1$, $z_j = x_j$); polynomial regression ($z_j = x^j$). Same as before, $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
32 Quantification of errors. $S_t = \sum_i (y_i - \bar{y})^2$: the total sum of squares around the mean for the response variable, y. $S_r = \sum_i e_i^2 = \sum_i (y_i - a_0 z_{0,i} - a_1 z_{1,i} - \dots - a_m z_{m,i})^2$: the sum of squares of the residuals around the regression line.
33 Quantification of errors (cont.) $s_y = \sqrt{\dfrac{S_t}{n-1}} = \sqrt{\dfrac{\sum_i (y_i - \bar{y})^2}{n-1}}$: the standard deviation of y. $s_{y/x} = \sqrt{\dfrac{S_r}{n-(m+1)}}$: the standard error of predicted y ($S_E$). Together they quantify the appropriateness of the regression.
34 Quantification of errors (cont.) Coefficient of determination: $R^2 = \dfrac{S_t - S_r}{S_t}$, the amount of variability in the data explained by the regression model. $R^2 = 1$ when $S_r = 0$: a perfect fit (the regression curve passes through the data points). $R^2 = 0$ when $S_r = S_t$: as bad as doing nothing. It is evident from the figures that a parabola is adequate; the $R^2$ of (b) is higher than that of (a).
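The error measures $S_t$, $S_r$, $s_{y/x}$, and $R^2$ can be computed in a few lines. The data here are illustrative (a noisy, roughly linear set), not from the slides:

```python
import numpy as np

# Error measures for a fitted line: total sum of squares St, residual
# sum of squares Sr, standard error of the estimate s_{y/x}, and R^2.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])   # illustrative data

a1, a0 = np.polyfit(x, y, 1)
y_hat = a0 + a1 * x

St = np.sum((y - y.mean())**2)        # spread of y around its mean
Sr = np.sum((y - y_hat)**2)           # spread of y around the regression line

m = 1                                 # one regressor term
s_yx = np.sqrt(Sr / (len(y) - (m + 1)))   # standard error of the estimate
R2 = (St - Sr) / St                   # fraction of variability explained
```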
35 Quantification of errors (cont.) Warning! $R^2 \approx 1$ does not guarantee that the model is adequate, nor that the model will predict new data well. It is possible to force $R^2$ to be one by adding as many terms as there are observations. $S_r$ can be big when the variance of the random error is large. (The usual assumption on the error is that it is random, i.e., unpredictable.) Practice using Excel: (1) the wind tunnel example with higher-order polynomials; (2) simple regression with increasing random noise.
36 Confidence intervals - coefficients. Coefficients in the regression model have confidence intervals. Why? They are also statistics, like $\bar{x}$ and s. That is, they are numerical quantities calculated from a sample (not the entire population); they are estimated values of the parameters. A confidence interval has the form: statistic $\pm$ (a value that depends on the P.D.F. of the statistic and the confidence level $\alpha$) $\times$ (the standard error of the statistic), e.g., $\bar{x} \pm z_{\alpha/2}\,\sigma_x/\sqrt{n}$ or $\bar{x} \pm t_{n,\alpha/2}\,s_x/\sqrt{n}$. The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.
37 Confidence intervals - coefficients (cont.) Matrix representation of GLS: $y = Za + e$, with $y$ ($n \times 1$), $Z$ ($n \times (m+1)$), $a$ ($(m+1) \times 1$), and $e$ ($n \times 1$); m+1 is the number of coefficients, n the number of data points.
38 Confidence intervals - coefficients (cont.) Example: fitting a quadratic polynomial $y = a_0 + a_1 x + a_2 x^2 + e$ to five data points $(x_i, y_i)$ gives $y = Za + e$: five equations in three unknowns $(a_0, a_1, a_2)$. Can you solve this problem?
39 Confidence intervals - coefficients (cont.) Solutions: with $y = Za + e$, the sum of squares of errors is $S_r = \sum_i e_i^2 = e^T e = (y - Za)^T (y - Za)$. Minimizing gives $Z^T Z a = Z^T y$, called the normal equations ("$Ax = b$"). 1. Use LU decomposition or other methods to solve the linear algebraic equations. 2. Matrix inversion: $a = (Z^T Z)^{-1} Z^T y$, computationally not efficient, but statistically useful.
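The normal equations and the matrix solution can be sketched as follows; np.linalg.lstsq solves the same minimization by a more stable QR/SVD route, so the two answers should agree. The five data points are made up:

```python
import numpy as np

# General least squares in matrix form: y = Z a + e.  The normal equations
# Z'Z a = Z'y give a = (Z'Z)^{-1} Z'y.
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.2, 2.9, 7.1, 12.8, 21.1])        # illustrative five points

# Columns of Z are the basis functions z0 = 1, z1 = x, z2 = x^2.
Z = np.column_stack([np.ones_like(x), x, x**2])

a_normal = np.linalg.solve(Z.T @ Z, Z.T @ y)     # normal equations
a_lstsq, *_ = np.linalg.lstsq(Z, y, rcond=None)  # stable least-squares solver
```

Forming $(Z^T Z)^{-1}$ explicitly is exactly the "statistically useful" route: its diagonal feeds the coefficient confidence intervals on the next slide.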
40 Confidence intervals - coefficients (cont.) Matrix inversion approach: $a = (Z^T Z)^{-1} Z^T y$. Denote $Z_{ii}^{-1}$ as the i-th diagonal element of $(Z^T Z)^{-1}$. The confidence interval of the estimated coefficients is $a_i \pm t_{n-(m+1),\alpha/2}\; s_{y/x} \sqrt{Z_{ii}^{-1}}$, where $t_{n-(m+1),\alpha/2}$ is the Student t statistic and $s_{y/x} = \sqrt{S_r/(n-(m+1))}$ is the standard error of the estimate.
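A sketch of the confidence-interval formula above, using SciPy for the Student t critical value; the data are illustrative:

```python
import numpy as np
from scipy import stats

# 95% confidence intervals for regression coefficients:
#   a_i +/- t_{n-(m+1), alpha/2} * s_{y/x} * sqrt([(Z'Z)^{-1}]_{ii})
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.1, 6.8, 8.2])   # illustrative data

Z = np.column_stack([np.ones_like(x), x])     # model y = a0 + a1*x
n, p = Z.shape                                 # p = m + 1 parameters

a = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ a
s_yx = np.sqrt(resid @ resid / (n - p))        # standard error of the estimate

ZtZ_inv = np.linalg.inv(Z.T @ Z)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p)   # two-sided, alpha = 0.05
half_width = t_crit * s_yx * np.sqrt(np.diag(ZtZ_inv))

ci = np.column_stack([a - half_width, a + half_width])  # row i: CI for a_i
```

For this nearly linear data the slope interval excludes zero, i.e., the slope is statistically significant.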
41 Confidence intervals - coefficients (cont.) For a linear model: C.I. for $a_1$ (slope): $a_1 \pm t_{n-(m+1),\alpha/2}\, \dfrac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}}$. C.I. for $a_0$ (intercept): $a_0 \pm t_{n-(m+1),\alpha/2}\, s_{y/x} \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$. What if a confidence interval contains zero?
42 Confidence intervals - prediction. C.I. for predicted y, $\hat{y}_0$: $\hat{y}_0 \pm t_{n-(m+1),\alpha/2}\, s_{y/x} \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$.
43 Model assessment. When we do not know the model form, we have to assess the model after fitting it and before using it. To assess the model and make inferences about the parameters and predictions, we have to employ statistics and make some assumptions about the nature of the disturbance. Tools for model assessment: $s_{y/x}$, $R^2$ (quantitative; do not rely on these alone); residual plots (qualitative); normal probability chart (qualitative or quantitative); test for lack of fit (quantitative; used when the dataset includes replicates, based on analysis of variance (ANOVA)).
44 Model assessment - assumptions. What are the most desirable errors in regression? Assumptions on the error: it is unpredictable; it is additive: $y = a_0 + a_1 x + e$, i.e., $y - (a_0 + a_1 x) = e$; the variance of the error is constant and is not related to the values of the response or of the regressor variables; there is no error associated with the values of the regressor variables; the error is a random variable with Gaussian distribution $N(0, \sigma^2)$ ($\sigma^2$ usually unknown).
45 Model assessment - residual plots. Recall the assumptions on the error: it is not related to the values of the response or regressor variables. These assumptions will not be valid if the model is wrong, and the following residual plots will reveal this: residuals vs. regressor variables; residuals vs. fitted values ($\hat{y}_i$); residuals vs. lurking variables (e.g., time or order). These plots will show patterns when a model is inadequate.
46 Model assessment - residual plots (cont.) Examples: residuals plotted against x.
47 Model assessment - residual plots (cont.) Examples of residual plots.
48 Model assessment - normal probability plot. Recall the assumption that the error is a random variable with Gaussian distribution $N(0, \sigma^2)$ ($\sigma^2$ usually unknown). Then the errors will fall onto a straight line (y = x) in a normal probability plot (especially useful when the number of data points is large). Alternatively, a normality test can be used.
49 Model assessment - ANOVA (test for lack of fit). The variance breakdown: $S_t = \sum_i (y_i - \bar{y})^2$, $S_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$, $S_r = \sum_i (y_i - \hat{y}_i)^2$, and $S_t = S_{reg} + S_r$. Really? Prove it to yourself.
50 Model assessment - ANOVA (test for lack of fit). The variance breakdown $S_t = S_{reg} + S_r$: the ratio $S_{reg}/S_r$ follows an F distribution when corrected with the degrees of freedom. If the regression is not meaningful, the ratio is small and $S_t \approx S_r$.
51 Model assessment - ANOVA (test for lack of fit). [Figure: the fitted line $y = a_0 + a_1 x$ and the breakdown $S_t = S_{reg} + S_r$.]
52 Model assessment - ANOVA (test for lack of fit). ANOVA table:
Source of var.  | Sum of squares | Degrees of freedom | Mean square             | F0
Regression      | $S_{reg}$      | p                  | $MS_{reg} = S_{reg}/p$  | $MS_{reg}/MS_E$
Residual error  | $S_r$          | n - p              | $MS_E = S_r/(n-p)$      |
Total           | $S_t$          | n - 1              |                         |
Compare $F_0$ to the critical value $F_{p,\,n-p;\,\alpha}$. What we are doing is a test of hypothesis: $H_0$: $\beta_1 = \dots = \beta_p = 0$; $H_1$: at least one parameter is not equal to zero.
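The ANOVA table above can be reproduced numerically. A sketch with made-up, nearly linear data; the F test should strongly reject H0 here:

```python
import numpy as np
from scipy import stats

# ANOVA for regression significance: F0 = MS_reg / MS_E is compared with
# the critical value F_{p, n-p-1; alpha} (p slope terms beyond the intercept).
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1])

a1, a0 = np.polyfit(x, y, 1)
y_hat = a0 + a1 * x
n, p = len(y), 1                        # one regression (slope) term

St = np.sum((y - y.mean())**2)          # total
S_reg = np.sum((y_hat - y.mean())**2)   # explained by the regression
Sr = np.sum((y - y_hat)**2)             # residual; St = S_reg + Sr

MS_reg = S_reg / p
MS_E = Sr / (n - p - 1)
F0 = MS_reg / MS_E
p_value = 1.0 - stats.f.cdf(F0, p, n - p - 1)
```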
53 [FYI] Meaning of a p-value in a hypothesis test. A measure of how much evidence we have against the null hypothesis. The null hypothesis ($H_0$) represents the hypothesis of no change or no effect. Much research involves making a hypothesis and then collecting data to test it; researchers measure the consistency of the data with the null hypothesis. A small p-value is evidence against the null hypothesis, while a large p-value means little or no evidence against it. Traditionally, researchers will reject a null hypothesis if the p-value is less than 0.05 ($\alpha$ = 0.05). The p-value can be read as the probability of being wrong when you reject the null hypothesis.
54 Integer variables in the model. Integer variables 0 and 1 can represent qualitative variables. Example: raw material from Spain, India, or Vietnam: $y = a_0 + a_1 x_1 + \dots + a_k x_k + r_1 d_1 + r_2 d_2 + r_3 d_3$, with $d_1 = 1, d_2 = 0, d_3 = 0$ for Spain; $d_1 = 0, d_2 = 1, d_3 = 0$ for India; $d_1 = 0, d_2 = 0, d_3 = 1$ for Vietnam. Often called indicator variables for this reason.
55 Integer variables in the model. Example: we want to predict yield when two different impellers are used; yield = f(temperature, impeller type). Either build two different models (one for axial, one for radial), or build one model using an indicator variable: $y = a_0 + a_1 T + r d_i$, with $d_i = 0$ for axial and $d_i = 1$ for radial.
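The single-model-with-indicator approach can be sketched as follows. All numbers are made up; the fitted coefficient r estimates the yield offset of the radial impeller at the same temperature (about +5 in this synthetic data):

```python
import numpy as np

# Indicator (dummy) variable for impeller type: d = 0 for axial, 1 for
# radial, fitting the single model  yield = a0 + a1*T + r*d.
T = np.array([300., 310., 320., 330., 300., 310., 320., 330.])
d = np.array([0.,   0.,   0.,   0.,   1.,   1.,   1.,   1.])
yield_ = np.array([70.2, 72.1, 74.0, 75.9, 75.1, 77.0, 79.2, 80.8])

Z = np.column_stack([np.ones_like(T), T, d])
a0, a1, r = np.linalg.lstsq(Z, yield_, rcond=None)[0]
```

This shares one temperature slope between the two impeller types; fitting two separate models would instead allow the slopes to differ.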
56 Leverage effect. Unusual observations influence the model parameters and our interpretation. Outliers have an over-proportional effect on the resulting regression curves. To avoid the leverage effect: remove outliers before regression (but do not delete them without investigation); use a different $S_r$ (no longer least squares).
57 Causal relation and correlation. Causal relation: a cause-and-effect relation; has physical/chemical/engineering meaning; x and y are not interchangeable; a direction exists. Correlation: a (linear) relationship between two variables; no physical/chemical/engineering meaning is implied (e.g., the average height of men in their 20s vs. year); x and y are interchangeable.
58 Advanced topics. Testing of least-squares models.
59 Advanced topics - testing of least-squares models (cont.)
60 Advanced topics - correlated x's. The MLR solution: $y = Za + e$, $S_r = e^T e = (y - Za)^T (y - Za)$, $\partial S_r/\partial a_i = 0 \Rightarrow Z^T Z a = Z^T y$, so $a = (Z^T Z)^{-1} Z^T y$. When two or more x's are correlated, $Z^T Z$ becomes nearly singular, i.e., ill-conditioned.
61 Advanced topics - correlated x's. Compare $Z^T Z$ and $(Z^T Z)^{-1}$ for high vs. no correlation between $x_1$ and $x_2$. What if very small (measurement) noises are added to the x's? What will happen to your MLR model?
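The effect of correlated x's on $Z^T Z$ can be demonstrated with condition numbers. A sketch on synthetic data, where x2 is either a near-copy of x1 or an unrelated variable:

```python
import numpy as np

# When x1 and x2 are highly correlated, Z'Z is nearly singular and its
# condition number explodes, so the least-squares solution becomes very
# sensitive to tiny measurement noise.
rng = np.random.default_rng(0)

x1 = np.linspace(0, 1, 50)
x2_corr = x1 + 1e-6 * rng.standard_normal(50)    # almost identical to x1
x2_indep = rng.standard_normal(50)               # unrelated to x1

Z_corr = np.column_stack([np.ones_like(x1), x1, x2_corr])
Z_indep = np.column_stack([np.ones_like(x1), x1, x2_indep])

cond_corr = np.linalg.cond(Z_corr.T @ Z_corr)
cond_indep = np.linalg.cond(Z_indep.T @ Z_indep)
```

A huge condition number means small perturbations in the data produce large swings in the estimated coefficients, which is exactly the instability discussed on the next slide.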
62 Advanced topics - correlated x's. If there is high correlation among the x's: unstable solutions for a; predictions are uncertain as well. Geometrically speaking: high correlation between $x_1$ and $x_2$ [figure].
63 Advanced topics - correlated x's. Remedies? Use selected x variables: stepwise regression. Use ridge regression. Use multivariate methods (not covered in this lecture).
64 Advanced topics. What we want to know: How do we select the form of the model? Which variables should be included? Should we include transformations of the regressor variables? What we want: to build the best regression model; to include as many regressor variables as necessary to adequately describe the behaviour of y, while at the same time keeping the model as simple as possible. Stepwise regression: start off by choosing an equation having the single best x variable, then attempt to build up with subsequent additions of x's, one at a time, as long as these additions are worthwhile.
65 Advanced topics - stepwise regression. Procedure: 1. Add an x variable to the model (the variable that is most highly correlated with y). 2. Check whether or not this has significantly improved the model. One way is to see whether or not the confidence interval for the parameter includes zero (of course, you can use a hypothesis test). If the new term or terms are not significant, remove them from the model. 3. Find one of the remaining x variables that is highly correlated with the residuals and repeat the procedure.
66 Advanced topics - Ridge regression (Hoerl, 1962; Hoerl & Kennard, 1970a,b). A modified regression method specifically for ill-conditioned datasets that allows all variables to be kept in the model. This is possible by adding additional information to the problem to remove the ill-conditioning. The objective function to minimize: $(y - Xb)^T (y - Xb) + k\, b^T b$. Least-squares estimates have no bias but large variance, while ridge-regression estimates have small bias and small variance.
67 Advanced topics - ridge regression. Procedure: 1. Mean-center and scale all x's to unit variance: $f_i = \dfrac{x_i - \bar{x}_i}{s_{x_i}}$. 2. Rewrite the model as $y - \bar{y} = a_1 s_{x_1} \dfrac{x_1 - \bar{x}_1}{s_{x_1}} + \dots + a_p s_{x_p} \dfrac{x_p - \bar{x}_p}{s_{x_p}} = b_1 f_1 + \dots + b_p f_p$, i.e., $Y = Fb + \varepsilon$.
68 Advanced topics - ridge regression. 3. The objective function for the optimization is to minimize $J = (Y - Fb)^T (Y - Fb) + k\, b^T b$; therefore $b^* = (F^T F + kI)^{-1} F^T Y$. 4. Solve the optimization problem in step 3 for several values of k between 0 and 1 and choose the value of k at which the estimates of b seem to stabilize. Otherwise, choose k by validation on new data.
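Steps 1-3 can be sketched as follows on synthetic, nearly collinear data. Since k > 0 shrinks the coefficient vector, the ridge estimate should always have a smaller norm than the ordinary least-squares estimate (k = 0):

```python
import numpy as np

# Ridge regression on centered, unit-variance regressors:
#   b* = (F'F + k I)^{-1} F' y
# Larger k shrinks b toward zero, trading a little bias for a large
# reduction in variance when F'F is ill-conditioned.
rng = np.random.default_rng(1)
n = 40
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)          # nearly collinear with x1
y = 2.0 * x1 + 2.0 * x2 + 0.1 * rng.standard_normal(n)

# Step 1: mean-center and scale the x's to unit variance; center y too.
F = np.column_stack([(x1 - x1.mean()) / x1.std(),
                     (x2 - x2.mean()) / x2.std()])
yc = y - y.mean()

def ridge(F, y, k):
    p = F.shape[1]
    return np.linalg.solve(F.T @ F + k * np.eye(p), F.T @ y)

b_ols = ridge(F, yc, 0.0)       # k = 0 recovers ordinary least squares
b_ridge = ridge(F, yc, 1.0)     # k = 1 gives a shrunken, stabler estimate
```

Step 4 would repeat the last line over a grid of k values and pick the one where the coefficients stabilize, or validate on fresh data.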
69 Advanced topics - Non-linear regression. General form of a non-linear regression model: $y = f(x, a)$. In a linear model, $f(x, a) = x^T a$. In a non-linear model, f() could have any form, e.g., $y = b_1 (1 - e^{-b_2 x})$. Remember that a nonlinear transformation also changes the P.D.F. of the variables (and errors)? What does this mean?
70 Advanced topics - non-linear regression. The approach is exactly the same as for linear models; we use the same objective function: $S_r = \varepsilon^T \varepsilon = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - f(x_i, a))^2$. All we need is to minimize $S_r$ over a. But how?
71 Advanced topics - non-linear regression. The big difference between linear and nonlinear regression is that, in general, the optimization problem for a nonlinear model does not have an exact analytical solution. Therefore, we have to use a numerical optimization algorithm such as: Gauss-Newton; steepest descent; conjugate gradients; any other optimization algorithm.
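A sketch of nonlinear least squares using scipy.optimize.curve_fit (a Levenberg-Marquardt-style solver); the saturating-growth model and the data points are illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import curve_fit

# Nonlinear regression needs an expectation function, data, and starting
# guesses; curve_fit then minimizes Sr = sum (y_i - f(x_i, a))^2 numerically.
def f(x, b1, b2):
    # Illustrative saturating-growth model y = b1 * (1 - exp(-b2 * x)).
    return b1 * (1.0 - np.exp(-b2 * x))

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])     # illustrative data

b_opt, b_cov = curve_fit(f, x, y, p0=[1.0, 1.0])  # p0: starting guesses
y_hat = f(x, *b_opt)
Sr = np.sum((y - y_hat)**2)
```

Poor starting guesses can land the solver in a local minimum or prevent convergence, which is exactly the list of pitfalls on the following slides.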
72 Advanced topics - non-linear regression. When using an optimization algorithm to solve nonlinear regression problems, one needs to be able to specify: 1. an expectation function (i.e., the form of the model); 2. data; 3. starting guesses for a; 4. stopping criteria; 5. possibly other tuning parameters associated with the optimization algorithm.
73 Advanced topics - non-linear regression. Problems with numerical optimization: failure to converge; finding only a local minimum and not the global minimum; requires good starting guesses for the parameters; can be sensitive to the choice of convergence criteria and other tuning parameters of the algorithm; sometimes requires specification of the derivatives of the model with respect to the parameters.
More informationThe regression model with one fixed regressor cont d
The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationLinear Regression In God we trust, all others bring data. William Edwards Deming
Linear Regression ddebarr@uw.edu 2017-01-19 In God we trust, all others bring data. William Edwards Deming Course Outline 1. Introduction to Statistical Learning 2. Linear Regression 3. Classification
More informationSimple Linear Regression Estimation and Properties
Simple Linear Regression Estimation and Properties Outline Review of the Reading Estimate parameters using OLS Other features of OLS Numerical Properties of OLS Assumptions of OLS Goodness of Fit Checking
More informationLecture 10: F -Tests, ANOVA and R 2
Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally
More informationCurve Fitting. Objectives
Curve Fitting Objectives Understanding the difference between regression and interpolation. Knowing how to fit curve of discrete with least-squares regression. Knowing how to compute and understand the
More informationA6523 Modeling, Inference, and Mining Jim Cordes, Cornell University
A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationNonlinear Regression Functions
Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.
More informationp(z)
Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationChapter 13. Multiple Regression and Model Building
Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationLecture 3: Multiple Regression
Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationSimple Linear Regression for the MPG Data
Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory
More informationMathematics for Economics MA course
Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More informationCorrelation and Regression Theory 1) Multivariate Statistics
Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in
More informationChapter 7 Linear Regression
Chapter 7 Linear Regression 1 7.1 Least Squares: The Line of Best Fit 2 The Linear Model Fat and Protein at Burger King The correlation is 0.76. This indicates a strong linear fit, but what line? The line
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More information405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati
405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationMultiple Linear Regression
Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach
More informationRegression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear
Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More informationBiostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras
Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics
More informationLecture 4: Heteroskedasticity
Lecture 4: Heteroskedasticity Econometric Methods Warsaw School of Economics (4) Heteroskedasticity 1 / 24 Outline 1 What is heteroskedasticity? 2 Testing for heteroskedasticity White Goldfeld-Quandt Breusch-Pagan
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationreview session gov 2000 gov 2000 () review session 1 / 38
review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review
More informationCorrelation and Linear Regression
Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means
More informationChapter 14. Linear least squares
Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given
More informationEstadística II Chapter 5. Regression analysis (second part)
Estadística II Chapter 5. Regression analysis (second part) Chapter 5. Regression analysis (second part) Contents Diagnostic: Residual analysis The ANOVA (ANalysis Of VAriance) decomposition Nonlinear
More informationBiostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich
Biostatistics Correlation and linear regression Burkhardt Seifert & Alois Tschopp Biostatistics Unit University of Zurich Master of Science in Medical Biology 1 Correlation and linear regression Analysis
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationREGRESSION ANALYSIS AND INDICATOR VARIABLES
REGRESSION ANALYSIS AND INDICATOR VARIABLES Thesis Submitted in partial fulfillment of the requirements for the award of degree of Masters of Science in Mathematics and Computing Submitted by Sweety Arora
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationEconometrics. 4) Statistical inference
30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationMultiplication of Polynomials
Summary 391 Chapter 5 SUMMARY Section 5.1 A polynomial in x is defined by a finite sum of terms of the form ax n, where a is a real number and n is a whole number. a is the coefficient of the term. n is
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationSMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning
SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance
More informationINTRODUCTION TO BASIC LINEAR REGRESSION MODEL
INTRODUCTION TO BASIC LINEAR REGRESSION MODEL 13 September 2011 Yogyakarta, Indonesia Cosimo Beverelli (World Trade Organization) 1 LINEAR REGRESSION MODEL In general, regression models estimate the effect
More information