Unit 10: Simple Linear Regression and Correlation
1 Unit 10: Simple Linear Regression and Correlation. Statistics 571: Statistical Methods. Ramón V. León. 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1
2 Introductory Remarks Regression analysis is a method for studying the relationship between two or more numerical variables. In regression analysis one of the variables is regarded as a response, outcome, or dependent variable; the other variables are regarded as predictor, explanatory, or independent variables. An empirical model approximately captures the main features of the relationship between the response variable and the key predictor variables. Sometimes it is not clear which of two variables should be the response; in this case correlation analysis is used. In this unit we consider only two variables.
3 A Probabilistic Model for Simple Linear Regression Note that the Yi are independent N(µi = β0 + β1 xi, σ²) r.v.'s: constant variability about the regression line. Notice that the predictor variable is regarded as nonrandom because it is assumed to be set by the investigator.
4 A Probabilistic Model for Simple Linear Regression Let x1, x2, ..., xn be specific settings of the predictor variable. Let y1, y2, ..., yn be the corresponding values of the response variable. Assume that yi is the observed value of a random variable (r.v.) Yi, which depends on xi according to the following model: Yi = β0 + β1 xi + εi (i = 1, 2, ..., n). Here εi is the random error with E(εi) = 0 and Var(εi) = σ². Thus E(Yi) = µi = β0 + β1 xi (true regression line). We assume that the εi are independent, identically distributed (i.i.d.) r.v.'s. We also usually assume that the εi are normally distributed.
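The model above is easy to simulate, which makes its assumptions concrete. A minimal Python sketch (illustrative only; the course itself uses JMP, and the function name and parameter values here are made up):

```python
import random

def simulate_slr(xs, beta0, beta1, sigma, rng):
    """Draw one response y_i = beta0 + beta1*x_i + eps_i for each fixed x_i,
    with eps_i i.i.d. N(0, sigma^2), as in the probabilistic model above."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

rng = random.Random(571)                      # seeded for reproducibility
xs = [0, 4, 8, 12, 16, 20, 24, 28, 32]        # e.g. mileage settings in 1000 miles
ys = simulate_slr(xs, 360.0, -7.3, 10.0, rng) # illustrative parameter values
```

Note that only the εi are random: the xs are fixed design settings, matching the assumption that the investigator chooses them.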
5 Reflections on the Method of Tire Data Collection: Experimental Design Mileage is measured in 1000 miles; depth is measured in mils. Was one tire or nine used in the experiment? (Read the book!) What effect would this answer have on the experimental error? On our ability to generalize to all tires of this brand? Was the data collected on one or several cars? Does it matter? Was the data collected in random order or in the order given? Does it matter? Is there a possible confounding problem?
6 Example 10.1: Tread Wear vs. Mileage Using the Fit Y by X platform in the Analyze menu
7 Scatter Plot: Tread Wear vs. Mileage
8 Least Squares Line Fit ŷ = β̂0 + β̂1 x
9 Illustration: Least Squares Line Fit The least squares line fit is the line that minimizes the sum of the squares of the lengths of the vertical segments.
10 Mathematics of Least Squares Find the line, i.e., the values of β0 and β1, that minimizes the sum of the squared deviations: Q = Σi [yi − (β0 + β1 xi)]², summing over i = 1 to n. How? Solve for the values of β0 and β1 for which ∂Q/∂β0 = 0 and ∂Q/∂β1 = 0.
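Setting the two partial derivatives to zero gives the familiar closed form β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1 x̄. A Python sketch of that solution (the function name is mine; the slides compute this with JMP):

```python
def ls_fit(xs, ys):
    """Least squares estimates: solving dQ/dbeta0 = 0 and dQ/dbeta1 = 0
    yields beta1_hat = Sxy/Sxx and beta0_hat = ybar - beta1_hat * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1
```

On data that lie exactly on a line, the fit recovers that line exactly, e.g. `ls_fit([1, 2, 3], [2, 4, 6])` returns an intercept of 0 and a slope of 2.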
11 Predict Mean Groove Depth for a Given Mileage For x = 25 (25,000 miles) the predicted y is ŷ = β̂0 + β̂1(25) mils. In JMP, add a row to the table and leave the y value blank.
12 Goodness of Fit of the LS Line Fitted values of y: ŷi = β̂0 + β̂1 xi, i = 1, 2, ..., n. Residuals: ei = yi − ŷi = yi − (β̂0 + β̂1 xi) = segment length.
13 Sums of Squares Error Sum of Squares: SSE = Σi ei² = Σi (yi − ŷi)². Total Sum of Squares: SST = Σi (yi − ȳ)². Regression Sum of Squares: SSR = Σi (ŷi − ȳ)². These satisfy Σi (yi − ȳ)² = Σi (ŷi − ȳ)² + Σi (yi − ŷi)², that is, SST = SSR + SSE. These definitions apply even if the model is more complex than linear in x, for example, if it is quadratic in x. SST is interpreted as the total variation in y. JMP calls SSR the Model Sum of Squares.
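The decomposition SST = SSR + SSE can be checked numerically. A self-contained Python sketch (the small data set is made up for illustration; the slides use JMP output instead):

```python
def sums_of_squares(xs, ys):
    """Fit the least squares line and return (SSE, SSR, SST)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # error SS
    ssr = sum((f - ybar) ** 2 for f in fitted)            # regression SS
    sst = sum((y - ybar) ** 2 for y in ys)                # total SS
    return sse, ssr, sst

sse, ssr, sst = sums_of_squares([1, 2, 3, 4], [1, 2, 3, 5])
r_squared = ssr / sst   # coefficient of determination (next slide)
```

The identity holds up to floating-point rounding for any data set, since it is an algebraic consequence of the least squares normal equations.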
14 Illustration of Sums of Squares The plot shows, for each point, the deviations yi − ȳ, ŷi − ȳ, and yi − ŷi.
15 Coefficient of Determination Since SST = SSR + SSE, r² = SSR/SST = 1 − SSE/SST = proportion of the variation in y that is accounted for by regression on x.
16 Estimation of σ² s² = SSE/(n − 2) = Σi ei²/(n − 2) = Σi (yi − ŷi)²/(n − 2) = Σi [yi − (β̂0 + β̂1 xi)]²/(n − 2). This estimate has n − 2 degrees of freedom because two unknown parameters are estimated.
17 Statistical Inference for β0 and β1 Under the model, (β̂0 − β0)/SD(β̂0) ~ N(0,1) and (β̂1 − β1)/SD(β̂1) ~ N(0,1); replacing the standard deviations by their estimates gives (β̂0 − β0)/SE(β̂0) ~ t with n − 2 d.f. and (β̂1 − β1)/SE(β̂1) ~ t with n − 2 d.f. Hence 100(1 − α)% CIs are β̂1 ± t(n−2, α/2) SE(β̂1) and β̂0 ± t(n−2, α/2) SE(β̂0); in the tire example t(7, 0.025) is used since n − 2 = 7. In JMP, right-click on the Parameter Estimates area.
18 Test of Hypothesis for β0 and β1 H0: β1 = 0 vs. H1: β1 ≠ 0. Reject H0 at level α if |t| = |β̂1|/SE(β̂1) > t(n−2, α/2). Tire example: t = β̂1/SE(β̂1) is compared with t(7, 0.025); the two-sided p-value is very small, so there is strong evidence that mileage affects tread depth.
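The pieces fit together as s² = SSE/(n − 2), SE(β̂1) = s/√Sxx, and t = β̂1/SE(β̂1). A Python sketch with made-up data; the t critical value must be supplied externally (from a table or `scipy.stats.t.ppf`), since the standard library has no t quantile function:

```python
import math

def slope_inference(xs, ys, t_crit):
    """Return (b1, se_b1, t_stat, ci) for the least squares slope.
    t_crit is t(n-2, alpha/2), looked up externally."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))     # estimate of sigma, n-2 d.f.
    se_b1 = s / math.sqrt(sxx)       # standard error of the slope
    t_stat = b1 / se_b1              # test statistic for H0: beta1 = 0
    return b1, se_b1, t_stat, (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

For example, with n = 4 the CI uses t(2, 0.025) = 4.303: `slope_inference([1, 2, 3, 4], [1, 2, 3, 5], 4.303)`.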
19 Analysis of Variance The Mean Square (MS) is the Sum of Squares divided by its degrees of freedom, e.g. MSE = SSE/df = SSE/7 in the tire example. To test H0: β1 = 0 vs. H1: β1 ≠ 0, use F = MSR/MSE, which equals t² for the slope. The F = t² relationship holds only when the F numerator has one degree of freedom.
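With one predictor, MSR = SSR/1 and the identity F = t² can be verified numerically. A quick Python check (data values are made up):

```python
import math

xs, ys = [1, 2, 3, 4], [1, 2, 3, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in xs]
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ssr = sum((f - ybar) ** 2 for f in fitted)
mse = sse / (n - 2)        # MSE = SSE / (n - 2)
msr = ssr / 1              # numerator has 1 d.f. in simple regression
F = msr / mse
t_stat = b1 / math.sqrt(mse / sxx)
# F equals t_stat**2 because the F numerator has one degree of freedom
```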
20 Example: Prediction of a Future y or Its Mean For a fixed value of x*, are we trying to predict the average value of y, or the value of a future observation of y? Do I want to predict the average selling price of all 4,000-square-foot houses in my neighborhood, or the particular future selling price of my 4,000-square-foot house? Which prediction is subject to the most error?
21 Prediction of a Future y or Its Mean: Prediction Interval or Confidence Interval For a fixed value of x*, are we trying to predict the average value of y, or the value of a future observation of y?
22 JMP: Prediction of the Mean of y Groove Depth (in mils) vs. Mileage (in 1000 miles)
23 JMP: Prediction of a Future Value of y Groove Depth (in mils) vs. Mileage (in 1000 miles)
24 Formulas for Confidence and Prediction Intervals Here Sxx = Σi (xi − x̄)². Compare with the prediction interval of Chapter 7.
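The standard textbook formulas are ŷ* ± t(n−2, α/2) s √(1/n + (x* − x̄)²/Sxx) for the mean of Y at x*, and ŷ* ± t(n−2, α/2) s √(1 + 1/n + (x* − x̄)²/Sxx) for a future observation (the extra 1 accounts for the variability of the new observation itself). A Python sketch of the half-widths, with the t critical value supplied externally (function name and data are mine):

```python
import math

def interval_half_widths(xs, ys, x_star, t_crit):
    """Half-widths at x = x_star: t * s * sqrt(1/n + (x*-xbar)^2/Sxx) for the
    mean of Y, with an extra 1 under the root for a future observation of Y."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    lev = 1 / n + (x_star - xbar) ** 2 / sxx
    return t_crit * s * math.sqrt(lev), t_crit * s * math.sqrt(1 + lev)
```

Both intervals widen as x* moves away from x̄, and the prediction interval is always the wider of the two, matching the JMP bands on the previous slides.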
25 Confidence and Prediction Intervals with JMP Using the Fit Model platform, not the Fit Y by X platform
26 CI for the Mean µ*
27 Prediction Interval for Y*
28 Prediction for the Mean of Y or a Future Observation of Y The point estimate is the same in both cases, but the error bands are different: narrower for the mean of Y, [158.73, …], and wider for a future value of Y, [129.44, …].
29 Calibration (Inverse Regression)
30 Inverse Regression in JMP 5: Confidence Interval In the Fit Model platform, with the option unchecked
31 Inverse Regression: Two Types of Confidence Intervals Calibration is usually used in measurement: given that you got a certain measurement, what is the true value of the measured quantity? Which inverse prediction interval makes more sense?
32 Measurement Application Suppose you get a measurement of 6.50 on an object. What is a calibration interval for its true value? The data was supposedly obtained by measuring standards of known true value with a calibrated instrument; the slide contrasts this with how the data was really generated.
33 Residuals The residuals are the differences between the observed value of y and the predicted value of y. For example, at x = 32, ŷ = β̂0 + β̂1(32) mils.
34 Regression Diagnostics The residuals ei = yi − ŷi can be viewed as the "leftover" after the model is fitted. If the model is correct, then the ei can be viewed as estimates of the random errors εi. As such, the residuals are vital for checking the model assumptions. Residual plots: if the model is correct, the residuals should have no structure, that is, they should look random on their plots.
35 Mathematics of Residuals If the assumed regression model is correct, then the ei are normally distributed with E(ei) = 0 and Var(ei) = σ²[1 − 1/n − (xi − x̄)²/Sxx] ≈ σ² if n is large, where Sxx = Σi (xi − x̄)². The ei's are not independent (even though the εi's are independent) because they satisfy the following constraints: Σi ei = 0 and Σi xi ei = 0. However, the dependence introduced by these constraints is negligible if n is modestly large, say greater than 20 or so.
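The two constraints on the residuals follow directly from the least squares normal equations, and they can be verified numerically. A short Python check (data values are made up):

```python
xs, ys = [1, 2, 3, 4], [1, 2, 3, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sum_e = sum(resid)                               # constraint 1: sum of e_i = 0
sum_xe = sum(x * e for x, e in zip(xs, resid))   # constraint 2: sum of x_i e_i = 0
```

Both sums are zero (up to floating-point rounding) for any data set fitted by least squares; this is why the residuals carry only n − 2 independent pieces of information.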
36 Basic Principle Underlying Residual Plots If the assumed model is correct, then the residuals should be randomly scattered around 0 and should show no obvious systematic pattern.
37 Residuals in JMP In the Fit Y by X platform: red triangle menu next to Linear Fit. Also available in the Fit Model platform.
38 Checking for Linearity Should E(Y) = β0 + β1 x + β2 x² be fitted rather than E(Y) = β0 + β1 x? (Plots of Groove Depth in mils, and of its residuals, vs. Mileage in 1000 miles.)
39 Predicted by Residual Plot (Residuals of Groove Depth plotted against predicted Groove Depth rather than against Mileage.) What are the advantages of this plot?
40 Tread Wear Quadratic Model
41 Residuals for Quadratic Model The residuals have a random pattern. The Fit Y by X platform was used to get these plots; the left plot can also be obtained using the Plot Residuals option.
42 Tread Wear Quadratic Model R² was .953 for the model linear in x.
43 Checking for Constant Variance: Plot of Predicted vs. Residual If the constant-variance assumption is incorrect, often Var(Y) is some function of E(Y) = µ.
44 Other Model Diagnostics Based on Residuals Checking for normality of errors: do a normal plot of the residuals. Warning: don't plot the response y on a normal plot; this plot has no meaning when one has regressors. Don't transform the data on the basis of this plot. Many students make this mistake in their project; don't be one of them.
45 Other Model Diagnostics Based on Residuals Checking for independence of errors (Fit Model platform): plot the residuals in time order; the order of data collection should always be recorded. Use the Durbin-Watson statistic to test for autocorrelation.
46 Other Model Diagnostics Based on Residuals Checking for outliers: see if any standardized (Studentized) residual exceeds 2 (standard deviations) in absolute value: ei* = ei/SE(ei) = ei/(s √(1 − 1/n − (xi − x̄)²/Sxx)), i = 1, 2, ..., n. Also do a box plot of residuals to check for outliers.
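The standardization divides each residual by its estimated standard error, which depends on how far xi is from x̄. A Python sketch (function name and data are mine, not from the slides):

```python
import math

def standardized_residuals(xs, ys):
    """e_i* = e_i / (s * sqrt(1 - 1/n - (x_i - xbar)^2 / Sxx));
    values exceeding 2 in absolute value flag potential outliers."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    return [e / (s * math.sqrt(1 - 1 / n - (x - xbar) ** 2 / sxx))
            for x, e in zip(xs, resid)]
```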
47 Standardized (Studentized) Residuals in JMP Using the Fit Model platform with the linear model. Should we omit the first observation (a possible outlier) and refit the model? The model fitted here is the one linear in x, not the quadratic model.
48 Checking for Influential Observations An observation can be influential because it has an extreme x-value, an extreme y-value, or both. Note that the influential observations above are not outliers, since their residuals are quite small (even zero).
49 Checking for Influential Observations ŷi = Σj hij yj, where the hij are some functions of the x's. H = (hij) is called the hat matrix. We can think of hij as the leverage exerted by yj on the fitted value ŷi. If ŷi is largely determined by yi, with very small contribution from the other yj's, then we say that the ith observation is influential (also called high leverage).
50 Checking for Influential Observations Since hii determines how much ŷi depends on yi, it is used as a measure of the leverage of the i-th observation. Now Σi hii = k + 1, so the average of the hii is (k + 1)/n, where k is the number of predictor variables. Rule of thumb: regard any hii > 2(k + 1)/n as high leverage. In simple regression k = 1, and so hii > 4/n is regarded as high leverage. Since ei* = ei/SE(ei) = ei/(s √(1 − hii)), standardized (Studentized) residuals will be large relative to the residual for high-leverage observations.
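In simple regression the leverages have the closed form hii = 1/n + (xi − x̄)²/Sxx, so the rule of thumb is easy to apply directly. A Python sketch (the x-values are made up to include one extreme point):

```python
def leverages(xs):
    """Simple-regression leverages h_ii = 1/n + (x_i - xbar)^2 / Sxx.
    They sum to k + 1 = 2, and h_ii > 4/n is flagged as high leverage."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]   # one extreme x-value
h = leverages(xs)
high = [i for i, hii in enumerate(h) if hii > 4 / len(xs)]   # rule of thumb
```

Only the extreme point (index 10, x = 30) is flagged; note that leverage depends on the x's alone, so an observation can be influential even when its residual is small.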
51 Graph of Anscombe Data Observation in row 8
52 Checking for Influential Observations: Anscombe Data Illustration (Fit Model platform) Here hii > 4/n = 4/11 = 0.36, thus observation 8 is influential. Notice: observation no. 8 is not an outlier, since e8 = y8 − ŷ8 = 0.
53 How to Deal with Outliers and Influential Observations They can give a misleading picture of the relationship between y and x. Are they erroneous observations? If yes, they should be discarded. If the observations are valid, then they should be included in the analysis. Least squares is very sensitive to these observations: do the analysis with and without the outlier or influential observation. Or use a robust method: minimize the sum of the absolute deviations Σi |yi − (β0 + β1 xi)|. These estimates are analogous to the sample median; LS estimates are analogous to the sample mean.
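A crude way to minimize the sum of absolute deviations uses the fact that, in nondegenerate cases, an L1-optimal line passes through at least two of the data points, so one can simply search over all pairs. This brute-force sketch is my own illustration, not an algorithm from the slides:

```python
from itertools import combinations

def lad_fit(xs, ys):
    """Brute-force least-absolute-deviations line: try every line through a
    pair of data points, keep the one minimizing sum |y_i - (b0 + b1*x_i)|."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue                       # vertical line, skip
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        sad = sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, b0, b1)
    return best[1], best[2]

# One wild y-value barely moves the LAD line, unlike least squares:
b0, b1 = lad_fit([1, 2, 3, 4, 5], [1, 2, 3, 4, 100])
```

The fitted line is y = x despite the outlier at x = 5, illustrating the median-like robustness described above; a least squares fit to the same data would be pulled strongly toward the outlier.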
54 Data Transformations If the functional relationship between x and y is known, it may be possible to find a linearizing transformation analytically: for y = α x^β, take logs on both sides: log y = log α + β log x. For y = α e^(βx), take logs on both sides: log y = log α + β x. If we are fitting an empirical model, then a linearizing transformation can be found by trial and error using the scatter plot as a guide.
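The exponential case reduces to ordinary least squares on (x, log y). A Python sketch (function name and data are mine); on data that are exactly exponential, the original parameters are recovered:

```python
import math

def fit_exponential(xs, ys):
    """Fit y = alpha * exp(beta * x) by least squares on (x, log y),
    i.e. on the linearized model log y = log(alpha) + beta * x."""
    logys = [math.log(y) for y in ys]
    n = len(xs)
    xbar, lbar = sum(xs) / n, sum(logys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sum((x - xbar) * (l - lbar) for x, l in zip(xs, logys)) / sxx
    alpha = math.exp(lbar - beta * xbar)
    return alpha, beta

# Exact exponential data (made-up parameters) is recovered exactly:
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
alpha, beta = fit_exponential(xs, ys)
```

Note that this minimizes squared error on the log scale, not the original scale, which is exactly what the exponential-model fit on the following slides does.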
55 Scatter Plots and Linearizing Transformations
56 Tire Tread Wear vs. Mileage: Exponential Model y = α e^(βx), so log y = log α + β x.
57 Tire Tread Wear vs. Mileage: Exponential Model WEAR = e^(β̂0) e^(−.0298 MILES)
58 Tire Tread Wear vs. Mileage: Exponential Model For the linear model, R² = .953. The residual plot is still curved, mainly because observation 1 is an outlier.
59 Weights and Gas Mileages of Model Year Cars
60 Weights and Gas Mileages of Model Year Cars Using the Fit Y by X platform. Use the transformation y → 100/y; the new variable has units of gallons per 100 miles. This transformation has the advantage of being physically interpretable. A spline fits the data locally; our prior curve fits have been global.
61 Weights and Gas Mileages of Model Year Cars
62 Variance Stabilizing Transformation Suppose σ ∝ µ^α. We want to find a transformation of y that yields a constant variance. Suppose the transformation is a power of the original data, say y* = y^λ. For example: take the square root of Poisson count data, since for data so distributed the mean and the variance are the same.
63 Box-Cox Transformation: Empirical Determination of Best Transformation (Fit Model platform) y^(λ) = (y^λ − 1)/(λ ẏ^(λ−1)) for λ ≠ 0, and y^(λ) = ẏ ln y for λ = 0, where ẏ = (Πi yi)^(1/n) is the geometric mean of the y's.
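The geometric-mean scaling is what makes residual sums of squares comparable across different λ, and the λ = 0 branch is the continuous limit of the λ ≠ 0 branch. A Python sketch of the transform itself (JMP does the λ search; the function names here are mine):

```python
import math

def geometric_mean(ys):
    """Geometric mean (Prod y_i)^(1/n), computed via logs for stability."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))

def boxcox(ys, lam):
    """Scaled Box-Cox transform: (y^lam - 1) / (lam * g^(lam-1)) for lam != 0,
    and g * ln(y) for lam = 0, where g is the geometric mean of the y's."""
    g = geometric_mean(ys)
    if lam == 0:
        return [g * math.log(y) for y in ys]
    return [(y ** lam - 1) / (lam * g ** (lam - 1)) for y in ys]
```

As λ → 0, (y^λ − 1)/λ → ln y and g^(λ−1) → 1/g, so the λ ≠ 0 branch converges to the λ = 0 branch, which can be checked numerically with a tiny λ.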
64 Transformation in Tire Data A better fit than the polynomial, and with one less parameter (3 versus 4). (Transformed Groove Depth vs. Mileage in 1000 miles.)
65 Weights and Gas Mileages of Model Year Cars: Box-Cox Transformation The reciprocal transformation is among the suggested transformations.
66 Correlation Analysis When it is not clear which is the predictor variable and which is the response variable, and when both variables are random, try the bivariate normal distribution as a probability model for the joint distribution of the two r.v.'s. Parameters: µX, µY, σX, σY, and ρ = Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)). The correlation ρ is a measure of association between the two random variables X and Y. Zero correlation corresponds to no association; correlation of 1 or −1 represents perfect association.
67 Bivariate Normal Density Function
68 Statistical Inference on the Correlation Coefficient R = Σi (Xi − X̄)(Yi − Ȳ) / √(Σi (Xi − X̄)² Σi (Yi − Ȳ)²). To test H0: ρ = 0 vs. H1: ρ ≠ 0, use the test statistic T = R √(n − 2)/√(1 − R²) ~ t with n − 2 d.f.; this is equivalent to testing H0: β1 = 0 vs. H1: β1 ≠ 0. An approximate test statistic is available to test nonzero null hypotheses and to obtain confidence intervals for ρ (see textbook).
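The equivalence with the slope test can be seen numerically: T = r √(n − 2)/√(1 − r²) is algebraically identical to the slope t statistic β̂1/SE(β̂1). A Python sketch (function name and data are mine):

```python
import math

def corr_t(xs, ys):
    """Sample correlation r and the test statistic
    T = r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r, r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

For the illustrative data [1, 2, 3, 4] and [1, 2, 3, 5] used earlier, T matches the slope t statistic 1.3/√0.03 exactly (up to rounding), confirming the equivalence stated on the slide.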
69 Exercise
70 Exercise
71 Exercise By default, a 95% bivariate normal density ellipse is imposed on each scatterplot. If the variables are bivariate normally distributed, this ellipse encloses approximately 95% of the points. The correlation of the variables is seen in the collapsing of the ellipse along the diagonal axis. If the ellipse is fairly round and is not diagonally oriented, the variables are uncorrelated.
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationRegression Models. Chapter 4. Introduction. Introduction. Introduction
Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager
More informationLecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran
Lecture 2 Linear Regression: A Model for the Mean Sharyn O Halloran Closer Look at: Linear Regression Model Least squares procedure Inferential tools Confidence and Prediction Intervals Assumptions Robustness
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can
More informationUnit 14: Nonparametric Statistical Methods
Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based
More informationCorrelation and regression
1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationChapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression
Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing
More informationChapter 14. Linear least squares
Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationMeasuring the fit of the model - SSR
Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do
More informationDiagnostics and Remedial Measures
Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression
More informationChapter 14 Student Lecture Notes 14-1
Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this
More informationConfidence Interval for the mean response
Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice
The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test
More informationLecture notes on Regression & SAS example demonstration
Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also
More informationAssumptions in Regression Modeling
Fall Semester, 2001 Statistics 621 Lecture 2 Robert Stine 1 Assumptions in Regression Modeling Preliminaries Preparing for class Read the casebook prior to class Pace in class is too fast to absorb without
More informationUnit 10: Simple Linear Regression and Correlation
Unt 10: Smple Lnear Regresson and Correlaton Statstcs 571: Statstcal Methods Ramón V. León 6/28/2004 Unt 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regresson analyss s a method for studyng the
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationLecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1
Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor
More informationQuantitative Methods I: Regression diagnostics
Quantitative Methods I: Regression University College Dublin 10 December 2014 1 Assumptions and errors 2 3 4 Outline Assumptions and errors 1 Assumptions and errors 2 3 4 Assumptions: specification Linear
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More informationMultiple Regression Analysis. Part III. Multiple Regression Analysis
Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant
More informationWe like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationPsychology 282 Lecture #4 Outline Inferences in SLR
Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationCh14. Multiple Regression Analysis
Ch14. Multiple Regression Analysis 1 Goals : multiple regression analysis Model Building and Estimating More than 1 independent variables Quantitative( 量 ) independent variables Qualitative( ) independent
More informationChapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania
Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are
More informationInteractions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept
Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and
More informationTopic 10 - Linear Regression
Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationAnalysis of Variance. ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร
Analysis of Variance ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร pawin@econ.tu.ac.th Outline Introduction One Factor Analysis of Variance Two Factor Analysis of Variance ANCOVA MANOVA Introduction
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More information