Formal Statement of Simple Linear Regression Model
- Mabel Webster
- 5 years ago
Transcription
1 Formal Statement of Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, \ldots, n$$
- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i$ is a random error term with mean $E(\epsilon_i) = 0$ and variance $\mathrm{Var}(\epsilon_i) = \sigma^2$
2 Least Squares Linear Regression
Seek to minimize
$$Q = \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2$$
Choose $b_0$ and $b_1$ as estimators for $\beta_0$ and $\beta_1$: $b_0$ and $b_1$ are the values that minimize the criterion $Q$ for the given sample observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.
3 Normal Equations
The result of this minimization step is a pair of equations called the normal equations. $b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$ respectively.
$$\sum Y_i = n b_0 + b_1 \sum X_i$$
$$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$$
This is a system of two equations in two unknowns. The solution is given by...
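4 Solution to Normal Equations
After a lot of algebra one arrives at
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
where $\bar{X} = \frac{\sum X_i}{n}$ and $\bar{Y} = \frac{\sum Y_i}{n}$.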
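The closed-form solution can be sketched in a few lines of plain Python. The function name `fit_slr` and the five data points are invented for illustration; they are not part of the original slides.

```python
def fit_slr(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_slr(x, y)
print(b0, b1)
```

For these points the sums are small enough to check by hand: the slope numerator is 6 and the denominator is 10.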
5 Properties of Solution
The $i$th residual is defined to be $e_i = Y_i - \hat{Y}_i$.
- $\sum_i e_i = 0$
- $\sum_i \hat{Y}_i = \sum_i Y_i$
- $\sum_i X_i e_i = 0$
- $\sum_i \hat{Y}_i e_i = 0$
- The regression line always goes through the point $(\bar{X}, \bar{Y})$
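These properties are easy to check numerically. The sketch below uses invented toy data and hypothetical variable names; each printed sum should be zero up to floating-point rounding.

```python
# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares estimates from the closed-form solution.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sum_e = sum(resid)                                      # sum of residuals
sum_xe = sum(xi * ei for xi, ei in zip(x, resid))       # residuals weighted by X
sum_fe = sum(fi * ei for fi, ei in zip(fitted, resid))  # residuals weighted by fitted values
line_at_x_bar = b0 + b1 * x_bar                         # should equal y_bar

print(sum_e, sum_xe, sum_fe, line_at_x_bar - y_bar)
```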
6 Alternative format of linear regression model
$$Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \epsilon_i$$
The least squares estimator $b_1$ for $\beta_1$ remains the same as before. The least squares estimator for $\beta_0^* = \beta_0 + \beta_1 \bar{X}$ becomes
$$b_0^* = b_0 + b_1 \bar{X} = (\bar{Y} - b_1 \bar{X}) + b_1 \bar{X} = \bar{Y}$$
Hence the estimated regression function is
$$\hat{Y} = \bar{Y} + b_1 (X - \bar{X})$$
7 $s^2$ estimator for $\sigma^2$
$$s^2 = MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2}$$
- MSE is an unbiased estimator of $\sigma^2$: $E(MSE) = \sigma^2$
- The sum of squares SSE has $n-2$ degrees of freedom associated with it.
- Cochran's theorem (later in the course) tells us where degrees of freedom come from and how to calculate them.
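As a minimal sketch, MSE can be computed directly from the residuals. The helper name `mse_of_fit` and the data are assumptions made for this example.

```python
def mse_of_fit(x, y):
    """Fit the least squares line and return MSE = SSE / (n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    # Error sum of squares: squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mse = mse_of_fit(x, y)
print(mse)
```

Here SSE is 2.4 on 3 degrees of freedom, so the estimate of $\sigma^2$ is 0.8.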
8 Normal Error Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, \ldots, n$$
- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i \sim$ iid $N(0, \sigma^2)$ (note this is different: now we know the distribution)
9 Inference concerning $\beta_1$
Tests concerning $\beta_1$ (the slope) are often of interest, particularly
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
The null hypothesis model $Y_i = \beta_0 + (0) X_i + \epsilon_i$ implies that there is no relationship between $Y$ and $X$. Note that under this model the means of the $Y_i$ are equal at all levels of $X_i$.
10 Sampling Distribution of $b_1$
The point estimator for $\beta_1$ is
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
For a normal error regression model the sampling distribution of $b_1$ is normal, with mean and variance given by
$$E(b_1) = \beta_1$$
$$\mathrm{Var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$
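11 Estimated variance of $b_1$
When we don't know $\sigma^2$ we have to replace it with the MSE estimate. Let
$$s^2 = MSE = \frac{SSE}{n-2}$$
where $SSE = \sum e_i^2$ and $e_i = Y_i - \hat{Y}_i$. Plugging in, we go from
$$\mathrm{Var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$
to
$$\widehat{\mathrm{Var}}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2}$$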
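12 Recap
We now have an expression for the sampling distribution of $b_1$ when $\sigma^2$ is known:
$$b_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2}\right)$$
When $\sigma^2$ is unknown we have an unbiased point estimator of $\mathrm{Var}(b_1)$:
$$\widehat{\mathrm{Var}}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2}$$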
13 Sampling Distribution of $(b_1 - \beta_1)/\hat{S}(b_1)$
- $b_1$ is normally distributed, so $(b_1 - \beta_1)/\sqrt{\mathrm{Var}(b_1)}$ is a standard normal variable.
- We don't know $\mathrm{Var}(b_1)$, so it must be estimated from data. We have already denoted its estimate $\hat{V}(b_1) = \widehat{\mathrm{Var}}(b_1)$.
- Using the estimate $\hat{V}(b_1)$, it can be shown that
$$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2), \qquad \hat{S}(b_1) = \sqrt{\hat{V}(b_1)}$$
14 Confidence Intervals and Hypothesis Tests
Now that we know the sampling distribution of the studentized statistic $(b_1 - \beta_1)/s\{b_1\}$ ($t$ with $n-2$ degrees of freedom) we can construct confidence intervals and hypothesis tests easily.
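15 $1 - \alpha$ confidence limits for $\beta_1$
The $1 - \alpha$ confidence limits for $\beta_1$ are
$$b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}$$
where $s\{b_1\} = \hat{S}(b_1)$ is the estimated standard deviation of $b_1$.
- Note that this quantity can be used to calculate confidence intervals given $n$ and $\alpha$.
- Fixing $\alpha$ can guide the choice of sample size if a particular confidence interval width is desired; given a sample size, the same relation shows which $\alpha$ a desired width allows.
- Also useful for hypothesis testing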
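A minimal sketch of the confidence limits follows, using only the standard library. The critical value is passed in as a parameter because plain Python has no $t$ quantile function; the value 3.182 is the tabulated $t(0.975; 3)$. The function name and data are invented for illustration.

```python
import math


def slope_confidence_limits(x, y, t_crit):
    """Return (lower, upper) limits b1 -/+ t_crit * s{b1}."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b1 = math.sqrt(mse / s_xx)  # estimated standard deviation of b1
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1


# Toy data, invented for illustration only (n = 5, so n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t(0.975; 3), taken from a t table for alpha = 0.05
lower, upper = slope_confidence_limits(x, y, T_CRIT)
print(lower, upper)
```

With so few observations the interval is wide and covers zero, which anticipates the outcome of the two-sided test of $\beta_1 = 0$ on the same data.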
16 Tests Concerning $\beta_1$
Example 1: two-sided test
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
- Test statistic: $t^* = \frac{b_1 - 0}{s\{b_1\}}$
17 Tests Concerning $\beta_1$
We have an estimate of the sampling distribution of $b_1$ from the data. If the null hypothesis holds, then the $b_1$ estimate coming from the data should fall within the central 95% region (for $\alpha = 0.05$) of the sampling distribution centered at 0 (in this case).
$$t^* = \frac{b_1 - 0}{s\{b_1\}}$$
18 Decision rules
- If $|t^*| \le t(1 - \alpha/2;\, n-2)$, accept $H_0$
- If $|t^*| > t(1 - \alpha/2;\, n-2)$, reject $H_0$
Absolute values make the test two-sided.
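The decision rule can be sketched as a small function. As before, the critical value comes from a $t$ table and is an input, and the names and data are hypothetical.

```python
import math


def slope_t_test(x, y, t_crit):
    """Return (t_star, reject_h0) for H0: beta1 = 0 vs Ha: beta1 != 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    t_star = (b1 - 0) / math.sqrt(mse / s_xx)
    # Two-sided rule: compare the absolute value with the critical value.
    return t_star, abs(t_star) > t_crit


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_star, reject_h0 = slope_t_test(x, y, t_crit=3.182)  # t(0.975; 3) from a table
print(t_star, reject_h0)
```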
19 Inferences Concerning $\beta_0$
Largely, inference procedures regarding $\beta_0$ can be performed in the same way as those for $\beta_1$. Remember the point estimator $b_0$ for $\beta_0$:
$$b_0 = \bar{Y} - b_1 \bar{X}$$
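20 Sampling distribution of $b_0$
When the error variance is known
$$E(b_0) = \beta_0$$
$$\sigma^2\{b_0\} = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right)$$
When the error variance is unknown
$$s^2\{b_0\} = MSE \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right)$$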
21 Confidence interval for $\beta_0$
The $1 - \alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$:
$$b_0 \pm t(1 - \alpha/2;\, n-2)\, s\{b_0\}$$
22 Sampling Distribution of $\hat{Y}_h$
We have
$$\hat{Y}_h = b_0 + b_1 X_h$$
Since this quantity is itself a linear combination of the $Y_i$, its sampling distribution is itself normal. Is the estimator biased or unbiased? The mean of the sampling distribution is
$$E\{\hat{Y}_h\} = E\{b_0\} + E\{b_1\} X_h = \beta_0 + \beta_1 X_h$$
which equals $E\{Y_h\}$, so $\hat{Y}_h$ is unbiased.
23 Sampling Distribution of $\hat{Y}_h$
So, plugging in, we get
$$\sigma^2\{\hat{Y}_h\} = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$$
Since we often won't know $\sigma^2$ we can, as usual, plug in our estimate $s^2 = SSE/(n-2)$ to get our estimate of this sampling distribution variance
$$s^2\{\hat{Y}_h\} = s^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$$
24 No surprise...
The studentized point estimator of the mean response follows a $t$-distribution with $n - 2$ degrees of freedom:
$$\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)$$
This means that we can construct confidence intervals in the same manner as before.
25 Confidence Intervals for $E(Y_h)$
The $1 - \alpha$ confidence intervals for $E(Y_h)$ are
$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\hat{Y}_h\}$$
From this, hypothesis tests can be constructed as usual.
26 Prediction interval for single new observation
If the regression parameters are unknown, the $1 - \alpha$ prediction interval for a new observation $Y_h$ is given by the following theorem:
$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{pred\}$$
We have
$$\sigma^2\{pred\} = \sigma^2\{Y_h - \hat{Y}_h\} = \sigma^2\{Y_h\} + \sigma^2\{\hat{Y}_h\} = \sigma^2 + \sigma^2\{\hat{Y}_h\}$$
An unbiased estimator of $\sigma^2\{pred\}$ is $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\}$, which is given by
$$s^2\{pred\} = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$$
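The two variance estimates differ only by the extra MSE term, which the following sketch makes concrete. The function name, the data, and the choice $X_h = 4$ are assumptions for illustration.

```python
def interval_variances(x, y, x_h):
    """Return (y_hat_h, s2_mean, s2_pred) at predictor level x_h."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat_h = b0 + b1 * x_h
    # Estimated variance of the fitted mean response at x_h.
    s2_mean = mse * (1 / n + (x_h - x_bar) ** 2 / s_xx)
    # A new observation adds its own error variance, estimated by MSE.
    s2_pred = mse + s2_mean
    return y_hat_h, s2_mean, s2_pred


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat_h, s2_mean, s2_pred = interval_variances(x, y, x_h=4)
print(y_hat_h, s2_mean, s2_pred)
```

Multiplying the square root of either variance by $t(1 - \alpha/2; n-2)$ gives the half-width of the corresponding interval, so the prediction interval is always the wider of the two.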
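27 ANOVA table for simple linear regression
Source of Variation | SS | df | MS | E(MS)
Regression | $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ | $1$ | $MSR = SSR/1$ | $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
Error | $SSE = \sum (Y_i - \hat{Y}_i)^2$ | $n-2$ | $MSE = SSE/(n-2)$ | $\sigma^2$
Total | $SSTO = \sum (Y_i - \bar{Y})^2$ | $n-1$ | |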
28 F Test of $\beta_1 = 0$ vs. $\beta_1 \neq 0$
ANOVA provides a battery of useful tests. For example, ANOVA provides an easy test for the two-sided hypotheses
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
Test statistic from before: $t^* = \frac{b_1 - 0}{s\{b_1\}}$
ANOVA test statistic: $F^* = \frac{MSR}{MSE}$
29 F Distribution
The F distribution is the distribution of the ratio of two independent $\chi^2$ random variables, each divided by its degrees of freedom. When $H_0$ holds, the test statistic $F^*$ follows the distribution
$$F^* \sim F(1, n-2)$$
30 Hypothesis Test Decision Rule
Since $F^*$ is distributed as $F(1, n-2)$ when $H_0$ holds, the decision rule to follow when the risk of a Type I error is to be controlled at $\alpha$ is:
- If $F^* \le F(1 - \alpha;\, 1, n-2)$, conclude $H_0$
- If $F^* > F(1 - \alpha;\, 1, n-2)$, conclude $H_a$
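The ANOVA quantities and the decision rule can be sketched as follows. The critical value 10.13 is the tabulated $F(0.95; 1, 3)$ and is supplied by hand; the data and variable names are invented for illustration. The sketch also checks the identity $F^* = (t^*)^2$, which holds for simple linear regression.

```python
import math

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
ssto = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares

msr = ssr / 1
mse = sse / (n - 2)
f_star = msr / mse

F_CRIT = 10.13  # F(0.95; 1, 3), taken from an F table for alpha = 0.05
conclude_ha = f_star > F_CRIT

t_star = b1 / math.sqrt(mse / s_xx)  # t statistic for the same hypotheses
print(ssr, sse, ssto, f_star, conclude_ha, t_star ** 2)
```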
31 General Linear Test
The test of $\beta_1 = 0$ versus $\beta_1 \neq 0$ is but a single example of a general test for linear statistical models. The general linear test has three parts:
- Full Model
- Reduced Model
- Test Statistic
32 Full Model Fit
A full linear model is first fit to the data
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
Using this model the error sum of squares is obtained. Here, for example, the simple linear model with non-zero slope is the full model:
$$SSE(F) = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2 = SSE$$
33 Fit Reduced Model
One can test the hypothesis that a simpler model is a better model via a general linear test (which is really a likelihood ratio test in disguise). For instance, consider a reduced model in which the slope is zero (i.e. no relationship between input and output).
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
The model when $H_0$ holds is called the reduced or restricted model:
$$Y_i = \beta_0 + \epsilon_i$$
The SSE for the reduced model is obtained as
$$SSE(R) = \sum (Y_i - b_0)^2 = \sum (Y_i - \bar{Y})^2 = SSTO$$
34 Test Statistic
The idea is to compare the two error sums of squares SSE(F) and SSE(R). Because the full model F has more parameters than the reduced model R,
$$SSE(F) \le SSE(R) \text{ always}$$
In the general linear test, the test statistic is
$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \Big/ \frac{SSE(F)}{df_F}$$
which follows the F distribution when $H_0$ holds. $df_R$ and $df_F$ are the degrees of freedom associated with the reduced and full model error sums of squares respectively.
35 $R^2$ (Coefficient of determination)
- SSTO measures the variation in the observations $Y_i$ when $X$ is not considered
- SSE measures the variation in the $Y_i$ after a predictor variable $X$ is employed
- A natural measure of the effect of $X$ in reducing variation in $Y$ is to express the reduction in variation ($SSTO - SSE = SSR$) as a proportion of the total variation
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
Note that since $0 \le SSE \le SSTO$, we have $0 \le R^2 \le 1$.
36 Coefficient of Correlation
$$r = \pm \sqrt{R^2}$$
The sign of $r$ matches the sign of the fitted slope $b_1$. Range: $-1 \le r \le 1$
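Both quantities follow directly from the sums of squares. This is a minimal sketch with invented data; `math.copysign` is used to give $r$ the sign of the slope.

```python
import math

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssto = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / ssto
r = math.copysign(math.sqrt(r_squared), b1)  # r takes the sign of the slope
print(r_squared, r)
```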
37 Remedial Measures
How do we know that the regression function is a good explainer of the observed data?
- Plotting
- Tests
What if it is not? What can we do about it?
- Transformation of variables
38 Residuals
Remember the definition of residuals:
$$e_i = Y_i - \hat{Y}_i$$
and the difference between that and the unknown true error
$$\epsilon_i = Y_i - E(Y_i)$$
In a normal regression model the $\epsilon_i$ are assumed to be iid $N(0, \sigma^2)$ random variables. The observed residuals $e_i$ should reflect these properties.
39 Departures from Model... To be studied by residuals
- Regression function not linear
- Error terms do not have constant variance
- Error terms are not independent
- Model fits all but one or a few outlier observations
- Error terms are not normally distributed
- One or more predictor variables have been omitted from the model
40 Diagnostics for Residuals
- Plot of residuals against predictor variable
- Plot of absolute or squared residuals against predictor variable
- Plot of residuals against fitted values
- Plot of residuals against time or other sequence
- Plot of residuals against omitted predictor variables
- Box plot of residuals
- Normal probability plot of residuals
41 Tests Involving Residuals
- Tests for constancy of variance (Brown-Forsythe test, Breusch-Pagan test, Section 3.6)
- Tests for normality of error distribution
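42 Brown-Forsythe Test
Split the residuals into two groups of sizes $n_1$ and $n_2$ ($n = n_1 + n_2$) and let $d_{i1}$ and $d_{i2}$ be the absolute deviations of the residuals from their group medians, with group means $\bar{d}_1$ and $\bar{d}_2$. The test statistic for comparing the means of the absolute deviations of the residuals around the group medians is
$$t^*_{BF} = \frac{\bar{d}_1 - \bar{d}_2}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where the pooled variance is
$$s^2 = \frac{\sum (d_{i1} - \bar{d}_1)^2 + \sum (d_{i2} - \bar{d}_2)^2}{n - 2}$$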
43 Brown-Forsythe Test
If $n_1$ and $n_2$ are not extremely small,
$$t^*_{BF} \sim t(n-2) \text{ approximately}$$
From this, confidence intervals and tests can be constructed.
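The Brown-Forsythe statistic is a two-sample $t$ statistic computed on absolute deviations from group medians, which the sketch below spells out. The function name and the two small residual groups are invented for illustration; in practice the groups would come from splitting a fit's residuals by low and high values of $X$.

```python
import math
from statistics import median


def brown_forsythe_t(resid_1, resid_2):
    """Return the Brown-Forsythe t statistic for two groups of residuals."""
    # Absolute deviations of each residual from its own group median.
    d1 = [abs(e - median(resid_1)) for e in resid_1]
    d2 = [abs(e - median(resid_2)) for e in resid_2]
    n1, n2 = len(d1), len(d2)
    d1_bar, d2_bar = sum(d1) / n1, sum(d2) / n2
    # Pooled variance of the absolute deviations, on n - 2 degrees of freedom.
    pooled = (
        sum((d - d1_bar) ** 2 for d in d1) + sum((d - d2_bar) ** 2 for d in d2)
    ) / (n1 + n2 - 2)
    return (d1_bar - d2_bar) / (math.sqrt(pooled) * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical residual groups; the second is visibly more spread out.
group_low_x = [-1, 0, 1]
group_high_x = [-3, 0, 3]
t_bf = brown_forsythe_t(group_low_x, group_high_x)
print(t_bf)
```

Groups this small are used only to keep the arithmetic checkable by hand; the $t(n-2)$ approximation needs larger groups.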
44 F test for lack of fit
Formal test for determining whether a specific type of regression function adequately fits the data.
Assumptions (usual): the observations $Y \mid X$ are
1. i.i.d.
2. normally distributed
3. of the same variance $\sigma^2$
Requires: repeat observations at one or more $X$ levels (called replicates)
45 Full Model vs. Regression Model
The full model is
$$Y_{ij} = \mu_j + \epsilon_{ij}$$
where
- $\mu_j$ are parameters, $j = 1, \ldots, c$
- $\epsilon_{ij}$ are iid $N(0, \sigma^2)$
Since the error terms have expectation zero,
$$E(Y_{ij}) = \mu_j$$
46 Full Model
In the full model there is a different mean (a free parameter) for each level $X_j$. In the regression model the mean responses are constrained to lie on a line:
$$E(Y) = \beta_0 + \beta_1 X$$
47 Fitting the Full Model
The estimators of $\mu_j$ are simply
$$\hat{\mu}_j = \bar{Y}_j$$
The error sum of squares of the full model therefore is
$$SSE(F) = \sum_i \sum_j (Y_{ij} - \bar{Y}_j)^2 = SSPE$$
SSPE: Pure Error Sum of Squares
48 Degrees of Freedom
An ordinary total sum of squares has $n - 1$ degrees of freedom. Each of the $c$ within-level terms is an ordinary total sum of squares, so each has $n_j - 1$ degrees of freedom. The number of degrees of freedom of SSPE is the sum of the component degrees of freedom:
$$df_F = \sum_j (n_j - 1) = \sum_j n_j - c = n - c$$
49 General Linear Test
Remember: the general linear test proposes a reduced model as the null hypothesis. This will be our normal regression model. The full model will be as described (one independent mean for each level of $X$).
- $H_0: E(Y) = \beta_0 + \beta_1 X$
- $H_a: E(Y) \neq \beta_0 + \beta_1 X$
50 SSE For Reduced Model
The SSE for the reduced model is as before. Remember
$$SSE(R) = \sum_i \sum_j [Y_{ij} - (b_0 + b_1 X_j)]^2 = \sum_i \sum_j (Y_{ij} - \hat{Y}_{ij})^2$$
and it has $n - 2$ degrees of freedom:
$$df_R = n - 2$$
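51 F Test Statistic
From the general linear test approach
$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \Big/ \frac{SSE(F)}{df_F} = \frac{SSE - SSPE}{(n-2) - (n-c)} \Big/ \frac{SSPE}{n-c}$$
Lack of fit sum of squares:
$$SSLF = SSE - SSPE$$
Then
$$F^* = \frac{SSLF}{(n-2) - (n-c)} \Big/ \frac{SSPE}{n-c} = \frac{SSLF}{c-2} \Big/ \frac{SSPE}{n-c} = \frac{MSLF}{MSPE}$$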
52 F Test Rule
From the F test we know that large values of $F^*$ lead us to reject the null hypothesis:
- If $F^* \le F(1 - \alpha;\, c-2, n-c)$, conclude $H_0$
- If $F^* > F(1 - \alpha;\, c-2, n-c)$, conclude $H_a$
53 Variance decomposition
$$SSE = SSPE + SSLF$$
$$\sum_i \sum_j (Y_{ij} - \hat{Y}_{ij})^2 = \sum_i \sum_j (Y_{ij} - \bar{Y}_j)^2 + \sum_i \sum_j (\bar{Y}_j - \hat{Y}_{ij})^2$$
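The lack-of-fit computation can be sketched on a small data set with replicates. The data (three $X$ levels with two observations each) and all variable names are invented for illustration; the resulting $F^*$ would be compared with a tabulated $F(1 - \alpha; c-2, n-c)$.

```python
from collections import defaultdict

# Toy data with replicates at each X level, invented for illustration only.
x = [1, 1, 2, 2, 3, 3]
y = [1, 3, 5, 7, 6, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Reduced (regression) model: error sum of squares around the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Full model: one free mean per X level, giving the pure error sum of squares.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
c = len(groups)
sspe = 0.0
for ys in groups.values():
    level_mean = sum(ys) / len(ys)
    sspe += sum((yi - level_mean) ** 2 for yi in ys)

sslf = sse - sspe          # lack of fit sum of squares
mslf = sslf / (c - 2)
mspe = sspe / (n - c)
f_star = mslf / mspe
print(sse, sspe, sslf, f_star)
```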
54 Example decomposition
55 Box Cox Transforms
It can be difficult to graphically determine which transformation of $Y$ is most appropriate for correcting
- skewness of the distributions of error terms
- unequal variances
- nonlinearity of the regression function
The Box-Cox procedure automatically identifies a transformation from the family of power transformations on $Y$.
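56 Box Cox Transforms
This family is of the form
$$Y' = Y^\lambda$$
Examples include
- $\lambda = 2$: $Y' = Y^2$
- $\lambda = 0.5$: $Y' = \sqrt{Y}$
- $\lambda = 0$: $Y' = \ln Y$ (by definition)
- $\lambda = -0.5$: $Y' = \frac{1}{\sqrt{Y}}$
- $\lambda = -1$: $Y' = \frac{1}{Y}$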
57 Box Cox Cont.
The normal error regression model with the response variable a member of the family of power transformations becomes
$$Y_i^\lambda = \beta_0 + \beta_1 X_i + \epsilon_i$$
This model has an additional parameter, $\lambda$, that needs to be estimated. Maximum likelihood is a way to estimate this parameter.
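One common way to carry out the maximum likelihood search, and not necessarily the one used in the lecture, is a grid search: for each candidate $\lambda$ the response is transformed to a standardized variable $W$ (scaled using the geometric mean of $Y$ so that error sums of squares are comparable across $\lambda$), $W$ is regressed on $X$, and the $\lambda$ with the smallest SSE is chosen. The sketch below assumes this approach; the function names, the grid, and the data are invented, and the data are built so that $\sqrt{Y}$ is exactly linear in $X$.

```python
import math


def sse_of_line(x, w):
    """SSE of the least squares line fitted to (x, w)."""
    n = len(x)
    x_bar, w_bar = sum(x) / n, sum(w) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w)) / s_xx
    b0 = w_bar - b1 * x_bar
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))


def standardized_transform(y, lam):
    """Scale the power transform so SSE values are comparable across lambda."""
    k2 = math.exp(sum(math.log(yi) for yi in y) / len(y))  # geometric mean of Y
    if lam == 0:
        return [k2 * math.log(yi) for yi in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (yi ** lam - 1) for yi in y]


# Toy data constructed so that sqrt(Y) = 1 + X exactly (illustration only).
x = [1, 2, 3, 4, 5]
y = [(1 + xi) ** 2 for xi in x]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # hypothetical candidate values
sse_by_lambda = {lam: sse_of_line(x, standardized_transform(y, lam)) for lam in grid}
best_lambda = min(sse_by_lambda, key=sse_by_lambda.get)
print(best_lambda)
```

A finer grid around the best value, or a one-dimensional optimizer, would refine the estimate; the coarse grid here is only meant to show the mechanics.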
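58 Using the Bonferroni inequality cont.
To achieve a $1 - \alpha$ family confidence coefficient for $\beta_0$ and $\beta_1$ (for example) using the Bonferroni procedure, each individual interval must be built at a higher confidence level, so that its individual error probability shrinks. Returning to our confidence intervals for $\beta_0$ and $\beta_1$ from before:
$$b_0 \pm t(1 - \alpha/2;\, n-2)\, s\{b_0\}$$
$$b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}$$
To achieve a $1 - \alpha$ family confidence interval these intervals must widen to
$$b_0 \pm t(1 - \alpha/4;\, n-2)\, s\{b_0\}$$
$$b_1 \pm t(1 - \alpha/4;\, n-2)\, s\{b_1\}$$
Each widened interval fails to cover its parameter with probability $\alpha/2$. Writing $A_1$ and $A_2$ for the events that the intervals for $\beta_0$ and $\beta_1$ fail to cover, then
$$P(\bar{A}_1 \cap \bar{A}_2) \ge 1 - P(A_1) - P(A_2) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha$$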
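The bookkeeping behind the Bonferroni adjustment can be sketched in two small helpers. Both function names are hypothetical; `g` stands for the number of simultaneous intervals (2 in the example above).

```python
def bonferroni_quantile_level(alpha, g):
    """Quantile level of the t distribution to use for each of g two-sided intervals."""
    return 1 - alpha / (2 * g)


def family_confidence_lower_bound(individual_miss_probs):
    """Bonferroni lower bound on the probability that all intervals cover."""
    return 1 - sum(individual_miss_probs)


alpha = 0.05
level = bonferroni_quantile_level(alpha, g=2)    # 1 - alpha/4
per_interval_miss = 2 * (1 - level)              # alpha/2 for each interval
bound = family_confidence_lower_bound([per_interval_miss, per_interval_miss])
print(level, per_interval_miss, bound)
```

For $\alpha = 0.05$ and two intervals, each interval uses the 0.9875 quantile and the family confidence is at least 0.95.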
STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationApplied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013
Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationReview: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.
1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately
More informationTopic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model
Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is
More informationBusiness Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal
Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationSTA 6167 Exam 1 Spring 2016 PRINT Name
STA 6167 Exam 1 Spring 2016 PRINT Name Unless stated otherwise, for all significance tests, use = 0.05 significance level. Q.1. A regression model was fit, relating estimated cost of de-commissioning oil
More informationRegression Estimation Least Squares and Maximum Likelihood
Regression Estimation Least Squares and Maximum Likelihood Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1 Least Squares Max(min)imization Function to minimize
More informationBNAD 276 Lecture 10 Simple Linear Regression Model
1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationChapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression
Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing
More informationMultiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company
Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple
More informationWe like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
More informationConfidence Interval for the mean response
Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.
More informationChapter 4. Regression Models. Learning Objectives
Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing
More informationRegression Analysis. Regression: Methodology for studying the relationship among two or more variables
Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationOrdinary Least Squares Regression
Ordinary Least Squares Regression Goals for this unit More on notation and terminology OLS scalar versus matrix derivation Some Preliminaries In this class we will be learning to analyze Cross Section
More informationTopic 22 Analysis of Variance
Topic 22 Analysis of Variance Comparing Multiple Populations 1 / 14 Outline Overview One Way Analysis of Variance Sample Means Sums of Squares The F Statistic Confidence Intervals 2 / 14 Overview Two-sample
More informationMath 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
More information