Chapter 14. Linear least squares

Size: px
Start display at page:

Download "Chapter 14. Linear least squares"

Transcription

1 Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given set of values (x 1,, x n ) of the independent variable put Y i = β 0 + β 1 x i + ɛ i, i = 1,, n, assuming that the noise vector (ɛ 1,, ɛ n ) has independent N(0,σ 2 ) random components Given the data (y 1,, y n ), the model is characterised by the likelihood function of three parameters β 0, β 1, σ 2 L(β 0, β 1, σ 2 ) = n i=1 1 { exp (y i β 0 β 1 x i ) 2 } = (2π) n/2 σ n e S(β 0,β 1 ) 2πσ 2σ 2 2σ 2, where S(β 0, β 1 ) = n i=1 (y i β 0 β 1 x i ) 2 Observe that n 1 S(β 0, β 1 ) = n 1 n (y i β 0 β 1 x i ) 2 = β β 0 β 1 x 2β 0 ȳ 2β 1 xy + β1x y 2 i=1 Least squares estimates Regression lines: true y = β 0 + β 1 x and fitted y = b 0 + b 1 x We want to find (b 0, b 1 ) such that the observed responses y i are approximated by the predicted responses ŷ i = b 0 + b 1 x i in an optimal way Least squares method: find (b 0, b 1 ) minimising the sum of squares S(b 0, b 1 ) = (y i ŷ i ) 2 From S/ b 0 = 0 and S/ b 1 = 0 we get the so-called Normal Equations: xy xȳ { b0 + b 1 x = ȳ b 0 x + b 1 x 2 = xy implying b 1 = x 2 x 2 b 0 = ȳ b 1 x = rs y The least square regression line y = b 0 + b 1 x takes the form y = ȳ + r sy (x x) sample variances s 2 x = 1 (xi x) 2, s 2 n 1 y = 1 (yi ȳ) 2, n 1 sample covariance y = 1 (xi x)(y n 1 i ȳ), sample correlation coefficient r = sxy s y The least square estimates (b 0, b 1 ) are the maximum likelihood estimates of (β 0, β 1 ) The least square estimates (b 0, b 1 ) are not robust: outliers exert leverage on the fitted line 2 Residuals The estimated regression line predicts the responses to the values of the explanatory variable by ŷ i = ȳ + r s y (x i x) 1

2 The noise components of the observed responses y i is represented by the residuals which are dependent via e i = y i ŷ i = y i ȳ r s y (x i x), e e n = 0, x 1 e x n e n = 0, ŷ 1 e ŷ n e n = 0 Residuals e i have normal distributions with zero mean and k Var(e i ) = σ 2( 1 (x k x i ) 2 ), Cov(ei, e n(n 1)s 2 j ) = σ 2 k (x k x i )(x k x j ) x n(n 1)s 2 x The error sum of squares SSE = (y i ŷ i ) 2 = i e 2 i can be expressed as SSE = i (y i ȳ) 2 2r s y n(xy ȳ x) + r 2 s2 y s 2 x (x i x) 2 = (n 1)s 2 y(1 r 2 ) i This leads to the corrected maximum likelihood estimate of σ 2 : s 2 = SSE n 2 = n 1 n 2 s2 y(1 r 2 ) Using y i ȳ = ŷ i ȳ + e i we obtain a decomposition SST = SSR + SSE, where SST = i (y i ȳ) 2 = (n 1)s 2 y is the total sum of squares, SSR = i (ŷ i ȳ) 2 = (n 1)b 2 1s 2 x is the regression sum of squares Coefficient of determination r 2 = SSR SST = 1 SSE SST Coefficient of determination is the proportion of variation in Y explained by main factor X Thus r 2 has a more transparent meaning than the correlation coefficient r To test the normality assumption use the normal distribution plot for the standardized residuals e i where s i = s 1 are the estimated standard deviations of e i k (x k x i ) 2 n(n 1)s 2 x The expected plot of the standardised residuals versu i is a horizontal blur (linearity), variance does not depend on x (homoscedasticity) Example (flow rate vs stream depth) For this example with n = 10, the scatter plot looks slightly non-linear The residual plot gives a clearer picture having the U-shape After the log-log transformation, the scatter plot is closer to linear and the residual plot has a horizontal profile s i, 2

3 3 Confidence intervals and hypothesis testing The list square estimators (b 0, b 1 ) are unbiased and consistent Due to the normality assumption we have the following exact distributions b 0 N(β 0, σ 2 0), σ0 2 σ2 x = 2 i, n(n 1)s 2 x b 1 N(β 1, σ1), 2 σ1 2 = σ2, (n 1)s 2 x b 0 β 0 s b0 t n 2, s b0 = s x 2 i, n(n 1) b 1 β 1 s s b1 t n 2, s b1 = n 1 Weak dependence between the two estimators: Cov(b 0, b 1 ) = σ2 x (n 1)s 2 x Exact 100(1 α)% CI for β i : b i ± t n 2 ( α 2 ) s b i Hypothesis testing H 0 : β i = β i0 : test statistic T = b i β i0 s bi, exact null distribution T t n 2 Model utility test and zero-intercept test H 0 : β 1 = 0 (no relationship between X and Y ), test statistic T = b 1 /s b1, null distribution T t n 2 H 0 : β 0 = 0, test statistic T = b 0 /s b0, null distribution T t n 2 Intervals for individual observations Given x predict the value y for the random variable Y = β 0 +β 1 x+ɛ Its expected value µ = β 0 +β 1 x has the least square estimate ˆµ = b 0 + b 1 x The standard error of ˆµ is computed as the square root of Var(ˆµ) = σ2 + σ2 ( x x n n 1 ) 2 Exact 100(1 α)% confidence interval for the mean µ: b 0 + b 1 x ± t n 2 ( α 2 ) s 1 n + 1 n 1 ( x x ) 2 Exact 100(1 α)% prediction interval for y: b 0 + b 1 x ± t n 2 ( α 2 ) s n + 1 n 1 ( x x ) 2 Prediction interval has wider limits since it contains the uncertainty due the noise factors: Var(Y ˆµ) = σ 2 + Var(ˆµ) = σ 2 (1 + 1 n + 1 n 1 ( x x ) 2 ) Compare these two formulas by drawing the confidence bands around the regression line both for the individual observation y and the mean µ 4 Linear regression and ANOVA Recall the two independent samples case from Chapter 11: first sample µ 1 + ɛ 1,, µ 1 + ɛ n, second sample µ 2 + ɛ n+1,, µ 2 + ɛ n+m, where the noise variables are independent and identically distributed ɛ i N(0, σ 2 ) This setting is equivalent to the simple linear regression model with Y i = β 0 + β 1 x i + ɛ i, x 1 = = x n = 0, x n+1 = = x n+m = 1, µ 1 = β 0, µ 2 = β 0 + β 1 The model utility test H 0 : β 1 = 0 is equivalent to the equality test H 0 : µ 1 = µ 2 3

4 More generally, for the one-way ANOVA setting with I = p levels for the main factor and n = pj observations β 0 + ɛ i, i = 1,, J, β 0 + β 1 + ɛ i, i = J + 1,, 2J, we need a multiple linear regression model β 0 + β p 1 + ɛ i, i = (p 1)J + 1,, n, Y i = β 0 + β 1 x i,1 + + β p 1 x i,p 1 + ɛ i, i = 1,, n with dummy variable i,j taking values 0 and 1 so that x i,1 = 1 only for i = J + 1, 2J, x i,2 = 1 only for i = 2J + 1, 3J, x i,p 1 = 1 only for i = (p 1)J + 1,, n The corresponding design matrix has the following block form X = , where each block is a column vector of length J whose components are either or ones or zeros 5 Multiple linear regression Consider a linear regression model Y = β 0 + β 1 x β p 1 x p 1 + ɛ, ɛ N(0, σ 2 ) with p 1 explanatory variables and a homoscedastic noise This is an extension of the simple linear regression model with p = 2 The corresponding data set consists of observations (y 1,, y n ) with n > p, which are realisations of n independent random variables Y 1 = β 0 + β 1 x 1,1 + + β p 1 x 1,p 1 + ɛ 1, Y n = β 0 + β 1 x n,1 + + β p 1 x n,p 1 + ɛ n In the matrix notation the column vector y = (y 1,, y n ) T is a realisation of Y = Xβ + ɛ, where Y = (Y 1,, Y n ) T, β = (β 0,, β p 1 ) T, ɛ = (ɛ 1,, ɛ n ) T, 4

5 are column vectors, and X is the so called design matrix 1 x 1,1 x 1,p 1 X = 1 x n,1 x n,p 1 assumed to have rank p Least square estimates b = (b 0,, b p 1 ) T minimise S(b) = y Xb 2, where a is the length of a vector a Solving the normal equations X T Xb = X T y we find the least squares estimates being b = MX T y, M = (X T X) 1 Least squares multiple regression: predicted responses ŷ = Xb = Py, where P = XMX T Covariance matrix for the least square estimates Σ bb = σ 2 M is a p p matrix with elements Cov(b i, b j ) The vector of residuals e = y ŷ = (I P)y have a covariance matrix Σ ee = σ 2 (I P) An unbiased estimate of σ 2 is given by s 2 = SSE n p, where SSE = e 2 The standard error of b i is computed as s bj = s m jj, where m jj is a diagonal element of M Exact sampling distributions b j β j s bj t n p, j = 1,, p 1 Inspect the normal probability plot for the standardised residuals elements of P y i ŷ i s 1 p ii, where p ii are the diagonal Coefficient of multiple determination can be computed similarly to the simple linear regression model as R 2 = 1 SSE, where SST = (n SST 1)s2 y The problem with R 2 is that it increases even if irrelevant variables are added to the model To punish for irrelevant variables it is better to use the adjusted coefficient of multiple determination The adjustment factor n 1 n p Ra 2 = 1 n 1 SSE = 1 s2 n p SST gets larger for the larger values of p Example (flow rate vs stream depth) The multiple linear regression framework works for the quadratic model y = β 0 + β 1 x + β 2 x 2 The residuals show no sign of systematic misfit Linear and quadratic terms are statistically significant Coefficient Estimate Standard Error t Value β β β Emperical relationship developed in a region might break down, if extrapolated to a wider region in which no data been observed s 2 y 5

6 Example (catheter length) Doctors want predictions on heart catheter length depending on child s height and weight The pairwise scatterplots for the data of size n = 12 suggests two simple linear regressions Estimate Height t Value Weight t Value b 0 (s b0 ) 121(43) (20) 128 b 1 (s b1 ) 060(010) (004) 70 s r 2 (Ra) (076) 080 (078) The plots of standardised residuals do not contradict the normality assumptions The simple regression models should be compared to the multiple regression model L = β 0 + β 1 H + β 2 W, which gives b 0 = 21, s b0 = 88, b 0 /s b0 = 239, b 1 = 020, s b1 = 036, b 1 /s b1 = 056, b 2 = 019, s b2 = 017, b 2 /s b2 = 112, s = 39, R 2 = 081, Ra 2 = 077 In contrast to the simple models, we can not reject neither H 1 : β 1 = 0 nor H 2 : β 2 = 0 This paradox is explained by different meaning of the slope parameters in the simple and multiple regression models In the multiple model β 1 is the expected change in L when H increased by one unit and W held constant Collinearity problem: height and weight have a strong linear relationship The fitted plane has a well resolved slope along the line about which the (H, W ) points fall and poorly resolved slopes along the H and W axes Conclusion: since the simple model L = β 0 + β 1 W gives the highest adjusted coefficient of determination, there is little or no gain from adding H to the regression model model with a single explanatory variable W 6

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic. Serik Sagitov, Chalmers and GU, February, 08 Solutions chapter Matlab commands: x = data matrix boxplot(x) anova(x) anova(x) Problem.3 Consider one-way ANOVA test statistic For I = and = n, put F = MS

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Regression Analysis II

Regression Analysis II Regression Analysis II Measures of Goodness of fit Two measures of Goodness of fit Measure of the absolute fit of the sample points to the sample regression line Standard error of the estimate An index

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

5.1 Model Specification and Data 5.2 Estimating the Parameters of the Multiple Regression Model 5.3 Sampling Properties of the Least Squares

5.1 Model Specification and Data 5.2 Estimating the Parameters of the Multiple Regression Model 5.3 Sampling Properties of the Least Squares 5.1 Model Specification and Data 5. Estimating the Parameters of the Multiple Regression Model 5.3 Sampling Properties of the Least Squares Estimator 5.4 Interval Estimation 5.5 Hypothesis Testing for

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511 STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017 Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Chapter 2 Multiple Regression I (Part 1)

Chapter 2 Multiple Regression I (Part 1) Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Need for Several Predictor Variables

Need for Several Predictor Variables Multiple regression One of the most widely used tools in statistical analysis Matrix expressions for multiple regression are the same as for simple linear regression Need for Several Predictor Variables

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

ECON 450 Development Economics

ECON 450 Development Economics ECON 450 Development Economics Statistics Background University of Illinois at Urbana-Champaign Summer 2017 Outline 1 Introduction 2 3 4 5 Introduction Regression analysis is one of the most important

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

SIMPLE REGRESSION ANALYSIS. Business Statistics

SIMPLE REGRESSION ANALYSIS. Business Statistics SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Chapter 12. Analysis of variance

Chapter 12. Analysis of variance Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence

More information

Y i = η + ɛ i, i = 1,...,n.

Y i = η + ɛ i, i = 1,...,n. Nonparametric tests If data do not come from a normal population (and if the sample is not large), we cannot use a t-test. One useful approach to creating test statistics is through the use of rank statistics.

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA

Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA March 6, 2017 KC Border Linear Regression II March 6, 2017 1 / 44 1 OLS estimator 2 Restricted regression 3 Errors in variables 4

More information

BNAD 276 Lecture 10 Simple Linear Regression Model

BNAD 276 Lecture 10 Simple Linear Regression Model 1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression EdPsych 580 C.J. Anderson Fall 2005 Simple Linear Regression p. 1/80 Outline 1. What it is and why it s useful 2. How 3. Statistical Inference 4. Examining assumptions (diagnostics)

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

1. Least squares with more than one predictor

1. Least squares with more than one predictor Statistics 1 Lecture ( November ) c David Pollard Page 1 Read M&M Chapter (skip part on logistic regression, pages 730 731). Read M&M pages 1, for ANOVA tables. Multiple regression. 1. Least squares with

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable, Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/2 01 Examination Date Time Pages Final December 2002 3 hours 6 Instructors Course Examiner Marks Y.P.

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

School of Mathematical Sciences. Question 1

School of Mathematical Sciences. Question 1 School of Mathematical Sciences MTH5120 Statistical Modelling I Practical 8 and Assignment 7 Solutions Question 1 Figure 1: The residual plots do not contradict the model assumptions of normality, constant

More information

Inference in Normal Regression Model. Dr. Frank Wood

Inference in Normal Regression Model. Dr. Frank Wood Inference in Normal Regression Model Dr. Frank Wood Remember We know that the point estimator of b 1 is b 1 = (Xi X )(Y i Ȳ ) (Xi X ) 2 Last class we derived the sampling distribution of b 1, it being

More information

Chapter 6 Multiple Regression

Chapter 6 Multiple Regression STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang The Data and Model Still have single response variable Y Now have multiple explanatory variables Examples: Blood Pressure vs Age, Weight,

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information