Inference
ME104: Linear Regression Analysis
Kenneth Benoit
August 15, 2012
Lecture 3: Multiple linear regression 1



Stata output revisited

. reg votes1st spend_total incumb minister spendxinc

[Stata output: ANOVA table (Model, Residual and Total SS/df/MS; Number of obs; F(4, 457); Prob > F; R-squared; Adj R-squared; Root MSE) and coefficient table (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for spend_total, incumb, minister, spendxinc and _cons; numeric values lost in transcription]

R²

    R² = SSM / TSS = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)²

Adjusted R²:

    R²_adj = 1 − (1 − R²)(n − 1)/(n − k − 1)

[Numeric substitutions from the Stata output lost in transcription]

Root MSE (and F)

Root MSE = estimate of σ:

    σ̂ = √(SSE/df_resid) = √(1.4739e+09 / 457) ≈ 1796

F-test: this is the test of the null hypothesis that the joint effect of all independent variables is zero (more on this shortly).

How to compute R² and σ̂ from this output?

[Regression output: dependent variable disprls; 4274 valid cases, 0 missing, deletion method none; 4251 degrees of freedom; Total SS, Residual SS, R-squared, Rbar-squared, std error of estimate, F(22, 4251) and its probability, plus a coefficient table for CONSTANT and the HSL, SL, MSL, dh, LRH, LRDr, LRI, ImpHA, EqP, Dan and Ad variables, each also interacted with m; numeric values lost in transcription]

(Answer: R² = 1 − Residual SS / Total SS, and σ̂ = √(Residual SS / 4251).)

OLS computation of best fitting line

This is incredibly powerful:

    Y = Xβ + ε
    X′Y = X′Xβ + X′ε
    X′Y = X′Xβ + 0
    (X′X)⁻¹X′Y = β + 0
    β = (X′X)⁻¹X′Y

But it does not tell us how uncertain our estimate of β is in any probabilistic, comparative sense.
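The matrix computation above can be sketched in a few lines. This is a minimal illustration on made-up data (X and y below are simulated, not the lecture's election data); it also checks the X′e = 0 step, since OLS residuals are orthogonal to the columns of X.

```python
import numpy as np

# A minimal sketch of the matrix derivation above, on made-up data
# (X and y here are illustrative, not the lecture's dataset).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n),            # intercept column
                     rng.normal(size=n),
                     rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta = (X'X)^{-1} X'Y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The OLS residuals are orthogonal to X: this is the X'e = 0 step above
resid = y - X @ beta_hat
print(beta_hat, X.T @ resid)
```

Solving the normal equations directly (rather than inverting X′X) is the numerically preferred route, but the algebra is the same as on the slide.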

Distributional assumptions

Distributions are used to assess uncertainty. Many standard parametric tests are associated with interpreting the uncertainty of regression results, including those from the CLRM/OLS:
- t distributions
- z distributions
- F distributions
- χ² distributions

In small samples, the applicability of these distributions depends on the errors being distributed normally. In larger samples, the asymptotic properties of these distributions mean the results hold even when errors are not distributed normally.

Distribution of β̂

- β̂ ∼ N(β, (X′X)⁻¹σ²) in repeated samples
- The variances of β̂ will be the diagonal elements from the variance-covariance matrix of β̂
- Problem: the variance-covariance matrix is not usually known, because σ² is not usually known (in simple regression, Var(β̂) = σ² / Σ(xᵢ − x̄)²)
- But by using s² as an estimate of σ², we can use the square root of the kth diagonal element of the variance-covariance matrix to estimate the standard error of β̂ₖ, which will be t-distributed
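The substitution of s² for σ² can be sketched directly: estimate s² = SSE/(n − k − 1), form s²(X′X)⁻¹, and read standard errors off its diagonal. The data below are simulated for illustration only.

```python
import numpy as np

# Sketch: estimate sigma^2 by s^2 = SSE/(n-k-1), form the estimated
# variance-covariance matrix s^2 (X'X)^{-1}, and take square roots of
# its diagonal as standard errors. Data are made up for illustration.
rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)   # true sigma = 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k - 1)        # s^2, the estimate of sigma^2
vcov = s2 * np.linalg.inv(X.T @ X)      # estimated Var(beta_hat)
se = np.sqrt(np.diag(vcov))             # standard errors of each beta_hat
t_values = beta_hat / se                # t-distributed under H0: beta = 0
print(se, t_values)
```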

t-tests for individual β̂

- As just stated, the sampling distribution of β̂ will be t-distributed, with n − k − 1 degrees of freedom
- The empirical t-value is the coefficient estimate divided by its standard error
- This yields a t-value that is compared with the critical value for t with n − k − 1 degrees of freedom
- Example in Stata:

OLS Example in Stata

. reg votes1st spend_total incumb minister spendxinc

[Same Stata output as shown earlier: ANOVA table and coefficient table for spend_total, incumb, minister, spendxinc and _cons; numeric values lost in transcription]

Interpretation of regression coefficients

- Each coefficient in a multiple regression model describes the association between an explanatory variable and the response, controlling for the other explanatory variables
- These are known as partial associations
- For example, consider the coefficient β_k of X_k:

      µ = (α + β₁X₁ + ⋯ + β_{k−1}X_{k−1}) + β_k X_k = (Others) + β_k X_k

- Here the (Others) part depends on the other explanatory variables X₁, ..., X_{k−1} but not on X_k
- If X_k now increases by 1 unit and the other explanatory variables remain unchanged, µ changes by β_k units

Interpretation of regression coefficients

In general, the coefficient β_j of an explanatory variable X_j shows the change in the expected value of Y when X_j increases by 1 unit while all the other explanatory variables are held constant; or, in other words, the expected change in Y when X_j increases by 1 unit, controlling for the other explanatory variables.

For example, in the model shown below, the (estimated) coefficient of education is β̂_education = 0.99. Thus every one-year increase in education completed increases the expected General Health Index by 0.99 points, controlling for age and family income.

A fitted model for GHI

Response variable: General Health Index

    Explanatory variable   β̂       s.e.    t      P-value   95% CI
    Constant               [values lost in transcription]
    Age                    [...]                  < 0.001   (−0.190; ...)
    Education              0.990   0.143   6.91   < 0.001   (0.709; 1.272)
    Fam. Income            [...]                  < 0.001   (0.151; 0.398)

σ̂ = 14.6; R² = 0.061; n = 1699; df = 1695

The uninteresting parameters

The remaining two parameters of the model are necessary but uninteresting for substantive interpretation:
- The constant term α is the expected value of Y when all explanatory variables are 0
- The residual standard deviation σ is the standard deviation of Y given (any) single set of values for X₁, ..., X_k, i.e. the standard deviation around the fitted regression surface at any given point of it

Estimation of the parameters

- The fitted values of Y are

      Ŷᵢ = α̂ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + ⋯ + β̂ₖXₖᵢ

  for all observations i = 1, ..., n in the sample
- We would like to select the values of the estimated coefficients α̂, β̂₁, ..., β̂ₖ so that the fitted values are a good match to the observed values Y₁, ..., Y_n
- This is done by finding estimates which minimize the error sum of squares

      SSE = Σ(Yᵢ − Ŷᵢ)²

- These are the (ordinary) least squares (OLS) estimates of the coefficients

Estimation of the parameters

[Figure: scatterplot of Y against X with the fitted line Ŷ = α̂ + β̂X, showing the decomposition of Yᵢ − Ȳ into the fitted part Ŷᵢ − Ȳ and the residual Yᵢ − Ŷᵢ]

Estimation of the parameters

Reminder: for the simple (one-X) linear model, the fitted values define a straight line:

[Figure: fitted regression line for Infant Mortality Rate against School enrolment]

Estimation of the parameters

- Least squares estimates β̂_j of the coefficients are easily calculated by a computer
- Also produced are estimated standard errors ŝe(β̂_j) of the estimated coefficients
- Also produced is an estimate σ̂ of the residual standard deviation

Fitted values for interpretation

A fitted model can be interpreted using the regression coefficients β̂_j as well as fitted (predicted) values

    Ŷ = α̂ + β̂₁X₁ + β̂₂X₂ + ⋯ + β̂ₖXₖ

for Y, calculated at some representative values of the explanatory variables.

Methods of presentation:
- Plots of fitted values given a continuous explanatory variable, fixing the others at some values
- Tables of fitted values, given an array of values for the explanatory variables

See examples for GHI below, given age, education and income.
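The table-of-fitted-values idea can be sketched as follows. The education coefficient 0.99 is from the lecture's fitted model; the intercept and the age and income coefficients below are hypothetical placeholders, not the lecture's estimates.

```python
# Sketch of presenting a fitted model through predicted values: compute
# Y_hat = a + b_age*age + b_educ*educ + b_inc*income over a grid of
# representative values. b_educ = 0.99 is from the lecture; a, b_age and
# b_inc are hypothetical placeholders, not the lecture's estimates.
a, b_age, b_educ, b_inc = 75.0, -0.16, 0.99, 0.27

def fitted_ghi(age, educ, income):
    return a + b_age * age + b_educ * educ + b_inc * income

# Table of fitted values: income varies, education fixed at 12 years
for age in (30, 50):
    row = [round(fitted_ghi(age, 12, inc), 1) for inc in (10, 30, 50)]
    print(age, row)
```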

Fitted values of GHI given income

[Figure: fitted General Health Index against family income ($1000) at age 30 and one other age, education fixed at 12 years]

Fitted values of GHI

[Table of fitted GHI values over combinations of Education, Income and Age; numeric values lost in transcription]

Inference for a single regression coefficient

Consider the following null hypothesis for the coefficient of an explanatory variable X_j:

    H₀: β_j = 0

against the alternative hypothesis

    Hₐ: β_j ≠ 0

both with no claims about the coefficients of the other explanatory variables.

In other words, H₀: there is no partial association between X_j and Y, controlling for the other explanatory variables.

Inference for a single regression coefficient

This is tested using the t-test statistic

    t = β̂_j / ŝe(β̂_j)

- When H₀ is true, the sampling distribution of t is a t distribution with n − (k + 1) degrees of freedom (k is the number of explanatory variables)
- The P-value is calculated just as before
- If the null hypothesis is not rejected, the implication is that X_j may be dropped from the model (while keeping the other explanatory variables in the model)

Inference for a single regression coefficient

For example, in the model for GHI given age, education and income, the coefficient of education is β̂_education = 0.990 and ŝe(β̂_education) = 0.143, so

    t = 0.990 / 0.143 ≈ 6.91

for which P < 0.001.

Thus there is strong evidence of a partial association between education and GHI, even controlling for age and income.

Inference for a single regression coefficient

- Similarly, P < 0.001 for the tests of the effects of both age and income, so both of these have a partial effect as well
- If, however, we add work experience to the model, the test of its coefficient has P = 0.563
- Length of work experience has no partial effect on GHI, once we control for age, education and income, so it does not need to be included in the model

Inference for a single regression coefficient

[Table: five nested models (1)–(5) of GHI, P-values in parentheses. Age appears in all five models (P < 0.001 in all but model (2), where P = 0.004); Education and Income have P < 0.001 in each model that includes them; Experience, added in model (5), has P = 0.563. Coefficient estimates, constants and R² values lost in transcription]

Confidence intervals

A confidence interval for a single regression coefficient β_j is

    β̂_j ± t_{α/2}^{(n−(k+1))} × ŝe(β̂_j)

where t_{α/2}^{(n−(k+1))} is the multiplier from the t_{n−(k+1)} distribution for the required confidence level, or approximately from the standard normal distribution, e.g. 1.96 for 95% intervals.

For example, the 95% confidence interval for the coefficient of education in the model discussed above is

    0.990 ± 1.96 × 0.143 ≈ (0.709; 1.272)
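The interval arithmetic above is easy to reproduce, here with the normal-approximation multiplier 1.96:

```python
# Reproducing the confidence-interval arithmetic for the education
# coefficient: estimate 0.990, standard error 0.143 (from the fitted GHI
# model), normal-approximation multiplier 1.96.
beta_hat, se, z = 0.990, 0.143, 1.96
lower, upper = beta_hat - z * se, beta_hat + z * se
print(round(lower, 3), round(upper, 3))
```

The slide's interval (0.709; 1.272) uses the exact t multiplier and unrounded estimates, so it differs slightly from this normal-approximation version.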

An example

- Data from the Rand Health Insurance Experiment (HIE): see S. 4.1 of the coursepack
- n = 1699 respondents to a survey at the start of the study
- Response variable Y: respondent's diastolic blood pressure at the end of the study
- Various explanatory variables considered today for illustration

Dummy variables

- Categorical explanatory variables are included in regression models as dummy variables (indicator variables)
- These are variables with only two values, 0 and 1: 1 if a subject's value of a categorical variable is in a particular category, 0 if not
- For example, a person's sex may be entered as the dummy variable for men:

      X = 1 if the person is male, 0 otherwise

  or as the dummy for women (but not both)

Coefficients of dummy variables

Consider a model with only the dummy for men as explanatory variable:

    For men:    E(Y) = α + βX = α + β·1 = α + β
    For women:  E(Y) = α + βX = α + β·0 = α
    Difference: β

In short, the coefficient of the dummy variable for men is the expected difference in Y between men and women.
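This equivalence between the dummy coefficient and the difference in group means can be checked numerically; the observations below are made up for illustration.

```python
import numpy as np

# Sketch: regressing Y on a single 0/1 dummy for men recovers
# alpha = mean(Y for women) and beta = mean(men) - mean(women).
# The numbers below are made up for illustration.
y_women = np.array([70.0, 72.0, 68.0, 74.0])
y_men = np.array([75.0, 71.0, 79.0])
y = np.concatenate([y_women, y_men])
d_men = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([np.ones_like(y), d_men])
alpha_hat, beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(alpha_hat, beta_hat)
```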

Coefficients of dummy variables

- This is the two-group model considered in lecture 2, and β is the group (sex) difference in expected Y
- Least squares estimates are here

      α̂ = Ȳ_women
      β̂ = Ȳ_men − Ȳ_women

- The t-test for the null hypothesis that β = 0 and the confidence interval for β are the same as the inference for a group difference of means in lecture 2
- This is the simplest example of an Analysis of Variance (ANOVA) model: a linear regression model with only categorical explanatory variables

Coefficients of dummy variables

- More generally, dummy variables may be included in multiple linear models together with other (continuous or dummy) explanatory variables
- In general, the coefficients of dummy variables are interpreted as expected differences in Y between units at different levels of categorical variables, controlling for other variables in the model
- A t-test for the hypothesis that such a coefficient is 0 is a test of no such difference

Example from HIE data

Models for diastolic blood pressure at exit (Y), given:
- Control variables: initial blood pressure, age and sex (as a dummy for men)
- A dummy variable for free health care (0 for all other insurance plans)

The coefficient of the free-care dummy is negative (the exact estimate and P-value were lost in transcription). Thus the expected blood pressure at exit is lower for participants on free care than for those on some other plan, controlling for initial blood pressure, age and sex. This difference is statistically significant at the 5% level; the 95% confidence interval for the difference is (−2.76; −0.33).

Example from HIE data

Response variable: diastolic blood pressure at exit

    Explanatory variable   β̂   s.e.   t   P-value
    Constant               [values lost in transcription]
    Initial BP             [...]          < 0.001
    Age                    [...]          < 0.001
    Sex: male              [...]          < 0.001
    Free health care       [...]

Variables with more categories

- Dummy variables are also used for explanatory variables with more than two categories, e.g. smoking status: never smoked / ex-smoker / current smoker
- Dummy variables for all but one of the categories are included in the model
- The category without a dummy is the reference (baseline) category
- The coefficient of the dummy of a category is the expected difference in Y between that category and the baseline
- Differences between non-baseline categories are given by differences of their coefficients
- The choice of the baseline is arbitrary: the model is the same, whatever the choice
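The coding scheme above can be sketched for the smoking-status example. The data and the coefficient values b_ex and b_current below are hypothetical, used only to show how non-baseline categories are compared.

```python
import numpy as np

# Sketch of dummy coding for a three-category variable with "never
# smoked" as the baseline: one dummy per non-baseline category.
# The data and the coefficients b_ex, b_current are made up.
status = np.array(["never", "ex", "current", "never", "ex", "current", "never"])
d_ex = (status == "ex").astype(float)
d_current = (status == "current").astype(float)

# A "never" respondent has both dummies at 0 (the baseline).
# Difference between two non-baseline categories = difference of coefficients:
b_ex, b_current = -1.2, 2.5                # hypothetical coefficients
diff_current_vs_ex = b_current - b_ex      # current smoker vs ex-smoker
print(d_ex, d_current, diff_current_vs_ex)
```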

Variables with more categories

Response variable: blood pressure at exit

[Table: three models (1)–(3) with past blood pressure, sex (Female baseline, Male) and smoking status (Never smoked baseline, Ex-smoker, Current smoker) as explanatory variables, plus a constant; coefficient estimates lost in transcription]

Example from HIE data

- Again, models for diastolic blood pressure at exit (Y), with initial blood pressure, age and sex as control variables
- Insurance plan is now entered as a five-category variable, with the 95% coinsurance plan as the reference level
- Each t-test of the coefficient of the dummy for a particular insurance plan tests the hypothesis that there is no difference in expected blood pressure between that plan and the reference plan, controlling for the other variables
- The only significant difference is for the free care plan: its coefficient is the difference in expected blood pressure between the free care and 95% plans, controlling for initial blood pressure, age and sex

Example from HIE data

Response variable: diastolic blood pressure at exit

(Other coefficients not shown)

    Insurance plan          β̂   s.e.   t   P-value
    95% coinsurance         0   (reference)
    50% coinsurance         [values lost in transcription]
    25% coinsurance         [...]
    Individual deductible   [...]
    Free care               [...]

The General F-test

This is used to test multiple-coefficient hypotheses of the form

    H₀: β_{g+1} = β_{g+2} = ⋯ = β_k = 0

against the alternative

    Hₐ: at least one of β_{g+1}, β_{g+2}, ..., β_k is not 0

- The most common application of this is testing the coefficients of dummy variables for different categories of a categorical explanatory variable simultaneously, e.g. in the example above, the coefficients of the dummies for four insurance plans
- If this is not rejected, none of the plans differ from the reference plan (and thus they also do not differ from each other), i.e. insurance plan has no effect on blood pressure, given the control variables

The General F-test

To carry out the F-test, first fit two models:
- The restricted model (M₀), where the variables of the null hypothesis are omitted
- The full model (Mₐ), where the variables of the null hypothesis are included

In this example:
- M₀ includes initial blood pressure, age and sex
- Mₐ includes initial blood pressure, age and sex, plus the four insurance-plan dummies

Then compare the R² values (or error sums of squares SSE) between the two models.

The General F-test

The F-test statistic is

    F = [(SSE₀ − SSEₐ)/(kₐ − k₀)] / [SSEₐ/(n − (kₐ + 1))]
      = [(R²ₐ − R²₀)/(kₐ − k₀)] / [(1 − R²ₐ)/(n − (kₐ + 1))]

- Large values of this are evidence against the null hypothesis that the restricted model is correct; in that case R²ₐ − R²₀ is large, i.e. the full model has a much higher R²
- The sampling distribution of F is an F distribution with kₐ − k₀ and n − (kₐ + 1) degrees of freedom
- In practice, P-values are obtained with a computer
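The R²-based version of the statistic can be sketched directly. The sample size and model sizes below follow the lecture's insurance-plan example (n = 1045, kₐ = 7, k₀ = 3); the R² values themselves are hypothetical placeholders, since the lecture's values did not survive transcription.

```python
# Sketch of the general F-test computed from R^2 values. n, k_a and k_0
# follow the lecture's insurance-plan example; the R^2 values are
# hypothetical placeholders, not the lecture's estimates.
n = 1045
r2_a, k_a = 0.050, 7      # full model (hypothetical R^2)
r2_0, k_0 = 0.045, 3      # restricted model (hypothetical R^2)

F = ((r2_a - r2_0) / (k_a - k_0)) / ((1 - r2_a) / (n - (k_a + 1)))
print(round(F, 2))  # compare with the F(k_a - k_0, n - (k_a + 1)) distribution
```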

The General F-test

In the example, n = 1045 and
- kₐ = 7 for the full model
- k₀ = 3 for the restricted model

(the R² values themselves were lost in transcription), so

    F = [(R²ₐ − R²₀)/4] / [(1 − R²ₐ)/1037] = 1.80

for which P ≈ 0.13. Thus the null hypothesis is not rejected: there is no evidence of differences between insurance plans in their effect on blood pressure, controlling for the other three variables.

Interactions

- There is an interaction between two explanatory variables if the effect of (either) one of them on the response variable depends on the value at which the other one is controlled
- Interactions are included in the model by using products of the two explanatory variables as additional explanatory variables
- Example: data for the 50 United States; average SAT score of students (Y) given school expenditure per student (X) and the % of students taking the SAT, in three groups (low, middle and high)
- The %-variable is included as two dummy variables, say D_M for middle and D_L for low

Interactions

A model without interactions:

    E(Y) = α + β₁D_L + β₂D_M + β₃X

Here the partial effect of expenditure is β₃, the same for all values of the %-variable.

Add now the products (D_L × X) and (D_M × X), to get the model

    E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)

This model states that there is an interaction between school expenditure and the %-variable. Why?

Interactions

Consider the effect of X at different values of the dummy variables:

    E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)
         = α + β₃X                  for high-% states
         = (α + β₂) + (β₃ + β₅)X    for mid-% states
         = (α + β₁) + (β₃ + β₄)X    for low-% states

In other words, the coefficient of X depends on the values at which D_L and D_M are fixed.
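The slope arithmetic in this derivation can be sketched with numbers. The b values below are hypothetical placeholders (not the lecture's SAT estimates), chosen only to show how each group's slope combines the main effect and an interaction term.

```python
# Sketch of how the interaction coefficients change the slope of X by
# group, as in the derivation above. The b values are hypothetical
# placeholders, not the lecture's SAT estimates.
b3 = 6.0     # coefficient of X
b4 = -3.0    # coefficient of (D_L * X)
b5 = -12.0   # coefficient of (D_M * X)

slope_high = b3        # D_L = D_M = 0
slope_mid = b3 + b5    # D_M = 1
slope_low = b3 + b4    # D_L = 1
print(slope_high, slope_mid, slope_low)  # 6.0 -6.0 3.0
```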

Interactions

The estimated coefficients in this example give (intercept estimates lost in transcription):

    Ê(Y) = ⋯ + 6.3X − 3.2(D_L × X) − 11.7(D_M × X)

so the estimated expenditure slopes are 6.3 for high-% states, 6.3 − 3.2 = 3.1 for low-% states, and 6.3 − 11.7 = −5.4 for mid-% states.

Model with interaction

[Figure: fitted lines of average SAT score against school expenditure, one line per %-taking group, with different slopes]

...and without

[Figure: fitted lines of average SAT score against school expenditure, one line per %-taking group, with a common slope]

Testing for interactions

- A standard test of whether the coefficient of the product variable (or variables) is zero is a test of whether the interaction is needed in the model
- This is a t-test, or (if there is more than one product variable) an F-test
- In the example, we use an F-test, comparing:

      Full model:       E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)
      Restricted model: E(Y) = α + β₁D_L + β₂D_M + β₃X

  i.e. a test of H₀: β₄ = β₅ = 0
- Here F = 0.61 and P = 0.55, so the interaction is not in fact significant

Interactions between categorical variables

- In the previous example, the interaction was between a continuous variable and a categorical variable
- In other cases too, interactions are included as products of variables
- An example of an interaction between two categorical (here binary) explanatory variables, from the HIE data:
  - Response variable: blood pressure at exit
  - Two binary explanatory variables: being on free health care vs. some other plan, and income in the lowest 20% in the data vs. not
  - Other control variables: initial blood pressure, age and sex

Interactions between categorical variables

[Coefficient table: initial blood pressure, age, sex (male), low income (lowest 20%), free health care, the income × insurance-plan interaction, and the constant; estimates lost in transcription]

Interactions between categorical variables

The coefficients involving income and insurance plan combine as follows for the different combinations of these variables (not showing the other coefficients):

                       Free care: No         Free care: Yes
    Low income: No     0                     [free-care coefficient]
    Low income: Yes    [low-income coeff.]   0.101

where 0.101 is the sum of the low-income, free-care and interaction coefficients (the individual estimates were lost in transcription). In other words:
- the effect of low income on blood pressure is smaller for respondents on free care than on other plans
- the effect of free care on blood pressure is bigger for low-income respondents than for high-income ones

(Again, the interaction is not actually significant here (P = 0.42), so this just illustrates the general idea.)

F-tests for all predictors

- The default test with most regressions is the test that all β = 0; in other words, "nothing going on here"
- Null hypothesis: H₀: β₁ = ⋯ = β_k = 0
- This is equivalent to testing a model with a set of linear constraints where all β are set to zero
- Formula:

      F = [(SST − SSE)/k] / [SSE/(n − k − 1)]

  where:
  - SST and SSE are the total and error sums of squares
  - n is the number of observations
  - k is the number of variables (excluding the constant!)
  - (n − k − 1) is also the residual degrees of freedom

Example of F-test for all predictors

> m1 <- lm(votes1st ~ spend_total*incumb, data=dail)
> summary(m1)

Call:
lm(formula = votes1st ~ spend_total * incumb, data = dail)

Residuals:
    Min  1Q  Median  3Q  Max
    [values lost in transcription]

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         [values lost]                          **
spend_total         [values lost]               < 2e-16   ***
incumb              [values lost]               < 2e-16   ***
spend_total:incumb  [values lost]               [...]e-06  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1808 on 458 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: [lost], Adjusted R-squared: [lost]
F-statistic: [lost] on 3 and 458 DF, p-value: < 2.2e-16

> SST <- sum((m1$model[,1] - mean(m1$model[,1]))^2)
> SSE <- sum(m1$residuals^2)
> k <- 3
> (df <- m1$df.residual)
[1] 458
> (F <- ((SST-SSE)/k) / (SSE/df))
[1] [value lost]
> 1 - pf(F, k, df)
[1] 0

Testing just one predictor

- Null hypothesis: H₀: β_j = 0
- The test statistic is

      t_j = β̂_j / se(β̂_j)

  where t_j is t-distributed with n − k − 1 degrees of freedom (the same as df.residual)
- The corresponding F statistic is t_j²

Example of testing just one predictor

> m1c <- lm(votes1st ~ spend_total + incumb, data=dail)
> SSEc <- deviance(m1c)  # the SSE
> (F2 <- (SSEc - SSE) / (SSE / df))
[1] [value lost]
> 1 - pf(F2, 1, df)
[1] [...]e-06
> sqrt(F2)  # will be the same as the t-test for this coefficient
[1] [value lost]
> summary(m1)$coeff
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         [values lost]               [...]e-03
spend_total         [values lost]               [...]e-53
incumb              [values lost]               [...]e-19
spend_total:incumb  [values lost]               [...]e-06
> anova(m1c, m1)  # a much easier way to compare 2 models
Analysis of Variance Table

Model 1: votes1st ~ spend_total + incumb
Model 2: votes1st ~ spend_total * incumb
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
  [values lost in transcription]  [...]e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Testing multiple predictors and generalized linear constraints

- Just because two variables are individually not significant does not mean that jointly the variables are not significant
- The F-test can be generalized to any set of J linear constraints, as follows:

      F = [(SSE_constrained − SSE_unconstrained)/(df_constr − df_unconstr)] / [SSE_unconstrained/df_unconstr]

- Steps:
  1. Run the unconstrained regression, save its SSE
  2. Run the constrained regression, save its SSE
  3. Compute F, and reject H₀ if F exceeds the critical value of the F distribution with (df_constr − df_unconstr) and df_unconstr degrees of freedom
- For a single β_k, this is equivalent to the t-test
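The three steps above can be sketched end to end. The data and the particular constraint (dropping the last two regressors) are made up for illustration.

```python
import numpy as np

# Sketch of the constrained-vs-unconstrained F-test: fit both models by
# OLS, save each SSE, and form the F statistic. Data and the constraint
# (dropping the last two regressors) are made up for illustration.
rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def sse(Xmat, yvec):
    b = np.linalg.lstsq(Xmat, yvec, rcond=None)[0]
    r = yvec - Xmat @ b
    return r @ r

sse_unconstr = sse(X, y)          # step 1: full model, all 3 regressors
sse_constr = sse(X[:, :2], y)     # step 2: constrained, beta_2 = beta_3 = 0
J = 2                             # number of constraints
df_unconstr = n - 4               # n - (k + 1)
F = ((sse_constr - sse_unconstr) / J) / (sse_unconstr / df_unconstr)
print(round(F, 2))                # step 3: compare with F(J, df) critical value
```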

Parametric confidence intervals for β

- CIs, or confidence intervals, provide an alternative way to express uncertainty about our estimates
- For a 100(1 − α)% confidence region, any point that lies within the region represents a null hypothesis that would not be rejected at the 100α% level, while every point outside it represents a null hypothesis that would have been rejected
- This is more valuable than simple hypothesis tests because it tells us about a parameter's plausible values
- Formula: Estimate ± Critical Value × S.E.
- For β specifically:

      β̂ᵢ ± t_{α/2}^{n−k−1} · σ̂ · √[(X′X)⁻¹ᵢᵢ]

- In practice we should consider joint confidence regions, especially when the β̂ are (highly) correlated


More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Tentative solutions TMA4255 Applied Statistics 16 May, 2015

Tentative solutions TMA4255 Applied Statistics 16 May, 2015 Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Statistics and Quantitative Analysis U4320

Statistics and Quantitative Analysis U4320 Statistics and Quantitative Analysis U3 Lecture 13: Explaining Variation Prof. Sharyn O Halloran Explaining Variation: Adjusted R (cont) Definition of Adjusted R So we'd like a measure like R, but one

More information

Regression #8: Loose Ends

Regression #8: Loose Ends Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch

More information

1 Independent Practice: Hypothesis tests for one parameter:

1 Independent Practice: Hypothesis tests for one parameter: 1 Independent Practice: Hypothesis tests for one parameter: Data from the Indian DHS survey from 2006 includes a measure of autonomy of the women surveyed (a scale from 0-10, 10 being the most autonomous)

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Econometrics - Exam May 11, 2011 1 Exam Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Problem 1: (15 points) A researcher has data for the year 2000 from

More information

Handout 12. Endogeneity & Simultaneous Equation Models

Handout 12. Endogeneity & Simultaneous Equation Models Handout 12. Endogeneity & Simultaneous Equation Models In which you learn about another potential source of endogeneity caused by the simultaneous determination of economic variables, and learn how to

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Data Analysis 1 LINEAR REGRESSION. Chapter 03 Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests ECON4150 - Introductory Econometrics Lecture 7: OLS with Multiple Regressors Hypotheses tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 7 Lecture outline 2 Hypothesis test for single

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

Statistical Inference. Part IV. Statistical Inference

Statistical Inference. Part IV. Statistical Inference Part IV Statistical Inference As of Oct 5, 2017 Sampling Distributions of the OLS Estimator 1 Statistical Inference Sampling Distributions of the OLS Estimator Testing Against One-Sided Alternatives Two-Sided

More information

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like. Measurement Error Often a data set will contain imperfect measures of the data we would ideally like. Aggregate Data: (GDP, Consumption, Investment are only best guesses of theoretical counterparts and

More information

Multiple Regression: Inference

Multiple Regression: Inference Multiple Regression: Inference The t-test: is ˆ j big and precise enough? We test the null hypothesis: H 0 : β j =0; i.e. test that x j has no effect on y once the other explanatory variables are controlled

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 7 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 68 Outline of Lecture 7 1 Empirical example: Italian labor force

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

Explanatory Variables Must be Linear Independent...

Explanatory Variables Must be Linear Independent... Explanatory Variables Must be Linear Independent... Recall the multiple linear regression model Y j = β 0 + β 1 X 1j + β 2 X 2j + + β p X pj + ε j, i = 1,, n. is a shorthand for n linear relationships

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

At this point, if you ve done everything correctly, you should have data that looks something like:

At this point, if you ve done everything correctly, you should have data that looks something like: This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information