Inference
ME104: Linear Regression Analysis
Kenneth Benoit
August 15, 2012
Lecture 3: Multiple linear regression 1



Stata output revisited

. reg votes1st spend_total incumb minister spendxinc

[Stata output: ANOVA table (Model, Residual and Total SS/df/MS; Number of obs; F(4, 457); Prob > F; R-squared; Adj R-squared; Root MSE) and coefficient table (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for spend_total, incumb, minister, spendxinc and _cons; numeric values lost in transcription]

R²

    R² = SSM / TSS = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)²

Adjusted R²:

    R²_adj = 1 − (1 − R²)(n − 1)/(n − k − 1)

[Numeric substitutions from the Stata output lost in transcription]

Root MSE (and F)

Root MSE = estimate of σ:

    σ̂ = √(SSE/df_resid) = √(1.4739e+09 / 457) ≈ 1796

F-test: this is the test of the null hypothesis that the joint effect of all independent variables is zero (more on this shortly).

How to compute R² and σ̂ from this output?

[Regression output: dependent variable disprls; 4274 valid cases, 0 missing, deletion method none; 4251 degrees of freedom; Total SS, Residual SS, R-squared, Rbar-squared, std error of estimate, F(22, 4251) and its probability, plus a coefficient table for CONSTANT and the HSL, SL, MSL, dh, LRH, LRDr, LRI, ImpHA, EqP, Dan and Ad variables, each also interacted with m; numeric values lost in transcription]

(Answer: R² = 1 − Residual SS / Total SS, and σ̂ = √(Residual SS / 4251).)

OLS computation of best fitting line

This is incredibly powerful:

    Y = Xβ + ε
    X′Y = X′Xβ + X′ε
    X′Y = X′Xβ + 0
    (X′X)⁻¹X′Y = β + 0
    β = (X′X)⁻¹X′Y

But it does not tell us how uncertain our estimate of β is in any probabilistic, comparative sense.
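The matrix computation above can be sketched in a few lines. This is a minimal illustration on made-up data (X and y below are simulated, not the lecture's election data); it also checks the X′e = 0 step, since OLS residuals are orthogonal to the columns of X.

```python
import numpy as np

# A minimal sketch of the matrix derivation above, on made-up data
# (X and y here are illustrative, not the lecture's dataset).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n),            # intercept column
                     rng.normal(size=n),
                     rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta = (X'X)^{-1} X'Y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The OLS residuals are orthogonal to X: this is the X'e = 0 step above
resid = y - X @ beta_hat
print(beta_hat, X.T @ resid)
```

Solving the normal equations directly (rather than inverting X′X) is the numerically preferred route, but the algebra is the same as on the slide.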

Distributional assumptions

Distributions are used to assess uncertainty. Many standard parametric tests are associated with interpreting the uncertainty of regression results, including those from the CLRM/OLS:
- t distributions
- z distributions
- F distributions
- χ² distributions

In small samples, the applicability of these distributions depends on the errors being distributed normally. In larger samples, the asymptotic properties of these distributions mean the results hold even when errors are not distributed normally.

Distribution of β̂

- β̂ ∼ N(β, (X′X)⁻¹σ²) in repeated samples
- The variances of β̂ will be the diagonal elements from the variance-covariance matrix of β̂
- Problem: the variance-covariance matrix is not usually known, because σ² is not usually known (in simple regression, Var(β̂) = σ² / Σ(xᵢ − x̄)²)
- But by using s² as an estimate of σ², we can use the square root of the kth diagonal element of the variance-covariance matrix to estimate the standard error of β̂ₖ, which will be t-distributed
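The substitution of s² for σ² can be sketched directly: estimate s² = SSE/(n − k − 1), form s²(X′X)⁻¹, and read standard errors off its diagonal. The data below are simulated for illustration only.

```python
import numpy as np

# Sketch: estimate sigma^2 by s^2 = SSE/(n-k-1), form the estimated
# variance-covariance matrix s^2 (X'X)^{-1}, and take square roots of
# its diagonal as standard errors. Data are made up for illustration.
rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)   # true sigma = 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k - 1)        # s^2, the estimate of sigma^2
vcov = s2 * np.linalg.inv(X.T @ X)      # estimated Var(beta_hat)
se = np.sqrt(np.diag(vcov))             # standard errors of each beta_hat
t_values = beta_hat / se                # t-distributed under H0: beta = 0
print(se, t_values)
```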

t-tests for individual β̂

- As just stated, the sampling distribution of β̂ will be t-distributed, with n − k − 1 degrees of freedom
- The empirical t-value is the coefficient estimate divided by its standard error
- This yields a t-value that is compared with the critical value for t with n − k − 1 degrees of freedom
- Example in Stata:

OLS Example in Stata

. reg votes1st spend_total incumb minister spendxinc

[Same Stata output as shown earlier: ANOVA table and coefficient table for spend_total, incumb, minister, spendxinc and _cons; numeric values lost in transcription]

Interpretation of regression coefficients

- Each coefficient in a multiple regression model describes the association between an explanatory variable and the response, controlling for the other explanatory variables
- These are known as partial associations
- For example, consider the coefficient β_k of X_k:

      µ = (α + β₁X₁ + ⋯ + β_{k−1}X_{k−1}) + β_k X_k = (Others) + β_k X_k

- Here the (Others) part depends on the other explanatory variables X₁, ..., X_{k−1} but not on X_k
- If X_k now increases by 1 unit and the other explanatory variables remain unchanged, µ changes by β_k units

Interpretation of regression coefficients

In general, the coefficient β_j of an explanatory variable X_j shows the change in the expected value of Y when X_j increases by 1 unit while all the other explanatory variables are held constant; or, in other words, the expected change in Y when X_j increases by 1 unit, controlling for the other explanatory variables.

For example, in the model shown below, the (estimated) coefficient of education is β̂_education = 0.99. Thus every one-year increase in education completed increases the expected General Health Index by 0.99 points, controlling for age and family income.

A fitted model for GHI

Response variable: General Health Index

    Explanatory variable   β̂       s.e.    t      P-value   95% CI
    Constant               [values lost in transcription]
    Age                    [...]                  < 0.001   (−0.190; ...)
    Education              0.990   0.143   6.91   < 0.001   (0.709; 1.272)
    Fam. Income            [...]                  < 0.001   (0.151; 0.398)

σ̂ = 14.6; R² = 0.061; n = 1699; df = 1695

The uninteresting parameters

The remaining two parameters of the model are necessary but uninteresting for substantive interpretation:
- The constant term α is the expected value of Y when all explanatory variables are 0
- The residual standard deviation σ is the standard deviation of Y given (any) single set of values for X₁, ..., X_k, i.e. the standard deviation around the fitted regression surface at any given point of it

Estimation of the parameters

- The fitted values of Y are

      Ŷᵢ = α̂ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + ⋯ + β̂ₖXₖᵢ

  for all observations i = 1, ..., n in the sample
- We would like to select the values of the estimated coefficients α̂, β̂₁, ..., β̂ₖ so that the fitted values are a good match to the observed values Y₁, ..., Y_n
- This is done by finding estimates which minimize the error sum of squares

      SSE = Σ(Yᵢ − Ŷᵢ)²

- These are the (ordinary) least squares (OLS) estimates of the coefficients

Estimation of the parameters

[Figure: scatterplot of Y against X with the fitted line Ŷ = α̂ + β̂X, showing the decomposition of Yᵢ − Ȳ into the fitted part Ŷᵢ − Ȳ and the residual Yᵢ − Ŷᵢ]

Estimation of the parameters

Reminder: for the simple (one-X) linear model, the fitted values define a straight line:

[Figure: fitted regression line for Infant Mortality Rate against School enrolment]

Estimation of the parameters

- Least squares estimates β̂_j of the coefficients are easily calculated by a computer
- Also produced are estimated standard errors ŝe(β̂_j) of the estimated coefficients
- Also produced is an estimate σ̂ of the residual standard deviation

Fitted values for interpretation

A fitted model can be interpreted using the regression coefficients β̂_j as well as fitted (predicted) values

    Ŷ = α̂ + β̂₁X₁ + β̂₂X₂ + ⋯ + β̂ₖXₖ

for Y, calculated at some representative values of the explanatory variables.

Methods of presentation:
- Plots of fitted values given a continuous explanatory variable, fixing the others at some values
- Tables of fitted values, given an array of values for the explanatory variables

See examples for GHI below, given age, education and income.
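The table-of-fitted-values idea can be sketched as follows. The education coefficient 0.99 is from the lecture's fitted model; the intercept and the age and income coefficients below are hypothetical placeholders, not the lecture's estimates.

```python
# Sketch of presenting a fitted model through predicted values: compute
# Y_hat = a + b_age*age + b_educ*educ + b_inc*income over a grid of
# representative values. b_educ = 0.99 is from the lecture; a, b_age and
# b_inc are hypothetical placeholders, not the lecture's estimates.
a, b_age, b_educ, b_inc = 75.0, -0.16, 0.99, 0.27

def fitted_ghi(age, educ, income):
    return a + b_age * age + b_educ * educ + b_inc * income

# Table of fitted values: income varies, education fixed at 12 years
for age in (30, 50):
    row = [round(fitted_ghi(age, 12, inc), 1) for inc in (10, 30, 50)]
    print(age, row)
```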

Fitted values of GHI given income

[Figure: fitted General Health Index against family income ($1000) at age 30 and one other age, education fixed at 12 years]

Fitted values of GHI

[Table of fitted GHI values over combinations of Education, Income and Age; numeric values lost in transcription]

Inference for a single regression coefficient

Consider the following null hypothesis for the coefficient of an explanatory variable X_j:

    H₀: β_j = 0

against the alternative hypothesis

    Hₐ: β_j ≠ 0

both with no claims about the coefficients of the other explanatory variables.

In other words, H₀: there is no partial association between X_j and Y, controlling for the other explanatory variables.

Inference for a single regression coefficient

This is tested using the t-test statistic

    t = β̂_j / ŝe(β̂_j)

- When H₀ is true, the sampling distribution of t is a t distribution with n − (k + 1) degrees of freedom (k is the number of explanatory variables)
- The P-value is calculated just as before
- If the null hypothesis is not rejected, the implication is that X_j may be dropped from the model (while keeping the other explanatory variables in the model)

Inference for a single regression coefficient

For example, in the model for GHI given age, education and income, the coefficient of education is β̂_education = 0.990 and ŝe(β̂_education) = 0.143, so

    t = 0.990 / 0.143 ≈ 6.91

for which P < 0.001.

Thus there is strong evidence of a partial association between education and GHI, even controlling for age and income.

Inference for a single regression coefficient

- Similarly, P < 0.001 for the tests of the effects of both age and income, so both of these have a partial effect as well
- If, however, we add work experience to the model, the test of its coefficient has P = 0.563
- Length of work experience has no partial effect on GHI, once we control for age, education and income, so it does not need to be included in the model

Inference for a single regression coefficient

[Table: five nested models (1)–(5) of GHI, P-values in parentheses. Age appears in all five models (P < 0.001 in all but model (2), where P = 0.004); Education and Income have P < 0.001 in each model that includes them; Experience, added in model (5), has P = 0.563. Coefficient estimates, constants and R² values lost in transcription]

Confidence intervals

A confidence interval for a single regression coefficient β_j is

    β̂_j ± t_{α/2}^{(n−(k+1))} × ŝe(β̂_j)

where t_{α/2}^{(n−(k+1))} is the multiplier from the t_{n−(k+1)} distribution for the required confidence level, or approximately from the standard normal distribution, e.g. 1.96 for 95% intervals.

For example, the 95% confidence interval for the coefficient of education in the model discussed above is

    0.990 ± 1.96 × 0.143 ≈ (0.709; 1.272)
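The interval arithmetic above is easy to reproduce, here with the normal-approximation multiplier 1.96:

```python
# Reproducing the confidence-interval arithmetic for the education
# coefficient: estimate 0.990, standard error 0.143 (from the fitted GHI
# model), normal-approximation multiplier 1.96.
beta_hat, se, z = 0.990, 0.143, 1.96
lower, upper = beta_hat - z * se, beta_hat + z * se
print(round(lower, 3), round(upper, 3))
```

The slide's interval (0.709; 1.272) uses the exact t multiplier and unrounded estimates, so it differs slightly from this normal-approximation version.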

An example

- Data from the Rand Health Insurance Experiment (HIE): see S. 4.1 of the coursepack
- n = 1699 respondents to a survey at the start of the study
- Response variable Y: respondent's diastolic blood pressure at the end of the study
- Various explanatory variables considered today for illustration

Dummy variables

- Categorical explanatory variables are included in regression models as dummy variables (indicator variables)
- These are variables with only two values, 0 and 1: 1 if a subject's value of a categorical variable is in a particular category, 0 if not
- For example, a person's sex may be entered as the dummy variable for men:

      X = 1 if the person is male, 0 otherwise

  or as the dummy for women (but not both)

Coefficients of dummy variables

Consider a model with only the dummy for men as explanatory variable:

    For men:    E(Y) = α + βX = α + β·1 = α + β
    For women:  E(Y) = α + βX = α + β·0 = α
    Difference: β

In short, the coefficient of the dummy variable for men is the expected difference in Y between men and women.
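This equivalence between the dummy coefficient and the difference in group means can be checked numerically; the observations below are made up for illustration.

```python
import numpy as np

# Sketch: regressing Y on a single 0/1 dummy for men recovers
# alpha = mean(Y for women) and beta = mean(men) - mean(women).
# The numbers below are made up for illustration.
y_women = np.array([70.0, 72.0, 68.0, 74.0])
y_men = np.array([75.0, 71.0, 79.0])
y = np.concatenate([y_women, y_men])
d_men = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([np.ones_like(y), d_men])
alpha_hat, beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(alpha_hat, beta_hat)
```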

Coefficients of dummy variables

- This is the two-group model considered in lecture 2, and β is the group (sex) difference in expected Y
- Least squares estimates are here

      α̂ = Ȳ_women
      β̂ = Ȳ_men − Ȳ_women

- The t-test for the null hypothesis that β = 0 and the confidence interval for β are the same as the inference for a group difference of means in lecture 2
- This is the simplest example of an Analysis of Variance (ANOVA) model: a linear regression model with only categorical explanatory variables

Coefficients of dummy variables

- More generally, dummy variables may be included in multiple linear models together with other (continuous or dummy) explanatory variables
- In general, the coefficients of dummy variables are interpreted as expected differences in Y between units at different levels of categorical variables, controlling for other variables in the model
- A t-test for the hypothesis that such a coefficient is 0 is a test of no such difference

Example from HIE data

Models for diastolic blood pressure at exit (Y), given:
- Control variables: initial blood pressure, age and sex (as a dummy for men)
- A dummy variable for free health care (0 for all other insurance plans)

The coefficient of the free-care dummy is negative (the exact estimate and P-value were lost in transcription). Thus the expected blood pressure at exit is lower for participants on free care than for those on some other plan, controlling for initial blood pressure, age and sex. This difference is statistically significant at the 5% level; the 95% confidence interval for the difference is (−2.76; −0.33).

Example from HIE data

Response variable: diastolic blood pressure at exit

    Explanatory variable   β̂   s.e.   t   P-value
    Constant               [values lost in transcription]
    Initial BP             [...]          < 0.001
    Age                    [...]          < 0.001
    Sex: male              [...]          < 0.001
    Free health care       [...]

Variables with more categories

- Dummy variables are also used for explanatory variables with more than two categories, e.g. smoking status: never smoked / ex-smoker / current smoker
- Dummy variables for all but one of the categories are included in the model
- The category without a dummy is the reference (baseline) category
- The coefficient of the dummy of a category is the expected difference in Y between that category and the baseline
- Differences between non-baseline categories are given by differences of their coefficients
- The choice of the baseline is arbitrary: the model is the same, whatever the choice
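The coding scheme above can be sketched for the smoking-status example. The data and the coefficient values b_ex and b_current below are hypothetical, used only to show how non-baseline categories are compared.

```python
import numpy as np

# Sketch of dummy coding for a three-category variable with "never
# smoked" as the baseline: one dummy per non-baseline category.
# The data and the coefficients b_ex, b_current are made up.
status = np.array(["never", "ex", "current", "never", "ex", "current", "never"])
d_ex = (status == "ex").astype(float)
d_current = (status == "current").astype(float)

# A "never" respondent has both dummies at 0 (the baseline).
# Difference between two non-baseline categories = difference of coefficients:
b_ex, b_current = -1.2, 2.5                # hypothetical coefficients
diff_current_vs_ex = b_current - b_ex      # current smoker vs ex-smoker
print(d_ex, d_current, diff_current_vs_ex)
```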

Variables with more categories

Response variable: blood pressure at exit

[Table: three models (1)–(3) with past blood pressure, sex (Female baseline, Male) and smoking status (Never smoked baseline, Ex-smoker, Current smoker) as explanatory variables, plus a constant; coefficient estimates lost in transcription]

Example from HIE data

- Again, models for diastolic blood pressure at exit (Y), with initial blood pressure, age and sex as control variables
- Insurance plan is now entered as a five-category variable, with the 95% coinsurance plan as the reference level
- Each t-test of the coefficient of the dummy for a particular insurance plan tests the hypothesis that there is no difference in expected blood pressure between that plan and the reference plan, controlling for the other variables
- The only significant difference is for the free care plan: its coefficient is the difference in expected blood pressure between the free care and 95% plans, controlling for initial blood pressure, age and sex

Example from HIE data

Response variable: diastolic blood pressure at exit

(Other coefficients not shown)

    Insurance plan          β̂   s.e.   t   P-value
    95% coinsurance         0   (reference)
    50% coinsurance         [values lost in transcription]
    25% coinsurance         [...]
    Individual deductible   [...]
    Free care               [...]

The General F-test

This is used to test multiple-coefficient hypotheses of the form

    H₀: β_{g+1} = β_{g+2} = ⋯ = β_k = 0

against the alternative

    Hₐ: at least one of β_{g+1}, β_{g+2}, ..., β_k is not 0

- The most common application of this is testing the coefficients of dummy variables for different categories of a categorical explanatory variable simultaneously, e.g. in the example above, the coefficients of the dummies for four insurance plans
- If this is not rejected, none of the plans differ from the reference plan (and thus they also do not differ from each other), i.e. insurance plan has no effect on blood pressure, given the control variables

The General F-test

To carry out the F-test, first fit two models:
- The restricted model (M₀), where the variables of the null hypothesis are omitted
- The full model (Mₐ), where the variables of the null hypothesis are included

In this example:
- M₀ includes initial blood pressure, age and sex
- Mₐ includes initial blood pressure, age and sex, plus the four insurance-plan dummies

Then compare the R² values (or error sums of squares SSE) between the two models.

The General F-test

The F-test statistic is

    F = [(SSE₀ − SSEₐ)/(kₐ − k₀)] / [SSEₐ/(n − (kₐ + 1))]
      = [(R²ₐ − R²₀)/(kₐ − k₀)] / [(1 − R²ₐ)/(n − (kₐ + 1))]

- Large values of this are evidence against the null hypothesis that the restricted model is correct; in that case R²ₐ − R²₀ is large, i.e. the full model has a much higher R²
- The sampling distribution of F is an F distribution with kₐ − k₀ and n − (kₐ + 1) degrees of freedom
- In practice, P-values are obtained with a computer
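The R²-based version of the statistic can be sketched directly. The sample size and model sizes below follow the lecture's insurance-plan example (n = 1045, kₐ = 7, k₀ = 3); the R² values themselves are hypothetical placeholders, since the lecture's values did not survive transcription.

```python
# Sketch of the general F-test computed from R^2 values. n, k_a and k_0
# follow the lecture's insurance-plan example; the R^2 values are
# hypothetical placeholders, not the lecture's estimates.
n = 1045
r2_a, k_a = 0.050, 7      # full model (hypothetical R^2)
r2_0, k_0 = 0.045, 3      # restricted model (hypothetical R^2)

F = ((r2_a - r2_0) / (k_a - k_0)) / ((1 - r2_a) / (n - (k_a + 1)))
print(round(F, 2))  # compare with the F(k_a - k_0, n - (k_a + 1)) distribution
```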

The General F-test

In the example, n = 1045 and
- kₐ = 7 for the full model
- k₀ = 3 for the restricted model

(the R² values themselves were lost in transcription), so

    F = [(R²ₐ − R²₀)/4] / [(1 − R²ₐ)/1037] = 1.80

for which P ≈ 0.13. Thus the null hypothesis is not rejected: there is no evidence of differences between insurance plans in their effect on blood pressure, controlling for the other three variables.

Interactions

- There is an interaction between two explanatory variables if the effect of (either) one of them on the response variable depends on the value at which the other one is controlled
- Interactions are included in the model by using products of the two explanatory variables as additional explanatory variables
- Example: data for the 50 United States; average SAT score of students (Y) given school expenditure per student (X) and the % of students taking the SAT, in three groups (low, middle and high)
- The %-variable is included as two dummy variables, say D_M for middle and D_L for low

Interactions

A model without interactions:

    E(Y) = α + β₁D_L + β₂D_M + β₃X

Here the partial effect of expenditure is β₃, the same for all values of the %-variable.

Add now the products (D_L × X) and (D_M × X), to get the model

    E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)

This model states that there is an interaction between school expenditure and the %-variable. Why?

Interactions

Consider the effect of X at different values of the dummy variables:

    E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)
         = α + β₃X                  for high-% states
         = (α + β₂) + (β₃ + β₅)X    for mid-% states
         = (α + β₁) + (β₃ + β₄)X    for low-% states

In other words, the coefficient of X depends on the values at which D_L and D_M are fixed.
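The slope arithmetic in this derivation can be sketched with numbers. The b values below are hypothetical placeholders (not the lecture's SAT estimates), chosen only to show how each group's slope combines the main effect and an interaction term.

```python
# Sketch of how the interaction coefficients change the slope of X by
# group, as in the derivation above. The b values are hypothetical
# placeholders, not the lecture's SAT estimates.
b3 = 6.0     # coefficient of X
b4 = -3.0    # coefficient of (D_L * X)
b5 = -12.0   # coefficient of (D_M * X)

slope_high = b3        # D_L = D_M = 0
slope_mid = b3 + b5    # D_M = 1
slope_low = b3 + b4    # D_L = 1
print(slope_high, slope_mid, slope_low)  # 6.0 -6.0 3.0
```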

Interactions

The estimated coefficients in this example give (intercept estimates lost in transcription):

    Ê(Y) = ⋯ + 6.3X − 3.2(D_L × X) − 11.7(D_M × X)

so the estimated expenditure slopes are 6.3 for high-% states, 6.3 − 3.2 = 3.1 for low-% states, and 6.3 − 11.7 = −5.4 for mid-% states.

Model with interaction

[Figure: fitted lines of average SAT score against school expenditure, one line per %-taking group, with different slopes]

...and without

[Figure: fitted lines of average SAT score against school expenditure, one line per %-taking group, with a common slope]

Testing for interactions

- A standard test of whether the coefficient of the product variable (or variables) is zero is a test of whether the interaction is needed in the model
- This is a t-test, or (if there is more than one product variable) an F-test
- In the example, we use an F-test, comparing:

      Full model:       E(Y) = α + β₁D_L + β₂D_M + β₃X + β₄(D_L × X) + β₅(D_M × X)
      Restricted model: E(Y) = α + β₁D_L + β₂D_M + β₃X

  i.e. a test of H₀: β₄ = β₅ = 0
- Here F = 0.61 and P = 0.55, so the interaction is not in fact significant

Interactions between categorical variables

- In the previous example, the interaction was between a continuous variable and a categorical variable
- In other cases too, interactions are included as products of variables
- An example of an interaction between two categorical (here binary) explanatory variables, from the HIE data:
  - Response variable: blood pressure at exit
  - Two binary explanatory variables: being on free health care vs. some other plan, and income in the lowest 20% in the data vs. not
  - Other control variables: initial blood pressure, age and sex

Interactions between categorical variables

[Coefficient table: initial blood pressure, age, sex (male), low income (lowest 20%), free health care, the income × insurance-plan interaction, and the constant; estimates lost in transcription]

Interactions between categorical variables

The coefficients involving income and insurance plan combine as follows for the different combinations of these variables (not showing the other coefficients):

                       Free care: No         Free care: Yes
    Low income: No     0                     [free-care coefficient]
    Low income: Yes    [low-income coeff.]   0.101

where 0.101 is the sum of the low-income, free-care and interaction coefficients (the individual estimates were lost in transcription). In other words:
- the effect of low income on blood pressure is smaller for respondents on free care than on other plans
- the effect of free care on blood pressure is bigger for low-income respondents than for high-income ones

(Again, the interaction is not actually significant here (P = 0.42), so this just illustrates the general idea.)

F-tests for all predictors

- The default test with most regressions is the test that all β = 0; in other words, "nothing going on here"
- Null hypothesis: H₀: β₁ = ⋯ = β_k = 0
- This is equivalent to testing a model with a set of linear constraints where all β are set to zero
- Formula:

      F = [(SST − SSE)/k] / [SSE/(n − k − 1)]

  where:
  - SST and SSE are the total and error sums of squares
  - n is the number of observations
  - k is the number of variables (excluding the constant!)
  - (n − k − 1) is also the residual degrees of freedom

Example of F-test for all predictors

> m1 <- lm(votes1st ~ spend_total*incumb, data=dail)
> summary(m1)

Call:
lm(formula = votes1st ~ spend_total * incumb, data = dail)

Residuals:
    Min  1Q  Median  3Q  Max
    [values lost in transcription]

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         [values lost]                          **
spend_total         [values lost]               < 2e-16   ***
incumb              [values lost]               < 2e-16   ***
spend_total:incumb  [values lost]               [...]e-06  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1808 on 458 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: [lost], Adjusted R-squared: [lost]
F-statistic: [lost] on 3 and 458 DF, p-value: < 2.2e-16

> SST <- sum((m1$model[,1] - mean(m1$model[,1]))^2)
> SSE <- sum(m1$residuals^2)
> k <- 3
> (df <- m1$df.residual)
[1] 458
> (F <- ((SST-SSE)/k) / (SSE/df))
[1] [value lost]
> 1 - pf(F, k, df)
[1] 0

Testing just one predictor

- Null hypothesis: H₀: β_j = 0
- The test statistic is

      t_j = β̂_j / se(β̂_j)

  where t_j is t-distributed with n − k − 1 degrees of freedom (the same as df.residual)
- The corresponding F statistic is t_j²

Example of testing just one predictor

> m1c <- lm(votes1st ~ spend_total + incumb, data=dail)
> SSEc <- deviance(m1c)  # the SSE
> (F2 <- (SSEc - SSE) / (SSE / df))
[1] [value lost]
> 1 - pf(F2, 1, df)
[1] [...]e-06
> sqrt(F2)  # will be the same as the t-test for this coefficient
[1] [value lost]
> summary(m1)$coeff
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         [values lost]               [...]e-03
spend_total         [values lost]               [...]e-53
incumb              [values lost]               [...]e-19
spend_total:incumb  [values lost]               [...]e-06
> anova(m1c, m1)  # a much easier way to compare 2 models
Analysis of Variance Table

Model 1: votes1st ~ spend_total + incumb
Model 2: votes1st ~ spend_total * incumb
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
  [values lost in transcription]  [...]e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Testing multiple predictors and generalized linear constraints

- Just because two variables are individually not significant does not mean that jointly the variables are not significant
- The F-test can be generalized to any set of J linear constraints, as follows:

      F = [(SSE_constrained − SSE_unconstrained)/(df_constr − df_unconstr)] / [SSE_unconstrained/df_unconstr]

- Steps:
  1. Run the unconstrained regression, save its SSE
  2. Run the constrained regression, save its SSE
  3. Compute F, and reject H₀ if F exceeds the critical value of the F distribution with (df_constr − df_unconstr) and df_unconstr degrees of freedom
- For a single β_k, this is equivalent to the t-test
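The three steps above can be sketched end to end. The data and the particular constraint (dropping the last two regressors) are made up for illustration.

```python
import numpy as np

# Sketch of the constrained-vs-unconstrained F-test: fit both models by
# OLS, save each SSE, and form the F statistic. Data and the constraint
# (dropping the last two regressors) are made up for illustration.
rng = np.random.default_rng(2)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def sse(Xmat, yvec):
    b = np.linalg.lstsq(Xmat, yvec, rcond=None)[0]
    r = yvec - Xmat @ b
    return r @ r

sse_unconstr = sse(X, y)          # step 1: full model, all 3 regressors
sse_constr = sse(X[:, :2], y)     # step 2: constrained, beta_2 = beta_3 = 0
J = 2                             # number of constraints
df_unconstr = n - 4               # n - (k + 1)
F = ((sse_constr - sse_unconstr) / J) / (sse_unconstr / df_unconstr)
print(round(F, 2))                # step 3: compare with F(J, df) critical value
```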

Parametric confidence intervals for β

- CIs, or confidence intervals, provide an alternative way to express uncertainty about our estimates
- For a 100(1 − α)% confidence region, any point that lies within the region represents a null hypothesis that would not be rejected at the 100α% level, while every point outside it represents a null hypothesis that would have been rejected
- This is more valuable than simple hypothesis tests because it tells us about a parameter's plausible values
- Formula: Estimate ± Critical Value × S.E.
- For β specifically:

      β̂ᵢ ± t_{α/2}^{n−k−1} · σ̂ · √[(X′X)⁻¹ᵢᵢ]

- In practice we should consider joint confidence regions, especially when the β̂ are (highly) correlated


More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Tentative solutions TMA4255 Applied Statistics 16 May, 2015

Tentative solutions TMA4255 Applied Statistics 16 May, 2015 Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Statistics and Quantitative Analysis U4320

Statistics and Quantitative Analysis U4320 Statistics and Quantitative Analysis U3 Lecture 13: Explaining Variation Prof. Sharyn O Halloran Explaining Variation: Adjusted R (cont) Definition of Adjusted R So we'd like a measure like R, but one

More information

Regression #8: Loose Ends

Regression #8: Loose Ends Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch

More information

1 Independent Practice: Hypothesis tests for one parameter:

1 Independent Practice: Hypothesis tests for one parameter: 1 Independent Practice: Hypothesis tests for one parameter: Data from the Indian DHS survey from 2006 includes a measure of autonomy of the women surveyed (a scale from 0-10, 10 being the most autonomous)

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Econometrics - Exam May 11, 2011 1 Exam Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Problem 1: (15 points) A researcher has data for the year 2000 from

More information

Handout 12. Endogeneity & Simultaneous Equation Models

Handout 12. Endogeneity & Simultaneous Equation Models Handout 12. Endogeneity & Simultaneous Equation Models In which you learn about another potential source of endogeneity caused by the simultaneous determination of economic variables, and learn how to

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Data Analysis 1 LINEAR REGRESSION. Chapter 03 Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests ECON4150 - Introductory Econometrics Lecture 7: OLS with Multiple Regressors Hypotheses tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 7 Lecture outline 2 Hypothesis test for single

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

Statistical Inference. Part IV. Statistical Inference

Statistical Inference. Part IV. Statistical Inference Part IV Statistical Inference As of Oct 5, 2017 Sampling Distributions of the OLS Estimator 1 Statistical Inference Sampling Distributions of the OLS Estimator Testing Against One-Sided Alternatives Two-Sided

More information

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like. Measurement Error Often a data set will contain imperfect measures of the data we would ideally like. Aggregate Data: (GDP, Consumption, Investment are only best guesses of theoretical counterparts and

More information

Multiple Regression: Inference

Multiple Regression: Inference Multiple Regression: Inference The t-test: is ˆ j big and precise enough? We test the null hypothesis: H 0 : β j =0; i.e. test that x j has no effect on y once the other explanatory variables are controlled

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 7 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 68 Outline of Lecture 7 1 Empirical example: Italian labor force

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

Explanatory Variables Must be Linear Independent...

Explanatory Variables Must be Linear Independent... Explanatory Variables Must be Linear Independent... Recall the multiple linear regression model Y j = β 0 + β 1 X 1j + β 2 X 2j + + β p X pj + ε j, i = 1,, n. is a shorthand for n linear relationships

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

At this point, if you ve done everything correctly, you should have data that looks something like:

At this point, if you ve done everything correctly, you should have data that looks something like: This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information