Advanced Regression Summer Statistics Institute. Day 2: MLR and Dummy Variables

Size: px
Start display at page:

Download "Advanced Regression Summer Statistics Institute. Day 2: MLR and Dummy Variables"

Transcription

1 Advanced Regression Summer Statistics Institute Day 2: MLR and Dummy Variables 1

2 The Multiple Regression Model Many problems involve more than one independent variable or factor which affects the dependent or response variable. More than size to predict house price! Demand for a product given prices of competing brands, advertising,house hold attributes, etc. In SLR, the conditional mean of Y depends on X. The Multiple Linear Regression (MLR) model extends this idea to include more than one independent variable. 2

3 The MLR Model Same as always, but with more covariates. Y = β 0 + β 1 X 1 + β 2 X β p X p + ɛ Recall the key assumptions of our linear regression model: (i) The conditional mean of Y is linear in the X j variables. (ii) The error term (deviations from line) are normally distributed independent from each other identically distributed (i.e., they have constant variance) Y X 1... X p N(β 0 + β 1 X β p X p, σ 2 ) 3

4 The MLR Model Our interpretation of regression coefficients can be extended from the simple single covariate regression case: β j = E[Y X 1,..., X p ] X j Holding all other variables constant, β j is the average change in Y per unit change in X j. 4

5 The MLR Model If p = 2, we can plot the regression surface in 3D. Consider sales of a product as predicted by price of this product (P1) and the price of a competing product (P2). Sales = β 0 + β 1 P1 + β 2 P2 + ɛ 5

6 Least Squares Y = β 0 + β 1 X β p X p + ε, ε N(0, σ 2 ) How do we estimate the MLR model parameters? The principle of Least Squares is exactly the same as before: Define the fitted values Find the best fitting plane by minimizing the sum of squared residuals. 6

7 Least Squares The data... p1 p2 Sales

8 Least Squares Model: SUMMARY OUTPUT Sales i = β 0 + β 1 P1 i + β 2 P2 i + ɛ i, ɛ N(0, σ 2 ) Regression Statistics Multiple R 0.99 R Square 0.99 Adjusted R Square 0.99 Standard Error Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept p p b 0 = ˆβ 0 = , b 1 = ˆβ 1 = 97.66, b 2 = ˆβ 2 = , s = ˆσ =

9 Plug-in Prediction in MLR Suppose that by using advanced corporate espionage tactics, I discover that my competitor will charge $10 the next quarter. After some marketing analysis I decided to charge $8. How much will I sell? Our model is Sales = β 0 + β 1 P1 + β 2 P2 + ɛ with ɛ N(0, σ 2 ) Our estimates are b 0 = 115, b 1 = 97, b 2 = 109 and s = 28 which leads to Sales = P P2 + ɛ with ɛ N(0, 28 2 ) 9

10 Plug-in Prediction in MLR By plugging-in the numbers, Sales = ɛ = ɛ Sales P1 = 8, P2 = 10 N(437, 28 2 ) and the 95% Prediction Interval is (437 ± 2 28) 381 < Sales <

11 Least Squares Just as before, each b i is our estimate of β i Fitted Values: Ŷ i = b 0 + b 1 X 1i + b 2 X 2i... + b p X p. Residuals: e i = Y i Ŷi. Least Squares: Find b 0, b 1, b 2,..., b p to minimize n i=1 e2 i. In MLR the formulas for the b i s are too complicated so we won t talk about them... 11

12 Least Squares 12

13 Least Squares p1 p2 Sales yhat residuals b0 b1 b Excel break: fitted values, residuals,... 13

14 Residual Standard Error The calculation for s 2 is exactly the same: s 2 = n i=1 e2 i n p 1 = n i=1 (Y i Ŷi) 2 n p 1 Ŷ i = b 0 + b 1 X 1i + + b p X pi The residual standard error is the estimate for the standard deviation of ɛ,i.e, ˆσ = s = s 2. 14

15 Residuals in MLR As in the SLR model, the residuals in multiple regression are purged of any linear relationship to the independent variables. Once again, they are on average zero. Because the fitted values are an exact linear combination of the X s they are not correlated to the residuals. We decompose Y into the part predicted by X and the part due to idiosyncratic error. Y = Ŷ + e ē = 0; corr(x j, e) = 0; corr(ŷ, e) = 0 15

16 Residuals in MLR Consider the residuals from the Sales data: residuals residuals residuals fitted P P2 16

17 Fitted Values in MLR Another great plot for MLR problems is to look at Y (true values) against Ŷ (fitted values). y=sales y.hat (MLR: p1 and p2) If things are working, these values should form a nice straight line. Can you guess the slope of the blue line? 17

18 Fitted Values in MLR With just P1... y=sales y=sales y=sales p y.hat (SLR: p1) 0 Left plot: Sales vs P1 Right plot: Sales vs. ŷ (only P1 as a regressor) 18

19 Fitted Values in MLR Now, with P1 and P2... y=sales y=sales y=sales y.hat(slr:p1) y.hat(slr:p2) y.hat(mlr:p1 and p2) First plot: Sales regressed on P1 alone... Second plot: Sales regressed on P2 alone... Third plot: Sales regressed on P1 and P2 19

20 R-squared We still have our old variance decomposition identity... SST = SSR + SSE... and R 2 is once again defined as R 2 = SSR SST = 1 SSE SST telling us the percentage of variation in Y explained by the X s. In Excel, R 2 is in the same place and Multiple R refers to the correlation between Ŷ and Y. 20

21 Least Squares Sales SUMMARY i = OUTPUT β 0 + β 1 P1 i + β 2 P2 i + ɛ i Regression Statistics Multiple R 0.99 R Square 0.99 Adjusted R Square 0.99 Standard Error Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept p p R 2 = 0.99 Multiple R = r Y,Ŷ = corr(y,ŷ ) = 0.99 Note that R 2 = corr(y, Ŷ ) 2 21

22 Back to Baseball SUMMARY OUTPUT R/G = β 0 + β 1 OBP + β 2 SLG + ɛ Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 30 ANOVA df SS MS F Significance F Regression E 15 Residual Total Coefficientst andard Erro t Stat P value Lower 95% Upper 95% Intercept E OBP E SLG R 2 = Multiple R = r Y,Ŷ = corr(y,ŷ ) = Note that R 2 = corr(y, Ŷ ) 2 22

23 Intervals for Individual Coefficients As in SLR, the sampling distribution tells us how close we can expect b j to be from β j The LS estimators are unbiased: E[b j ] = β j for j = 0,..., d. We denote the sampling distribution of each estimator as b j N(β j, s 2 b j ) 23

24 Intervals for Individual Coefficients Intervals and t-statistics are exactly the same as in SLR. A 95% C.I. for β j is approximately b j ± 2s bj The t-stat: t j = (b j β 0 j ) s bj is the number of standard errors between the LS estimate and the null value (β 0 j ) As before, we reject the null when t-stat is greater than 2 in absolute value Also as before, a small p-value leads to a rejection of the null Rejecting when the p-value is less than 0.05 is equivalent to rejecting when the t j > 2 24

25 Intervals for Individual Coefficients IMPORTANT: Intervals and testing via b j & s bj procedures: are one-at-a-time You are evaluating the j th coefficient conditional on the other X s being in the model, but regardless of the values you ve estimated for the other b s. 25

26 In Excel... Do we know all of these numbers? SUMMARY OUTPUT Regression Statistics Multiple R 0.99 R Square 0.99 Adjusted R Square 0.99 Standard Error Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept p p % C.I. for β 1 b1 ± 2 s b1 [ ; ] = [ ; 92.36] 26

27 Understanding Multiple Regression There are two, very important things we need to understand about the MLR model: 1. How dependencies between the X s affect our interpretation of a multiple regression; 2. How dependencies between the X s inflate standard errors (aka multicolinearity) We will look at a few examples to illustrate the ideas... 27

28 Understanding Multiple Regression The Sales Data: Sales : units sold in excess of a baseline P1: our price in $ (in excess of a baseline price) P2: competitors price (again, over a baseline) 28

29 Understanding Multiple Regression ress If we regress Sales on our own price, we obtain a somewhat Regression Plot surprising conclusion... thesales higher = the price p1 the more we sell!! S = R-Sq = 19.6 % R-Sq(adj) = 18.8 % e, in the at g on gher ssociated re sales!! Sales It looks like we should just raise our prices, right? NO, not if 4 p1 you have taken this statistics class! 29 The regression line

30 Understanding Multiple Regression The regression equation for Sales on own price (P1) is: Sales = P1 If now we add the competitors price to the regression we get Sales = P P2 Does this look better? How did it happen? Remember: 97.7 is the affect on sales of a change in P1 with P2 held fixed!! 30

31 Understanding Multiple Regression How can we see what is going on? How can we see what is going on? Let s compare Sales in two If we different compares observations: sales in weeks weeks and and , we see that an increase in p1, holding p2 constant We see that an increase in P1, holding P2 constant, (82 to 99) corresponds to a drop is sales. corresponds to a drop in Sales! p p Sales p1 Note the strong relationship between p1 and p2!! Note the strong relationship (dependence) between P1 and P2!!

32 Understanding Multiple Regression Here we select a subset of points where p varies and Let s p2 does look atis ahelp subsetapproximately of points where P1constant. varies and P2 is held approximately constant p1 5 4 Sales p p For a fixed level of p2, variation in p1 is negatively correlated with Sales!! with sale! For a fixed level of P2, variation in P1 is negatively correlated 32

33 Understanding Multiple Regression Different colors indicate different ranges of p2. Below, different colors indicate different ranges for P2... p1 larger p1 are associated with larger p2 Sales for each fixed level of p2 there is a negative relationship between sales and p1 sales$p sales$sales sales$p2 p sales$p1 p1 33

34 Understanding Multiple Regression Summary: 1. A larger P1 is associated with larger P2 and the overall effect leads to bigger sales 2. With P2 held fixed, a larger P1 leads to lower sales 3. MLR does the trick and unveils the correct economic relationship between Sales and prices! 34

35 Understanding Multiple Regression Beer Data (from an MBA class) nbeer number of beers before getting drunk d height and weight 20 nbeer height Is number of beers related to height? 35

36 Understanding Multiple Regression SUMMARY OUTPUT nbeers = β 0 + β 1 height + ɛ Regression Statistics Multiple R 0.58 R Square 0.34 Adjusted R Square 0.33 Standard Error 3.11 Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept height Yes! Beers and height are related... 36

37 Understanding Multiple Regression nbeers = β 0 + β 1 weight + β 2 height + ɛ SUMMARY OUTPUT Regression Statistics Multiple R 0.69 R Square 0.48 Adjusted R Square 0.46 Standard Error 2.78 Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept weight height What about now?? Height is not necessarily a factor... 37

38 S = R-Sq = 48.1% R-Sq(adj) = 45.9% Understanding Multiple Regression 75 The correlations: height nbeer weight weight height weight 200 The two x s are highly correlated!! If we regress beers only on height we see an effect. Bigger heights go with more beers. However, when height goes up weight tends to go up as well... in the first regression, height was a proxy for the real cause of drinking ability. Bigger people can drink more and weight is a more accurate measure of bigness. 38

39 No, not all. Understanding Multiple Regression weight S = R-Sq = 48.1% R-Sq(adj) = 45.9% 75 The correlations: height nbeer weight weight height weight 200 The two x s are highly correlated!! In the multiple regression, when we consider only the variation in height that is not associated with variation in weight, we see no relationship between height and beers. 39

40 Understanding Multiple Regression SUMMARY OUTPUT nbeers = β 0 + β 1 weight + ɛ Regression Statistics Multiple R 0.69 R Square 0.48 Adjusted R 0.47 Standard E 2.76 Observation 50 ANOVA df SS MS F Significance F Regression E-08 Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept weight Why is this a better model than the one with weight and height?? 40

41 Understanding Multiple Regression In general, when we see a relationship between y and x (or x s), that relationship may be driven by variables lurking in the background which are related to your current x s. This makes it hard to reliably find causal relationships. Any correlation (association) you find could be caused by other variables in the background... correlation is NOT causation Any time a report says two variables are related and there s a suggestion of a causal relationship, ask yourself whether or not other variables might be the real reason for the effect. Multiple regression allows us to control for all important variables by including them into the regression. Once we control for weight, height and beers are NOT related!! 41

42 Understanding Multiple Regression With the above examples we saw how the relationship amongst the X s can affect our interpretation of a multiple regression... we will now look at how these dependencies will inflate the standard errors for the regression coefficients, and hence our uncertainty about them. Remember that in simple linear regression our uncertainty about b 1 is measured by s 2 b 1 = s 2 (n 1)s 2 x = s 2 n i=1 (X i X ) 2 The more variation in X (the larger s 2 x ) the more we know about β 1... ie, (b 1 β 1 ) is smaller. 42

43 Understanding Multiple Regression In Multiple Regression we seek to relate the variation in Y to the variation in an X holding the other X s fixed. So, we need to see how much each X varies on its own. in MLR, the standard errors are defined by the following formula: s 2 b j = s 2 variation in X j not associated with other X s How do we measure the bottom part of the equation? We regress X j on all the other X s and compute the residual sum of squares (call it SSE j ) so that s 2 b j = s2 SSE j 43

44 Understanding Multiple Regression In the number of beers example... s = 2.78 and the regression on height on weight gives... SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Squar Standard Error Observations ANOVA df SS MS F Significance F Regression Residual Total Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Intercept weight SSE 2 = s b2 = = 0.20 Is this right? 44

45 Understanding Multiple Regression What happens if we are regressing Y on X s that are highly correlated. SSE j goes down and the standard error s bj up! goes What is the effect on the confidence intervals (b j ± 2 s bj )? They get wider! This situation is called Multicolinearity If a variable X does nothing on its own we can t estimate its effect on Y. 45

46 SUMMARY OUTPUT Back to Baseball Let s try to add AVG on top of OBP Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 30 ANOVA df SS MS F Significance F Regression E 14 Residual Total Coefficientst andard Erro t Stat P value Lower 95% Upper 95% Intercept E AVG OBP E R/G = β 0 + β 1 AVG + β 2 OBP + ɛ Is AVG any good? 46

47 SUMMARY OUTPUT Back to Baseball - Now let s add SLG Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 30 ANOVA df SS MS F Significance F Regression E 15 Residual Total Coefficientst andard Erro t Stat P value Lower 95% Upper 95% Intercept E OBP E SLG R/G = β 0 + β 1 OBP + β 2 SLG + ɛ What about now? Is SLG any good 47

48 Back to Baseball Correlations AVG 1 OBP SLG When AVG is added to the model with OBP, no additional information is conveyed. AVG does nothing on its own to help predict Runs per Game... SLG however, measures something that OBP doesn t (power!) and by doing something on its own it is relevant to help predict Runs per Game. (Okay, but not much...) 48

49 F-tests In many situation, we need a testing procedure that can address simultaneous hypotheses about more than one coefficient Why not the t-test? We will look at two important types of simultaneous tests (i) Overall Test of Significance (ii) Partial F-test The first test will help us determine whether or not our regression is worth anything... the second will allow us to compare different models. 49

50 Supervisor Performance Data Suppose you are interested in the relationship between the overall performance of supervisors to specific activities involving interactions between supervisors and employees (from a psychology management study) The Data Y = Overall rating of supervisor X 1 = Handles employee complaints X 2 = Does not allow special privileges X 3 = Opportunity to learn new things X 4 = Raises based on performance X 5 = Too critical of poor performance X 6 = Rate of advancing to better jobs 50

51 Supervisor Performance Data Supervisor Performance Data Applied Regression Analysis Carlos M. Carvalho!""#$%&'$()*+"$,- 51

52 Supervisor Performance Data F-tests %.$/0"1"$234$1")2/*53.0*6$0"1"7$81"$2))$/0"$95"::*9*"3/.$.*;3*:*923/7$!02/$2<5=/$/0"$;15=6$5:$>21*2<)".$953.*+"1"+$/5;"/0"17 Is there any relationship here? Are all the coefficients significant? Applied Regression Analysis Carlos M. Carvalho What about all of them together?!""#$%&'$()*+"$,- 52

53 Why not look at R 2 R 2 in MLR is still a measure of goodness of fit. However it ALWAYS grows as we increase the number of explanatory variables. Even if there is no relationship between the X s and Y, R 2 > 0!! To see this let s look at some Garbage Data 53

54 F-tests Garbage Data./$0""$12*03$)"140$5"6"781"$0/9"$587:85"$+818$1281$280$6/12*65$1/$ I made up 6 garbage variables that have nothing to do with Y... D*701$)"140$98#"$E=$0/9"$F587:85"G$@HI3$H,3$J3$HKB$<87*8:)"0C Applied Regression Analysis Carlos M. Carvalho!""#$%&'$()*+"$,- 54

55 Garbage Data R 2 is 26%!! We need to develop a way to see whether a R 2 of 26% can happen by chance when all the true β s are zero. It turns out that if we transform R 2 we can solve this. Define f = R 2 /p (1 R 2 )/(n p 1) A big f corresponds to a big R 2 but there is a distribution that tells what kind of f we are likely to get when all the coefficients are indeed zero... The f statistic provides a scale that allows us to decide if big is big enough. 55

56 The F -test We are testing: H 0 : β 1 = β 2 =... β p = 0 H 1 : at least one β j 0. This is the F-test of overall significance. Under the null hypothesis f is distributed: f F p,n p 1 Generally, f > 4 is very significant (reject the null). 56

57 The F -test What kind of distribution is this? F dist. with 6 and 23 df density It is a right skewed, positive valued family of distributions indexed by two parameters (the two df values). 57

58 The F-test # 5 ((7 6 # 5 Two equivalent expressions for f 9"3:;$<="<#$3=*;$3";3$4/>$3="$?@A>BA@"C$DA>*AB)";E Let s check this test for the garbage data... F-tests./0$12/34$45"$/6*7*81)$181)9:*:$;:36<"9$<16*12)":=> Applied Regression Analysis Carlos M. Carvalho How about the original analysis (survey variables)...!""#$%&'$()*+"$,- 58

59 F-test The p-value for the F -test is p-value = Pr(F p,n p 1 > f ) We usually reject the null when the p-value is less than 5%. Big f REJECT! Small p-value REJECT! 59

60 The F-test 4, # 5 ((7 6 # 5 Two equivalent expressions for f 9"3:;$<="<#$3=*;$3";3$4/>$3="$?@A>BA@"C$DA>*AB)";E In Excel, the p-value is reported under Significance F F-tests./0$12/34$45"$/6*7*81)$181)9:*:$;:36<"9$<16*12)":=> Applied Regression Analysis Carlos M. Carvalho!""#$%&'$()*+"$,- 60

61 F-tests The F-test Summary continued ***'./0123"$4$, 8 # ((8 # Note that f is also 4 equal to (you can check the math!), # 5 ((7 6 # 5 SSR/p f = SSE/(n p 1) Two equivalent expressions for f In 9"3:;$<="<#$3=*;$3";3$4/>$3="$?@A>BA@"C$DA>*AB)";E Excel, the values under MS are SSR/p and SSE/(n p 1) Applied Regression Analysis Carlos M. Carvalho f = = 1.39!""#$%&'$()*+"$,- 61

62 Partial F-tests What about fitting a reduced model with only a couple of X s? In other words, do we need all of the X s to explain Y? For example, in the Supervisor data we could argue that X 1 and X 3 were the most important variables in predicting Y. The full model (6 covariates) has Rfull 2 = while the model with only X 1 and X 3 has Rrest 2 = (check that!) Can we make a decision based only in the R 2 calculations? NO!! 62

63 Partial F -test With the total F -test, we were asking Is this regression worthwhile? Now, we re asking Is is useful to add these extra covariates to the regression? You always want to use the simplest model possible. Only add covariates if they are truly informative. 63

64 Partial F -test Consider the regression model Y = β 0 + β 1 X β pbase X pbase + β pbase +1X pbase β pfull X pfull + ε Such that d base is the number of covariates in the base (small) model and p full > p base is the number in the full (larger) model. The Partial F -test is concerned with the hypotheses H 0 : β pbase +1 = β pbase +2 =... = β pfull = 0 H 1 : at least one β j 0 for j > p base. 64

65 Partial F -test It turns out that under the null H 0 (i.e. base model is true), f = (R2 full R2 base )/(p full p base ) (1 Rfull 2 )/(n p full 1) F pfull p base,n p full 1 That is, under the null hypothesis, the ratio of normalized R 2 full R2 base (increase in R 2 ) and 1 R 2 full has F -distribution with p full p base and n p full 1 df. Big f means that R 2 full R2 base is statistically significant. Big f means that at least one of the added X s is useful. 65

66 Supervisor Performance: Partial F -test Back to our supervisor data; we want to test H 0 : β 2 = β 4 = β 5 = β 6 = 0 H 1 : at least one β j 0 for j {2, 4, 5, 6}. The F -stat is f = ( )/(6 2) (1.733)/(30 6 1) = = 0.54 This leads to a p-value of What do we conclude? 66

67 Example: Detecting Sex Discrimination Imagine you are a trial lawyer and you want to file a suit against a company for salary discrimination... you gather the following data... Gender Salary 1 Male Female Female Female Male Female

68 Detecting Sex Discrimination You want to relate salary(y ) to gender(x )... how can we do that? Gender is an example of a categorical variable. The variable gender separates our data into 2 groups or categories. The question we want to answer is: how is your salary related to which group you belong to... Could we think about additional examples of categories potentially associated with salary? MBA education vs. not legal vs. illegal immigrant quarterback vs wide receiver 68

69 Detecting Sex Discrimination We can use regression to answer these question but we need to recode the categorical variable into a dummy variable Gender Salary Sex 1 Male Female Female Female Male Female Note: In Excel you can create the dummy variable using the formula: =IF(Gender= Male,1,0) 69

70 Detecting Sex Discrimination Now you can present the following model in court: How do you interpret β 1? Salary i = β 0 + β 1 Sex i + ɛ i E[Salary Sex = 0] = β 0 β 1 is the male/female difference E[Salary Sex = 1] = β 0 + β 1 70

71 Detecting Sex Discrimination SUMMARY OUTPUT Salary i = β 0 + β 1 Sex i + ɛ i Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 208 ANOVA df SS MS F Significance F Regression E-07 Residual Total Coefficientstandard Erro t Stat P-value Lower 95% Upper 95% Intercept E Gender E ˆβ 1 = b 1 = on average, a male makes approximately $8,300 more than a female in this firm. How should the plaintiff s lawyer use the confidence interval in his presentation? 71

72 Detecting Sex Discrimination How can the defense attorney try to counteract the plaintiff s argument? Perhaps, the observed difference in salaries is related to other variables in the background and NOT to policy discrimination... Obviously, there are many other factors which we can legitimately use in determining salaries: education job productivity experience How can we use regression to incorporate additional information? 72

73 Detecting Sex Discrimination Let s add a measure of experience... Salary i = β 0 + β 1 Sex i + β 2 Exp i + ɛ i What does that mean? E[Salary Sex = 0, Exp] = β 0 + β 2 Exp E[Salary Sex = 1, Exp] = (β 0 + β 1 ) + β 2 Exp 73

74 Detecting Sex Discrimination The data gives us the year hired as a measure of experience... YrHired Gender Salary Sex 1 92 Male Female Female Female Male Female

75 Detecting Sex Discrimination SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Standard E Observation 208 Salary i = β 0 + β 1 Sex i + β 2 Exp + ɛ i ANOVA df SS MS F Significance F Regression E-31 Residual Total Coefficients tandard Erro t Stat P-value Lower 95% Upper 95% Intercept E Gender E YrHired E Salary i = Sex i 0.98Exp i + ɛ i Is this good or bad news for the defense? 75

76 Detecting Sex Discrimination Salary i = { Expi + ɛ i females Exp i + ɛ i males Salary Year Hired 76

77 More than Two Categories We can use dummy variables in situations in which there are more than two categories. Dummy variables are needed for each category except one, designated as the base category. Why? Remember that the numerical value of each category has no quantitative meaning! 77

78 Example: House Prices We want to evaluate the difference in house prices in a couple of different neighborhoods. Nbhd SqFt Price

79 Example: House Prices Let s create the dummy variables dn1, dn2 and dn3... Nbhd SqFt Price dn1 dn2 dn

80 Example: House Prices Price i = β 0 + β 1 dn1 i + β 2 dn2 i + β 3 Size i + ɛ i E[Price dn1 = 1, Size] = β 0 + β 1 + β 3 Size (Nbhd 1) E[Price dn2 = 1, Size] = β 0 + β 2 + β 3 Size (Nbhd 2) E[Price dn1 = 0, dn2 = 0, Size] = β 0 + β 3 Size (Nbhd 3) 80

81 Example: House Prices Price i = β 0 + β 1 dn1 + β 2 dn2 + β 3 Size + ɛ i SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 128 ANOVA df SS MS F gnificance F Regression E-31 Residual Total Coefficients Standard Error t Stat P-value Lower 95%Upper 95% Intercept dn dn size Price i = dn dn Size + ɛ i 81

82 Example: House Prices Price Nbhd = 1 Nbhd = 2 Nbhd = Size 82

83 Price i = Size + ɛ i 83 Example: House Prices SUMMARY OUTPUT Price i = β 0 + β 1 Size + ɛ i Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 128 ANOVA df SS MS F gnificance F Regression E-11 Residual Total Coefficientsandard Err t Stat P-value ower 95%pper 95% Intercept size

84 Example: House Prices Price Nbhd = 1 Nbhd = 2 Nbhd = 3 Just Size Size 84

85 Back to the Sex Discrimination Case Salary Year Hired Does it look like the effect of experience on salary is the same for males and females? 85

86 Back to the Sex Discrimination Case Could we try to expand our analysis by allowing a different slope for each group? Yes... Consider the following model: Salary i = β 0 + β 1 Exp i + β 2 Sex i + β 3 Exp i Sex i + ɛ i For Females: For Males: Salary i = β 0 + β 1 Exp i + ɛ i Salary i = (β 0 + β 2 ) + (β 1 + β 3 )Exp i + ɛ i 86

87 Sex Discrimination Case How does the data look like? YrHired Gender Salary Sex SexExp 1 92 Male Female Female Female Male Female

88 Sex Discrimination Case Salary i = β 0 + β 1 Sex i + β 2 Exp + β 3 Exp Sex + ɛ i SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R S Standard Err Observations 208 ANOVA df SS MS F Significance F Regression E-45 Residual Total Coefficients tandard Erro t Stat P-value Lower 95% Upper 95% Intercept E Gender E YrHired GenderExp E Salary i = Sex i Exp Exp Sex + ɛ i 88

89 Sex Discrimination Case Salary Year Hired Is this good or bad news for the plaintiff? 89

90 Variable Interaction So, the effect of experience on salary is different for males and females... in general, when the effect of the variable X 1 onto Y depends on another variable X 2 we say that X 1 and X 2 interact with each other. We can extend this notion by the inclusion of multiplicative effects through interaction terms. Y i = β 0 + β 1 X 1i + β 2 X 2i + β 3 (X 1i X 2i ) + ε E[Y X 1, X 2 ] X 1 = β 1 + β 3 X 2 We will pick this up in our next section... 90

Applied Regression Analysis. Section 2: Multiple Linear Regression

Applied Regression Analysis. Section 2: Multiple Linear Regression Applied Regression Analysis Section 2: Multiple Linear Regression 1 The Multiple Regression Model Many problems involve more than one independent variable or factor which affects the dependent or response

More information

Section 4: Multiple Linear Regression

Section 4: Multiple Linear Regression Section 4: Multiple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 The Multiple Regression

More information

Section 5: Dummy Variables and Interactions

Section 5: Dummy Variables and Interactions Section 5: Dummy Variables and Interactions Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Example: Detecting

More information

Regression and Model Selection Book Chapters 3 and 6. Carlos M. Carvalho The University of Texas McCombs School of Business

Regression and Model Selection Book Chapters 3 and 6. Carlos M. Carvalho The University of Texas McCombs School of Business Regression and Model Selection Book Chapters 3 and 6. Carlos M. Carvalho The University of Texas McCombs School of Business 1 1. Simple Linear Regression 2. Multiple Linear Regression 3. Dummy Variables

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Homework 1 Solutions

Homework 1 Solutions Homework 1 Solutions January 18, 2012 Contents 1 Normal Probability Calculations 2 2 Stereo System (SLR) 2 3 Match Histograms 3 4 Match Scatter Plots 4 5 Housing (SLR) 4 6 Shock Absorber (SLR) 5 7 Participation

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Inference in Regression Model

Inference in Regression Model Inference in Regression Model Christopher Taber Department of Economics University of Wisconsin-Madison March 25, 2009 Outline 1 Final Step of Classical Linear Regression Model 2 Confidence Intervals 3

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Week 3: Multiple Linear Regression

Week 3: Multiple Linear Regression BUS41100 Applied Regression Analysis Week 3: Multiple Linear Regression Polynomial regression, categorical variables, interactions & main effects, R 2 Max H. Farrell The University of Chicago Booth School

More information

Simple Regression Model. January 24, 2011

Simple Regression Model. January 24, 2011 Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Regression Analysis IV... More MLR and Model Building

Regression Analysis IV... More MLR and Model Building Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

SIMPLE REGRESSION ANALYSIS. Business Statistics

SIMPLE REGRESSION ANALYSIS. Business Statistics SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients

More information

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

MBA Statistics COURSE #4

MBA Statistics COURSE #4 MBA Statistics 51-651-00 COURSE #4 Simple and multiple linear regression What should be the sales of ice cream? Example: Before beginning building a movie theater, one must estimate the daily number of

More information

Applied Regression Analysis. Section 4: Diagnostics and Transformations

Applied Regression Analysis. Section 4: Diagnostics and Transformations Applied Regression Analysis Section 4: Diagnostics and Transformations 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions of our linear regression model: (i) The mean of

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

The Multiple Regression Model

The Multiple Regression Model Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Business Statistics 41000: Homework # 5

Business Statistics 41000: Homework # 5 Business Statistics 41000: Homework # 5 Drew Creal Due date: Beginning of class in week # 10 Remarks: These questions cover Lectures #7, 8, and 9. Question # 1. Condence intervals and plug-in predictive

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

ECON 450 Development Economics

ECON 450 Development Economics ECON 450 Development Economics Statistics Background University of Illinois at Urbana-Champaign Summer 2017 Outline 1 Introduction 2 3 4 5 Introduction Regression analysis is one of the most important

More information

Multiple Linear Regression for the Salary Data

Multiple Linear Regression for the Salary Data Multiple Linear Regression for the Salary Data 5 10 15 20 10000 15000 20000 25000 Experience Salary HS BS BS+ 5 10 15 20 10000 15000 20000 25000 Experience Salary No Yes Problem & Data Overview Primary

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent

More information

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information. STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory

More information

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression Chapter 12 12-1 North Seattle Community College BUS21 Business Statistics Chapter 12 Learning Objectives In this chapter, you learn:! How to use regression analysis to predict the value of a dependent

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Data Analysis 1 LINEAR REGRESSION. Chapter 03 Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative

More information

Chapter 14. Simple Linear Regression Preliminary Remarks The Simple Linear Regression Model

Chapter 14. Simple Linear Regression Preliminary Remarks The Simple Linear Regression Model Chapter 14 Simple Linear Regression 14.1 Preliminary Remarks We have only three lectures to introduce the ideas of regression. Worse, they are the last three lectures when you have many other things academic

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES. Business Statistics

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES. Business Statistics MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES Business Statistics CONTENTS Multiple regression Dummy regressors Assumptions of regression analysis Predicting with regression analysis Old exam question

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Statistics and Quantitative Analysis U4320

Statistics and Quantitative Analysis U4320 Statistics and Quantitative Analysis U3 Lecture 13: Explaining Variation Prof. Sharyn O Halloran Explaining Variation: Adjusted R (cont) Definition of Adjusted R So we'd like a measure like R, but one

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

CHAPTER 4 & 5 Linear Regression with One Regressor. Kazu Matsuda IBEC PHBU 430 Econometrics

CHAPTER 4 & 5 Linear Regression with One Regressor. Kazu Matsuda IBEC PHBU 430 Econometrics CHAPTER 4 & 5 Linear Regression with One Regressor Kazu Matsuda IBEC PHBU 430 Econometrics Introduction Simple linear regression model = Linear model with one independent variable. y = dependent variable

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

1 A Non-technical Introduction to Regression

1 A Non-technical Introduction to Regression 1 A Non-technical Introduction to Regression Chapters 1 and Chapter 2 of the textbook are reviews of material you should know from your previous study (e.g. in your second year course). They cover, in

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Linear Regression. Chapter 3

Linear Regression. Chapter 3 Chapter 3 Linear Regression Once we ve acquired data with multiple variables, one very important question is how the variables are related. For example, we could ask for the relationship between people

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

1 Correlation and Inference from Regression

1 Correlation and Inference from Regression 1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

18.0 Multiple and Nonlinear Regression

18.0 Multiple and Nonlinear Regression 18.0 Multiple and Nonlinear Regression 1 Answer Questions Multiple Regression Nonlinear Regression 18.1 Multiple Regression Recall the regression assumptions: 1. Each point (X i,y i ) in the scatterplot

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Topic 10 - Linear Regression

Topic 10 - Linear Regression Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Remember: Z.05 = 1.645, Z.01 = 2.33 We will only cover one-sided hypothesis testing (cases 12.3, 12.4.2, 12.5.2,

More information

Econometrics I Lecture 3: The Simple Linear Regression Model

Econometrics I Lecture 3: The Simple Linear Regression Model Econometrics I Lecture 3: The Simple Linear Regression Model Mohammad Vesal Graduate School of Management and Economics Sharif University of Technology 44716 Fall 1397 1 / 32 Outline Introduction Estimating

More information

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc. Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter

More information