Lecture Notes for
STATISTICAL METHODS FOR BUSINESS II
BMGT 1
Chapters 14, 15 & 16
Professor Ahmad, Ph.D.
Department of Management
Revised August 2005
Chapter 14 Formulas

Simple Linear Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Equation: $E(y) = \beta_0 + \beta_1 x$

Least Squares Criterion: $\min \sum (y - \hat{y})^2$

Estimated Simple Linear Regression Equation: $\hat{y} = b_0 + b_1 x$
where $\hat{y}$ = the estimated value of the dependent variable, $b_0$ = the y-intercept, and $b_1$ = the slope of the line:

$b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$

Sum of Squares Due to Regression: $SSR = \dfrac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}$

Total Sum of Squares: $SST = \sum (y - \bar{y})^2$
Also: $SST = SSR + SSE$

Sum of Squares Due to Error: $SSE = \sum (y - \hat{y})^2$

Coefficient of Determination: $r^2 = \dfrac{SSR}{SST}$
Also: $r^2 = 1 - \dfrac{SSE}{SST}$

Sample Correlation Coefficient: $r = (\text{the sign of } b_1)\sqrt{\text{Coefficient of Determination}} = \pm\sqrt{r^2}$
where $b_1$ = the slope of the regression equation
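The formulas above can be traced step by step in a short Python sketch. It is only an illustration, not part of the original notes: the function name `simple_linear_regression` is made up for this example, and the data are the Ahmad, Inc. advertising and sales figures from the Chapter 14 example as read here (an assumption wherever the printed digits are unclear).

```python
def simple_linear_regression(x, y):
    """Least squares estimates and sums of squares for y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Building blocks of the Chapter 14 formulas
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # y-intercept

    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total
    ssr = sxy ** 2 / sxx                                           # regression
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error

    r2 = ssr / sst                              # coefficient of determination
    r = (1 if b1 >= 0 else -1) * r2 ** 0.5      # r takes the sign of b1

    return {"b0": b0, "b1": b1, "ssr": ssr, "sse": sse,
            "sst": sst, "r2": r2, "r": r}


# Assumed reading of the Chapter 14 data:
# advertising in $10,000 and sales in $1,000,000
advertising = [32, 33, 35, 34, 36, 37, 39, 42]
sales = [15, 16, 18, 17, 16, 19, 19, 24]

fit = simple_linear_regression(advertising, sales)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Computing SSE directly from the residuals, rather than as SST - SSR, makes it possible to confirm the identity SST = SSR + SSE numerically.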
t Test for significance of individual coefficients in Linear Regression

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

t-statistic: $t = \dfrac{b_1 - \beta_1}{s_{b_1}}$

where $s_{b_1}$ (Estimated Standard Deviation of $b_1$) is

$s_{b_1} = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}}$ and $s = \sqrt{MSE}$

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (degrees of freedom = $n - p - 1$)

F Test for Significance of the Linear Regression Model (ANOVA)

$H_0: \beta_1 = 0$ (The model is not significant)
$H_a: \beta_1 \neq 0$ (The model is significant)

Source of           Sum of     Degrees of    Mean      Test Statistic
Variation           Squares    Freedom       Square    F
Regression          SSR        p             MSR       MSR / MSE
Error (Residual)    SSE        n - p - 1     MSE
Total               SST        n - 1

Where: p = Number of independent variables
       n = The sample size

Reject $H_0$ if the Test statistic $F$ > Critical $F_\alpha$

Confidence Interval Estimate for the Mean Value of y, that is $E(y_p)$

$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$

where the Estimated Standard Deviation of $\hat{y}_p$ is

$s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$

Remember: $s = \sqrt{MSE}$
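As a rough illustration of how the t test, the F test, and the confidence interval fit together, the sketch below computes all three for one data set. The function name `slope_inference` is hypothetical, the data are the Chapter 14 figures as read here (an assumption), and the critical value 2.447 is the t-table entry for 6 degrees of freedom with 0.025 in each tail; it is passed in as a parameter rather than computed.

```python
import math


def slope_inference(x, y, x_p, t_crit):
    """t and F statistics for the slope, plus a confidence interval for E(y_p)."""
    n, p = len(x), 1                       # one independent variable
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sst - sse

    mse = sse / (n - p - 1)
    msr = ssr / p
    s = math.sqrt(mse)                     # s = sqrt(MSE)

    s_b1 = s / math.sqrt(sxx)              # estimated std. deviation of b1
    t_stat = b1 / s_b1                     # tests H0: beta1 = 0
    f_stat = msr / mse

    y_hat_p = b0 + b1 * x_p
    s_y_hat = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
    margin = t_crit * s_y_hat

    return {"t": t_stat, "F": f_stat, "y_hat_p": y_hat_p,
            "lower": y_hat_p - margin, "upper": y_hat_p + margin}


# Assumed reading of the Chapter 14 data (advertising in $10,000, sales in $1,000,000)
advertising = [32, 33, 35, 34, 36, 37, 39, 42]
sales = [15, 16, 18, 17, 16, 19, 19, 24]

# x_p = 40 corresponds to $400,000 of advertising; 2.447 is taken from a t table
result = slope_inference(advertising, sales, x_p=40, t_crit=2.447)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With a single independent variable the two tests agree: the F statistic equals the square of the t statistic, which the output lets you verify.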
Chapter 14
Simple (Bivariate) Linear Regression and Correlation

Ahmad, Inc. is a microcomputer producer. The following data represent Ahmad's yearly sales volume and their advertising expenditure over a period of 8 years.

          (Y)                 (X)
          Sales               Advertising
Year      (In $1,000,000)     (In $10,000)
1996      15                  32
1997      16                  33
1998      18                  35
1999      17                  34
2000      16                  36
2001      19                  37
2002      19                  39
2003      24                  42

a. Develop a scatter diagram of sales versus advertising.
b. Use the method of least squares to compute an estimated regression line between sales and advertising.
c. If the company's advertising expenditure is $400,000, what is the predicted sales? Give the answer in dollars.
d. What does the slope of the estimated regression line indicate?
e. Compute the coefficient of determination and fully interpret its meaning.
f. Use the F test to determine whether or not the regression model is significant. Let α = 0.05.
g. Use the t test to determine whether the slope of the regression model is significant. Let α = 0.05.
h. Explain the basic assumptions about the error term in regression.
i. Develop a 95% confidence interval for predicting the average sales for the years when $400,000 was spent on advertising.
j. Use Excel and solve the above problems.
k. Using Excel, determine the regression equation between sales and time (where 1996 = 1).
Chapters 15 and 16 Formulas

Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

Estimated Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

Multiple Coefficient of Determination: $R^2 = \dfrac{SSR}{SST}$
Also: $R^2 = 1 - \dfrac{SSE}{SST}$

Adjusted Multiple Coefficient of Determination: $R_a^2 = 1 - (1 - R^2)\left(\dfrac{n - 1}{n - p - 1}\right)$

F Statistic for Determining When to Add or Delete $x_2$:

$F = \dfrac{\dfrac{SSE(x_1) - SSE(x_1, x_2)}{1}}{\dfrac{SSE(x_1, x_2)}{n - p - 1}}$

General F Test for Adding or Deleting Variables:

$F = \dfrac{\dfrac{SSE(x_1, x_2, \dots, x_q) - SSE(x_1, x_2, \dots, x_q, x_{q+1}, \dots, x_p)}{p - q}}{\dfrac{SSE(x_1, x_2, \dots, x_q, x_{q+1}, \dots, x_p)}{n - p - 1}}$

t Test for significance of individual coefficients in Linear Regression

For i = 1, 2, 3, ..., p:
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

t statistic: $t = \dfrac{b_i}{s_{b_i}}$, where $s_{b_i}$ is the estimated Standard Deviation of $b_i$

Decision Rule: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (degrees of freedom = $n - p - 1$)
Using the p-value approach: Reject $H_0$ if p-value < α
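The adjusted coefficient of determination and the general F test for adding or deleting variables reduce to a few lines of arithmetic. The sketch below is illustrative only; the function names `adjusted_r2` and `partial_f` are made up here. The first call uses the R Square (0.3580), sample size (65), and two independent variables from the life expectancy output in Chapter 15, Problem 3; the SSE values in the second call are purely hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted multiple coefficient of determination."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def partial_f(sse_reduced, sse_full, q, p, n):
    """General F statistic for adding variables x_(q+1), ..., x_p
    to a model that already contains x_1, ..., x_q."""
    numerator = (sse_reduced - sse_full) / (p - q)
    denominator = sse_full / (n - p - 1)
    return numerator / denominator


# R Square, n, and p as they appear in the Problem 3 regression output
r2_adj = adjusted_r2(r2=0.3580, n=65, p=2)
print(f"Adjusted R Square: {r2_adj:.4f}")

# Hypothetical SSE values: does adding x2 to a model with x1 help?
f_stat = partial_f(sse_reduced=120.0, sse_full=90.0, q=1, p=2, n=15)
print(f"F for adding x2: {f_stat:.4f}")
```

The resulting F statistic is compared with the critical value from an F table with p - q numerator and n - p - 1 denominator degrees of freedom.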
F Test for Significance of the Linear Regression Model (ANOVA)

$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ (i.e., the regression model is NOT significant)
$H_a$: At least one of the coefficients is significantly different from zero (the regression model IS significant)

ANOVA

Source of           Sum of     Degrees of    Mean      Test Statistic
Variation           Squares    Freedom       Square    F
Regression          SSR        p             MSR       MSR / MSE
Error (Residual)    SSE        n - p - 1     MSE
Total               SST        n - 1

Where: p = Number of independent variables
       n = The sample size

Decision Rule: Reject $H_0$ if the Test statistic $F$ > Critical $F_\alpha$
Using the p-value approach: Reject $H_0$ if p-value < α
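The ANOVA table can be filled in mechanically once SSR, SSE, n, and p are known, using MSR = SSR / p and MSE = SSE / (n - p - 1). The following sketch shows one way to do it; the function name `anova_table` and all of the numbers in the example are hypothetical, and the critical value 4.10 is the F-table entry for 2 and 10 degrees of freedom at α = 0.05.

```python
def anova_table(ssr, sse, n, p):
    """Build the regression ANOVA table from the sums of squares."""
    df_regression = p
    df_error = n - p - 1
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "Regression": {"SS": ssr, "df": df_regression, "MS": msr, "F": msr / mse},
        "Error": {"SS": sse, "df": df_error, "MS": mse},
        "Total": {"SS": ssr + sse, "df": n - 1},
    }


# Hypothetical sums of squares for a model with p = 2 and n = 13
table = anova_table(ssr=80.0, sse=20.0, n=13, p=2)
for source, row in table.items():
    print(source, row)

# Decision rule: reject H0 if F exceeds the critical value from an F table
f_crit = 4.10            # F table value for (2, 10) degrees of freedom, alpha = 0.05
reject_h0 = table["Regression"]["F"] > f_crit
print("Reject H0:", reject_h0)
```

Note that the degrees of freedom for regression and error always add up to the total, n - 1, which is a quick check on a hand-built table.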
Chapter 15
Problem 1
Introduction to Multiple Regression and Correlation

Ahmad, Inc. is a microcomputer producer. The following data represent Ahmad's yearly sales volume, their advertising expenditure, and the number of individuals in the sales force over a period of 15 years:

          (Y)             X1              X2             X3
          Sales           Advertising     Sales Force    Time
Year      ($1,000,000)    ($10,000)       (100)
1989      15              32              10             1
1990      16              33              12             2
1991      18              35              11             3
1992      17              34              14             4
1993      16              36              16             5
1994      19              37              18             6
1995      19              39              17             7
1996      24              42              20             8
1997      25              44              25             9
1998      27              40              22             10
1999      30              45              27             11
2000      33              50              28             12
2001      38              49              30             13
2002      40              50              30             14
2003      45              55              35             15

a. Using Excel, enter the above data in a file and save the file. Print the file as well as the results of all of the following parts.
b. Run the correlation analysis relating sales (Y) and all of the independent variables. (Do not include the column of Year.) Explain the results. Discuss the concept of multicollinearity.
c. Run the regression analysis relating sales (Y) and advertising (X1). Explain the results.
d. Run a regression analysis relating sales (Y) and two independent variables X1 and X2. Explain the results.
e. Use an F test (α = 0.05) to determine if variable X2 contributes significantly to the model. (Topic from Chapter Sixteen, section 16.2)
f. Run a regression analysis relating sales (Y) and two independent variables X1 and X3. Explain the results.
g. Using the model developed in part "f", predict sales for 2004 assuming we are planning to advertise $700,000.
h. Run a regression analysis relating sales (Y) and Time (X3). Explain the results.
i. Using the model developed in part "h", predict sales for 2008.
j. Run a regression analysis relating sales (Y) and three independent variables X1, X2, and X3. Explain the results.
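Part b asks for a correlation analysis and a discussion of multicollinearity. Outside Excel, the pairwise correlations could be sketched as below. This is an illustration only: the data are the Problem 1 figures as read here (an assumption wherever the printed digits are unclear), the helper name `correlation` is made up, and the 0.7 cutoff for flagging correlated independent variables is a common rule of thumb, not something stated in these notes.

```python
import math


def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Assumed reading of the Chapter 15, Problem 1 data
data = {
    "sales": [15, 16, 18, 17, 16, 19, 19, 24, 25, 27, 30, 33, 38, 40, 45],
    "advertising": [32, 33, 35, 34, 36, 37, 39, 42, 44, 40, 45, 50, 49, 50, 55],
    "sales_force": [10, 12, 11, 14, 16, 18, 17, 20, 25, 22, 27, 28, 30, 30, 35],
    "time": list(range(1, 16)),
}

names = list(data)
matrix = {r: {c: correlation(data[r], data[c]) for c in names} for r in names}

# Print the correlation matrix, one row per variable
for r in names:
    print(f"{r:12s}", "  ".join(f"{matrix[r][c]:6.3f}" for c in names))

# Rule-of-thumb check (assumed cutoff): independent variables that are
# highly correlated with each other suggest multicollinearity
THRESHOLD = 0.7
independent = [name for name in names if name != "sales"]
flagged = [(a, b)
           for i, a in enumerate(independent)
           for b in independent[i + 1:]
           if abs(matrix[a][b]) > THRESHOLD]
print("Highly correlated independent variables:", flagged)
```

Only the pairs among the independent variables are flagged; a high correlation between an independent variable and sales is desirable rather than a problem.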
Problem 2
Interpretation of Coefficients and Other Issues in Multiple Regression

A multiple regression model relating the price of Rawlston, Inc. stock (Y), the number of shares of the company's stocks sold (X1 in 100s), and the volume of exchange on the New York Stock Exchange (X2 in millions) was developed and part of the results are shown below.

ANOVA
              df    SS          MS         F          Significance F
Regression    2     118.8474    59.4237    40.9216    0.0000
Residual      9     13.0692     1.4521
Total         11    131.9167

              Coefficients    Standard Error    t Stat     P-value
Intercept     118.5059        33.5753           3.5296     0.0064
X1            -0.0163         0.0315            -0.5171    0.6176
X2            -1.5726         0.3590            -4.3807    0.0018

a. Use the output shown above and write an equation that can be used to predict the price of the stock.
b. Interpret the coefficients of the estimated regression equation.
c. At 95% confidence, determine which variables are significant and which are not.
d. At 95% confidence, test to determine if the regression model represents a significant relationship between the independent variables and the dependent variable.
e. If in a given day, the number of shares of stock that were sold was 94,500 and the volume of exchange on the New York Stock Exchange was 16 million, what would you expect the price of the stock to be?
Problem 3
Multiple Regression and Qualitative Independent Variables

The following data are part of a sample taken from the mortality tables of a life insurance company. The data provide information on how life expectancy (dependent variable Y) relates to two independent variables: weight (X1 in pounds) and whether or not the individual is a smoker (X2), where:

$x_2 = 0$ if the individual is a nonsmoker
$x_2 = 1$ if the individual is a smoker

Age     Weight    Smoker
(Y)     (X1)      (X2)
59      53        1
93      180       0
70      01        1
60      68        1
70      15        0
...
etc.    etc.      etc.

The results of the regression analysis relating Y to X1 and X2 are shown below.

Regression Statistics
Multiple R           0.5983
R Square             0.3580
Adjusted R Square    0.3373
Standard Error       8.5599
Observations         65

ANOVA
              df    SS         MS         F        Significance F
Regression    2     2533.19    1266.60    17.29    0.0000
Residual      62    4542.87    73.27
Total         64    7076.06

              Coefficients    Standard Error    t Stat     P-value
Intercept     92.8770         4.0964            22.6729    0.0000
Weight        -0.0623         0.0247            -2.5208    0.0143
Smoker        -6.2675         2.9096            -2.1541    0.0351
a. Use the output shown above and write the regression equation.
b. Interpret the coefficients of the estimated regression equation.
c. At 95% confidence, determine which variables are significant and which are not.
d. At 95% confidence, test to determine if the regression model represents a significant relationship between the independent variables and the dependent variable.
e. Predict the life expectancy of a nonsmoker who weighs 150 pounds.
f. Predict the life expectancy of a person who smokes 1 pack of cigarettes per day and weighs 150 pounds.
g. Predict the life expectancy of a person who smokes 3 packs of cigarettes per day and weighs 150 pounds.
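A dummy variable enters the prediction like any other term, except that it can only take the values 0 and 1. The sketch below shows the mechanics for parts e through g. It is a hedged illustration: the function name `predict_life_expectancy` is made up, and the coefficients are the Problem 3 output as read here (intercept 92.8770, weight -0.0623, smoker -6.2675), which is an assumption because several digits in the printed output are unclear.

```python
def predict_life_expectancy(weight_lb, smoker,
                            b0=92.8770, b_weight=-0.0623, b_smoker=-6.2675):
    """Estimated life expectancy from weight and a 0/1 smoker dummy.

    The default coefficients are an assumed reading of the Problem 3 output.
    """
    if smoker not in (0, 1):
        raise ValueError("smoker must be 0 (nonsmoker) or 1 (smoker)")
    return b0 + b_weight * weight_lb + b_smoker * smoker


nonsmoker = predict_life_expectancy(150, smoker=0)
smoker = predict_life_expectancy(150, smoker=1)

print(f"Nonsmoker, 150 lb: {nonsmoker:.2f} years")
print(f"Smoker, 150 lb:    {smoker:.2f} years")
print(f"Difference:        {nonsmoker - smoker:.4f} years")
```

Because the model has no term for the number of packs smoked, it returns the same estimate for any smoker of a given weight; the difference between the two groups is always the smoker coefficient.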
Chapter 16
Problem 1
Curvilinear Regression

Monthly total production costs and the number of units produced at a local company over a period of 10 months are shown below.

          Production Costs (Y)    Units Produced (X)
Month     (in $ millions)         (in millions)         Z = X^2
1         1                       2                     4
2         1                       3                     9
3         1                       4                     16
4         2                       5                     25
5         2                       6                     36
6         4                       7                     49
7         5                       8                     64
8         7                       9                     81
9         9                       10                    100
10        1                       10                    100

a. Using Excel, enter the above data in a file and save the file.
b. Draw a scatter diagram relating X & Y.
c. Perform a regression and correlation analysis relating X & Y.
d. Draw a scatter diagram relating X^2 & Y.
e. If we can assume that a model in the form of $Y = \beta_0 + \beta_1 X^2 + \varepsilon$ best describes the relationship between X and Y, perform a regression and correlation analysis between X^2 & Y.
f. Compare the results of parts c and e and explain which would be a better model and why.
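The idea behind parts c through f is that a curvilinear relationship can still be fitted with simple linear regression once X is replaced by Z = X^2. The sketch below illustrates the comparison on made-up numbers; the data and the helper name `fit_line` are hypothetical and are not the production cost figures from the problem.

```python
def fit_line(x, y):
    """Return (b0, b1, r2) for the least squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = (sxy ** 2 / sxx) / sst        # r^2 = SSR / SST
    return b0, b1, r2


# Hypothetical data with a roughly quadratic pattern
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 4.8, 10.1, 17.3, 25.7, 37.1]

# Model 1: Y regressed on X
_, _, r2_linear = fit_line(x, y)

# Model 2: Y regressed on Z = X^2
z = [xi ** 2 for xi in x]
b0_z, b1_z, r2_quadratic = fit_line(z, y)

print(f"r^2 using X:   {r2_linear:.4f}")
print(f"r^2 using X^2: {r2_quadratic:.4f}")
print(f"Estimated equation: y-hat = {b0_z:.3f} + {b1_z:.3f} * X^2")
```

Both models use one independent variable, so their coefficients of determination can be compared directly; the model with the higher value explains more of the variation in Y.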
Chapter 16
Problem 2
Multiple Regression & Correlation With Dummy Variables
Fill in the Blanks

Ahmad, Inc. is a microcomputer producer. The following data represent Ahmad's yearly sales volume, their advertising expenditure, and whether in a given year they used all Television advertising (X2 = 0) or used Multimedia advertising (X2 = 1).

          (Y)             X1              X2
          Sales           Advertising     Dummy Variable
Year      ($1,000,000)    ($10,000)       (0,1)
1989      15              32              0
1990      16              33              1
1991      18              35              1
1992      17              34              1
1993      16              36              0
1994      19              37              1
1995      19              39              0
1996      24              42              0
1997      25              44              1
1998      27              40              0
1999      30              45              1
2000      33              50              1
2001      38              49              0
2002      40              50              0
2003      45              55              1

The Regression procedure of Excel was used on the above data and parts of the results are shown on the next page.

a. Fill in all the blanks on the next page.
b. Write the estimated regression equation.
c. Using the results shown on the next page, predict sales for the year 2004 assuming we are planning to use $700,000 for television advertising only.
d. Using the results shown on the next page, predict sales for the year 2004 assuming we are planning to use $700,000 for multimedia advertising.
SUMMARY OUTPUT

Multiple R           ?
R Square             ?
Adjusted R Square    ?
Standard Error       .715
Observations         ?

ANOVA
              df    SS        MS    F    Significance F
Regression    ?     143.74    ?     ?    8.59E-08
Residual      ?     ?         ?
Total         ?     ?

               Coefficients    Standard Error    t Stat    P-value
Intercept      -8.46401        4.8559715         ?         ?
Advertising    1.31337         0.10113336        ?         ?
Dummy          -0.896375       1.40609116        ?         ?
Your Turn
One Final Example
Significance of Variables and Other Issues

3. Ahmad, Inc. produces several models of computer printers. Data on a few variables for one of the company's printers are presented below.

Variables:
Sales (Y) (In $1,000,000)
Advertising (X1) (In $1,000)
Price (X2) (In $100)
Competitor's Price (X3) (In $100)
Time (X4) (In Years)
Rating (X5) (0 to 10)

Data (one record per line):
1578 588 1 0 1 4
1741 600 0
95 600 17 19 3 4
134 780 1 1 4 8
035 750 1 1 5 6
408 80 19 1 6 8
337 810 0 0 7 8
468 840 5 8 6
533 700 5 4 9 8
800 970 16 18 10 8
79 90 15 1 11 6
799 950 4 3 1 6
364 980 17 3 13 6
3367 1167 19 17 14 4
389 800 1 18 15 6
3453 155 17 16 16 6
5031 1706 17 5 17 8
615 1890 1 6 18 8
6519 1996 17 8 19 8
4586 1700 15 18 0 10
4876 1706 1 4 1 4
4675 1888 14 3 6
3473 1300 19 4 3 10
3669 1500 18 1 4 8
4167 1400 4 3 5 4

a. Enter the above data into an Excel file and save the file. Print the file and the results of all of the following parts.
b. Run a correlation analysis (among all variables) and print the results. Fully discuss the meaning of the correlation coefficients. Be sure to discuss the concept of multicollinearity.
c. Run a regression analysis relating sales (Y) and ALL the independent variables. Fully explain the results.
d. Drop the variable(s) that at 95% confidence were not significant in part c and run a new regression analysis. Fully explain your results.