UNIVERSITY OF TORONTO. Faculty of Arts and Science JUNE EXAMINATIONS STA 302 H1F / STA 1001 H1F Duration - 3 hours Aids Allowed: Calculator


UNIVERSITY OF TORONTO
Faculty of Arts and Science
JUNE EXAMINATIONS 2008
STA 302 H1F / STA 1001 H1F
Duration - 3 hours
Aids Allowed: Calculator

LAST NAME: FIRST NAME: STUDENT NUMBER:
Enrolled in (Circle one): STA302 STA1001

There are 26 pages including this page. The last page is a table of formulae that may be useful. For all questions you can assume that the results on the formula page are known. Tables of the t and F distributions are attached.

Total marks: 90

PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.

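1) We want to fit the normal error regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with five observations with X = 1, 2, 3, 4, and 5. We know that the $\varepsilon_i$'s are independent and normally distributed with mean 0 and standard deviation $\sigma = 2$.

a) [3] Calculate $P(b_1 - 1 \le \beta_1 \le b_1 + 1)$, where $b_1$ is the least squares estimator of $\beta_1$.

Sol: $S_{XX} = \sum (X_i - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10$ and $\mathrm{Var}(b_1) = \sigma^2 / S_{XX} = 4/10 = 0.4$, so $b_1 \sim N(\beta_1,\ 0.1\sigma^2 = 0.1 \times 4 = 0.4)$ (exactly normal, since sigma is known). Alternatively, use $\sigma^2 (X'X)^{-1}$, or the formulas for the variances and covariance of $b_0$ and $b_1$. This part needs only the formula for $\mathrm{Var}(b_1)$, but the full $(X'X)^{-1}$ matrix is useful for part b.

$$P(b_1 - 1 \le \beta_1 \le b_1 + 1) = P\left(-\frac{1}{\sqrt{0.4}} \le \frac{b_1 - \beta_1}{\sqrt{0.4}} \le \frac{1}{\sqrt{0.4}}\right) = P(-1.58 \le Z \le 1.58) = 1 - 2 \times 0.0570534 = 0.8858932$$

b) [5] Calculate $P(-1 \le e_1 \le 1)$, where $e_1 = Y_1 - \hat{Y}_1$ is the residual for the first observation (i.e. at X = 1).

Sol: With

$$X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \end{pmatrix}, \qquad (X'X)^{-1} = \begin{pmatrix} 1.1 & -0.3 \\ -0.3 & 0.1 \end{pmatrix},$$

we have $e_1 = Y_1 - \hat{Y}_1 \sim N(0,\ (1 - h_{11})\sigma^2) = N(0,\ 0.4 \times 4 = 1.6)$, where $h_{11} = 0.6$ is the first diagonal element of the hat matrix

$$H = X(X'X)^{-1}X' = \begin{pmatrix} 0.6 & 0.4 & 0.2 & 0.0 & -0.2 \\ 0.4 & 0.3 & 0.2 & 0.1 & 0.0 \\ 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\ 0.0 & 0.1 & 0.2 & 0.3 & 0.4 \\ -0.2 & 0.0 & 0.2 & 0.4 & 0.6 \end{pmatrix}.$$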
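The quantities used in this solution can be checked numerically. The sketch below is not part of the original solution; it assumes the design X = 1, ..., 5 and sigma = 2 as read from the question, and uses the closed-form simple-regression hat matrix $h_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x})/S_{XX}$ rather than a matrix library. It does not round z to 1.58 before the table lookup, so the probability agrees with the hand calculation only to about three decimals.

```python
from statistics import NormalDist

# Design points and error s.d. as read from the question (assumed: X = 1..5, sigma = 2).
x = [1, 2, 3, 4, 5]
sigma = 2.0
n = len(x)

x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Part a: b1 ~ N(beta1, sigma^2 / S_XX), so P(|b1 - beta1| <= 1) is a normal probability.
var_b1 = sigma ** 2 / s_xx
z = 1 / var_b1 ** 0.5
p_a = 2 * NormalDist().cdf(z) - 1

# Hat matrix for simple linear regression:
# h_ij = 1/n + (x_i - x_bar)(x_j - x_bar) / S_XX
hat = [[1 / n + (xi - x_bar) * (xj - x_bar) / s_xx for xj in x] for xi in x]

# Variance of the first residual: (1 - h_11) * sigma^2
var_e1 = (1 - hat[0][0]) * sigma ** 2

print("Var(b1) =", round(var_b1, 4), " P(part a) =", round(p_a, 4))
print("first row of H =", [round(h, 1) for h in hat[0]])
print("Var(e1) =", round(var_e1, 4))
```

Only the first diagonal element of H is needed for part b; the full matrix is built here just to compare against the one displayed above.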

Note: You only need to calculate the first diagonal element of H. I got the full H because I just used my computer to get it.

$$P(-1 \le e_1 \le 1) = P\left(-\frac{1}{\sqrt{1.6}} \le Z \le \frac{1}{\sqrt{1.6}}\right) = P(-0.79 \le Z \le 0.79) = 1 - 2 \times 0.214764 = 0.570472$$

2) Consider the linear regression model in matrix form that we discussed in class: $Y = X\beta + \varepsilon$, where X is an n × p matrix with the first column containing all 1s and has rank p (and so $(X'X)^{-1}$ exists), and $\varepsilon$ is a vector of uncorrelated errors with covariance matrix $\sigma^2 I$. Let $\hat{Y} = Xb$, where $b = (X'X)^{-1}X'Y$ is the vector of least squares regression estimates. You may use any result we proved in class (other than, of course, the result the question wants you to prove).

a) [] Show that $\mathrm{Cov}(\hat{Y}) = \sigma^2 H$, where $H = X(X'X)^{-1}X'$.

b) [5] Let $H = (h_{ij})$. Show that $0 \le h_{ii} \le 1$ for i = 1, 2, ..., n.

Sol: To prove $h_{ii} \le 1$, note that $\mathrm{Var}(e_i) = (1 - h_{ii})\sigma^2 \ge 0$, so $h_{ii} \le 1$.
To prove $h_{ii} \ge 0$, consider $\alpha = H a_i$, where $a_i$ is an n × 1 vector with all components 0 except the ith element, which is 1. Then $\alpha'\alpha \ge 0$ (this is the sum of squares of the elements of $\alpha$) and $\alpha'\alpha = a_i' H' H a_i = a_i' H a_i = h_{ii}$, because H is symmetric and idempotent.

3) Systolic blood pressure readings of individuals are thought to be related to weight and age. The following SAS outputs were obtained from a regression analysis of systolic blood pressure on weight (in pounds).

The REG Procedure
Model: MODEL1
Dependent Variable: Systolic

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         162.75422      162.75422       8.11    0.0147
Error              12         240.74578       20.06215
Corrected Total    13         403.50000

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1    8.93469               omitted           omitted    omitted
Weight        1    0.132                 omitted           omitted    omitted

(a) State whether the following statements are true or false. Circle your answer. [1 point for each part]

i) More than 60% of the variation in systolic blood pressure has been accounted for by the linear relationship with weight. (True / False)

Ans: F. R-sq = 162.754/403.5 = 0.4034, i.e. 40.3%.

ii) The margin of error of the 90% confidence interval for $\beta_1$ (the population regression coefficient of weight) is greater than 0.5. (True / False)

Ans: F. The 90% CI does not include 0, because the p-value for testing $\beta_1 = 0$ is 0.0147 (from the ANOVA table), which is less than 0.10 (the alpha for 90% confidence). The centre of the CI is the estimate $b_1 = 0.132$, so the distance between the centre and the lower end point of the interval (i.e. the margin of error) is less than 0.132 − 0 = 0.132. You can also calculate it directly: t = sqrt(8.11) = 2.848, so SE = 0.132/2.848 = 0.0464 and ME = t(0.95; 12) × SE = 1.782 × 0.0464 = 0.0826.

(b) [3] The least squares regression equation of systolic blood pressure on weight and age calculated from the same group of individuals was: Systolic = 125 + 0.9 Weight + 0.04 Age, with R-Sq = 40.9%. Test the null hypothesis $H_0: \beta_{age} = 0$ against the alternative $H_a: \beta_{age} \ne 0$, where $\beta_{age}$ is the population regression coefficient of Age. Use α = 0.05. Show your workings clearly. (F table: p. 667.)

Sol: Use R-sq to calculate SSR(x1, x2) = R-sq × SST. SSR(x1) is given in the output above, so use the partial F test (F-drop). SSE(F) can also be found from R-sq = 1 − SSE/SST, since SST is given.
F = 2.19/21.69 = 0.101, equivalently |t| = sqrt(0.101) = 0.318. This is far below the 5% critical value F(0.95; 1, 11) = 4.84, so $H_0$ is not rejected.

Here is the MINITAB output for information (for comparing the above answer):

Regression Analysis: Systolic versus Weight, Age

The regression equation is
Systolic = 125 + 0.9 Weight + 0.04 Age

Predictor    Coef      SE Coef    T       P
Constant     125.08    15.35      8.15    0.000
Weight       0.907     0.0643     .9      0.083
Age          0.037     0.36       0.32    0.756

S = 4.65688   R-Sq = 40.9%   R-Sq(adj) = 30.1%

Analysis of Variance

Source            DF    SS        MS       F       P
Regression         2    164.95    82.47    3.80    0.056
Residual Error    11    238.55    21.69
Total             13    403.50

Source    DF    Seq SS
Weight     1    162.75
Age        1      2.19

4) [8] In a simple linear regression analysis of the relationship between fuel efficiency (Y, in gallons per 100 miles) and the weight (X, in 1000s of pounds) of cars, the researchers collected data on n = 38 cars. Some summary statistics of the data and the scatterplot of Y versus X are given below:

$\bar{X} = 2.863$, $S_X = 0.707$, $\bar{Y} = 4.331$, $S_Y = 1.156$

The MSE for the simple linear regression of Y on X is 0.195.

[Figure: Scatterplot of Y vs X]

Calculate the least squares estimates of $\beta_0$ and $\beta_1$, and their standard errors (i.e. $s_{b_0}$ and $s_{b_1}$), for the simple linear regression model $Y = \beta_0 + \beta_1 X + \varepsilon$, satisfying the usual assumptions. Show your workings clearly.

Sol: $b_1 = \hat{\beta}_1 = r\, s_Y / s_X$ and R-sq = SSR/SST.

SST = $(n - 1) s_Y^2$ = (38 − 1) × 1.156² = 49.4444
SSE = (n − 2) × MSE = (38 − 2) × 0.195 = 7.02
SSR = SST − SSE = 49.4444 − 7.02 = 42.42
r = +sqrt(R-sq) = sqrt(42.42/49.4444) = 0.9263 (positive because the slope of the scatterplot is positive)

$b_1$ = (0.9263 × 1.156)/0.707 = 1.5146
$b_0 = \bar{y} - b_1 \bar{x}$ = 4.331 − 1.5146 × 2.863 = −0.0052

$$s_{b_1} = \sqrt{\frac{MSE}{(n-1) S_X^2}} = \frac{0.44167}{\sqrt{37 \times 0.707^2}} = 0.1027$$

$$s_{b_0} = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1) S_X^2}} = 0.44167 \sqrt{\frac{1}{38} + \frac{2.863^2}{37 \times 0.707^2}}$$

Calculator work: 2.863²/(37 × 0.707²) = 0.4432; ANS + (1/38) = 0.4695; ANS^0.5 = 0.6852; ANS × 0.44167 = 0.3026.

Here is the complete MINITAB output (for checking your answers):

Regression Analysis: GPM versus WT

The regression equation is
GPM = - 0.006 + 1.51 WT

Predictor    Coef       SE Coef    T        P
Constant     -0.0060    0.3027     -0.02    0.984
WT           1.51       0.1027     14.75    0.000

S = 0.44167   R-Sq = 85.8%   R-Sq(adj) = 85.4%

Analysis of Variance

Source            DF    SS        MS        F         P
Regression         1    42.422    42.422    217.47    0.000
Residual Error    36     7.023     0.195
Total             37    49.445
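The hand calculation above can be reproduced from the summary statistics alone. The following is a minimal sketch rather than the original solution's code; the inputs (n = 38, the means, standard deviations and MSE) are assumed to be the values stated in the question, and s is taken as sqrt(0.195) instead of the S printed by MINITAB, so the last digit of the standard errors may differ slightly.

```python
import math

# Summary statistics as read from the question (assumed values).
n = 38
x_bar, s_x = 2.863, 0.707
y_bar, s_y = 4.331, 1.156
mse = 0.195

sst = (n - 1) * s_y ** 2          # total sum of squares
sse = (n - 2) * mse               # error sum of squares
ssr = sst - sse                   # regression sum of squares
r = math.sqrt(ssr / sst)          # positive root: the scatterplot slopes upward

b1 = r * s_y / s_x                # slope estimate
b0 = y_bar - b1 * x_bar           # intercept estimate

s_xx = (n - 1) * s_x ** 2         # sum of squared deviations of X
s = math.sqrt(mse)
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"se(b1) = {se_b1:.4f}, se(b0) = {se_b0:.4f}")
```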

5) [5] After fitting the normal error regression model $Y = \beta_0 + \beta_1 X + \varepsilon$, satisfying the usual assumptions, to n = 6 observations, the fitted values (i.e. the $\hat{y}_i$'s) were calculated and are given with the data on X and Y in the following table:

x    y    ŷ
1    1    0.85714
2    2    1.71429
3    2    2.57143
4    3    3.42857
5    5    4.28571
6    5    5.14286

You may also use these summary statistics if you need them: $\bar{X} = 3.5$, $S_X = 1.871$, $\bar{Y} = 3$, $S_Y = 1.673$.

Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_a: \beta_1 \ne 0$. Use a t-test with α = 0.05. Show your workings clearly.

Sol:

x    y    FITS       y − ŷ        (y − ŷ)²
1    1    0.85714     0.142857    0.020408
2    2    1.71429     0.285714    0.081633
3    2    2.57143    -0.571429    0.326531
4    3    3.42857    -0.428571    0.183673
5    5    4.28571     0.714286    0.510204
6    5    5.14286    -0.142857    0.020408

SSE = total of the (y − ŷ)² column = 1.1429

SST = (n − 1) × var(y) = (6 − 1) × 1.673² = 13.994645
SSR = SST − SSE = 13.994645 − 1.1429 = 12.851745 (= MSR as well, since the regression has 1 degree of freedom)
F = MSR/MSE = 12.851745/(1.1429/(6 − 2)) = 44.98
t = sqrt(44.98) = 6.707

Since 6.707 exceeds the table value t(0.975; 4) = 2.776, $H_0$ is rejected at α = 0.05.
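The same test can be run directly from the six data points. This is an illustrative sketch, not the original solution: the data values are assumed to be those in the table above, the slope is computed from $S_{XY}/S_{XX}$ rather than taken from the fitted-value column, and the critical value 2.776 is hard-coded from a t-table for 4 degrees of freedom.

```python
import math

# Data as read from the question (assumed: x = 1..6 and the y values in the table).
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 5, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx                       # least squares slope
b0 = y_bar - b1 * x_bar                # least squares intercept

# Residual sum of squares and the usual unbiased variance estimate.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1

T_CRIT = 2.776                         # t(0.975; 4), from a t-table
print(f"b1 = {b1:.4f}, SSE = {sse:.4f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

The t value computed this way is slightly larger than the hand value because the hand calculation uses the rounded $S_Y = 1.673$.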

Here is the MINITAB output table (for comparison):

Regression Analysis: y versus x

The regression equation is
y = - 0.000 + 0.857 x

Predictor    Coef       SE Coef    T        P
Constant     -0.0000    0.4976     -0.00    1.000
x             0.8571    0.1278      6.71    0.003

S = 0.5345   R-Sq = 91.8%   R-Sq(adj) = 89.8%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         1    12.857    12.857    45.00    0.003
Residual Error     4     1.143     0.286
Total              5    14.000

6) After fitting the normal error regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, satisfying the usual assumptions, to a set of n = 5 observations, we obtained the least squares estimates $b_0$ = 0, $b_1$ = 12, $b_2$ = 15 and $s^2$ = . It is also known that

$(X'X)^{-1}$ =
    0.5    0.5
    0.5    0.5
    0.5    0.5    0.5

a) [3] Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_a: \beta_1 \ne 0$. Use α = 0.05. Show your workings clearly.

Sol: $b_1 = 12$ and $s^2\{b_1\} = 0.5 \times s^2$, so calculate t and use the t-table.

b) [4] Test the null hypothesis $H_0: \beta_1 = \beta_2$ against the alternative $H_a: \beta_1 \ne \beta_2$. Use α = 0.05. Show your workings clearly.

Sol: The hypotheses are equivalent to $H_0: \beta_1 - \beta_2 = 0$ against $H_a: \beta_1 - \beta_2 \ne 0$. Here $b_1 - b_2 = 12 - 15 = -3$ and

$$s^2\{b_1 - b_2\} = s^2\{b_1\} + s^2\{b_2\} - 2\,\mathrm{cov}(b_1, b_2),$$

where each term is $s^2$ times the corresponding element of $(X'X)^{-1}$. Calculate the t-statistic and use the t-table.

c) [3] Calculate a 95% confidence interval for β0 β+.

7) The SAS output below was obtained from a study of the relationship between the height (feet) and the diameter (inches) of sugar maple trees (Johnson, R. A. and Bhattacharyya, G. K., 2006). In the output below, y = height in feet, x = diameter in inches of the sugar maple trees, and x_sq = x × x.

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        5214.01734     2607.00867      33.78    <.0001
Error               9         694.49183       77.16576
Corrected Total    11        5908.50917

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1    17.56905              5.38148            3.26      0.0098
x             1     6.65                 1.268              5.24      0.0005
x_sq          1    -0.15846              0.04874           -3.25      0.0100

We convert the units of diameter (x) from inches to cm (assume that 1 inch = 2.54 cm) and the units of height (y) from feet to meters (assume that 1 foot = 0.3 m). Let y′ denote the height in meters and x′ denote the diameter in cm. Most of the useful information for the regression model of y′ on x′ and x′_sq, where x′_sq = x′ × x′, can be calculated from the information on the SAS output given above (for the regression of y on x and x_sq).

i) [3] Give the estimated least squares regression equation for y′ on x′ and x′_sq.

Ans: $\hat{y}' = \hat{\beta}'_0 + \hat{\beta}'_1 x' + \hat{\beta}'_2 x'_{sq}$, where
$\hat{\beta}'_0$ = 17.6 × 0.3 = 5.28
$\hat{\beta}'_1$ = (6.65/2.54) × 0.3 = 0.7854
$\hat{\beta}'_2$ = (−0.158/2.54²) × 0.3 = −0.007347
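The rescaling rule behind this answer is that each coefficient is multiplied by the response conversion factor and divided by the predictor conversion factor raised to the power of the term. The sketch below is illustrative only: the coefficient values are the rounded ones used in the answer (assumed to be 17.6, 6.65 and −0.158), and the helper function names are hypothetical. It checks the rule by predicting the same tree on both scales.

```python
# Conversion factors stated in the question.
FT_TO_M = 0.3
IN_TO_CM = 2.54

# Coefficients on the original scale (feet vs inches), rounded as in the answer (assumed values).
b0, b1, b2 = 17.6, 6.65, -0.158

# Response is multiplied by 0.3; x by 2.54, so x^k terms are divided by 2.54**k.
b0_m = b0 * FT_TO_M
b1_m = b1 * FT_TO_M / IN_TO_CM
b2_m = b2 * FT_TO_M / IN_TO_CM ** 2


def height_ft(diameter_in):
    """Fitted height in feet from diameter in inches (original model)."""
    return b0 + b1 * diameter_in + b2 * diameter_in ** 2


def height_m(diameter_cm):
    """Fitted height in meters from diameter in cm (converted model)."""
    return b0_m + b1_m * diameter_cm + b2_m * diameter_cm ** 2


d_in = 10.0  # an arbitrary diameter, chosen only for the consistency check
print(f"b0' = {b0_m:.2f}, b1' = {b1_m:.4f}, b2' = {b2_m:.6f}")
print(height_m(d_in * IN_TO_CM), FT_TO_M * height_ft(d_in))
```

Sums of squares are in squared response units, so they scale by 0.3² while R-sq, F and the t ratios are unchanged.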

ii) [] Calculate the SSE for the transformed model (i.e. the model for y′ on x′ and x′_sq).

Ans: 694.5 × 0.3² = 62.505

Here is the MINITAB output for the original and the transformed data (for comparing your answers):

Regression Analysis: y versus x, x_sq

The regression equation is
y = 17.6 + 6.65 x - 0.158 x_sq

Predictor    Coef        SE Coef    T        P
Constant     17.569      5.381       3.26    0.010
x             6.65       1.268       5.24    0.001
x_sq         -0.15846    0.04874    -3.25    0.010

S = 8.78440   R-Sq = 88.2%   R-Sq(adj) = 85.6%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         2    5214.0    2607.0    33.78    0.000
Residual Error     9     694.5      77.2
Total             11    5908.5

Regression Analysis: ty versus tx, tx_sq

The regression equation is
ty = 5.27 + 0.786 tx - 0.00737 tx_sq

Predictor    Coef         SE Coef     T        P
Constant      5.271       1.614        3.26    0.010
tx            0.7856      0.1498       5.24    0.001
tx_sq        -0.007368    0.002266    -3.25    0.010

S = 2.6353   R-Sq = 88.2%   R-Sq(adj) = 85.6%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         2    469.26    234.63    33.78    0.000
Residual Error     9     62.50      6.94
Total             11    531.77

8) A commercial real estate company evaluates vacancy rates, square footage, rental rates and operating expenses for commercial properties in a large metropolitan area. The SAS output below was obtained from a regression analysis of the rental rates (Y) on four explanatory variables: X1 = age, X2 = operating expenses and taxes, X3 = vacancy rates and X4 = total square footage.

The SAS System
The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable     Intercept    x1         x2
Intercept    8            637        784.74
x1           637          859        6704.3
x2           784.74       6704.3     836.498
x3           6.56         33.55      5.9948
x4           3095         909605     359970.57
y            6.5          945.       07.345

Variable     x3           x4            y
Intercept    6.56         3095          6.5
x1           33.55        909605        945.
x2           5.9948       359970.57     07.345
x3           .9796        4849.57       00.545
x4           4849.57      3.04535E      05009973.3
y            00.545       05009973.3    8800.6

The REG Procedure
Model: MODEL1
Dependent Variable: y

X'X Inverse, Parameter Estimates, and SSE

Variable     Intercept      x1             x2
Intercept    0.58438584     -0.0003048     -0.0505743
x1           -0.0003048     0.000354       -0.0000945
x2           -0.0505743     -0.0000945     0.0030875989
x3           -0.508355      0.003443364    0.090798
x4           .35557E-7      -4.3048E-9     -3.0759E-8
y            .0058588       -0.4033644     0.80653

Variable     x3              x4              y
Intercept    -0.508355       .35557E-7       .0058588
x1           0.003443364     -4.3048E-9      -0.4033644
x2           0.090798        -3.0759E-8      0.80653
x3           0.93859849      -3.746468E-7    0.693435035
x4           -3.746468E-7    .48363E-        7.94309E-6
y            0.693435035     7.94309E-6      98.30593943

a) [3] Calculate a 95% confidence interval for $\beta_4$, the population regression coefficient of X4 in the regression model for Y with the four predictors X1, X2, X3 and X4.

Sol:

$b_4$ is given in the "X'X Inverse, Parameter Estimates, and SSE" table above, in the y column. SSE is the last diagonal element of the same table, and n is the Intercept-by-Intercept entry of the "Model Crossproducts X'X X'Y Y'Y" table (the first matrix). The estimated variance $s^2(b_4)$ is MSE times the x4 diagonal element of the X'X inverse, with MSE = SSE/(n − 5).

b) [5] Test the null hypothesis H0: β = β3 = β4 = 0 against Ha: at least one of β, β3 or β4 is not equal to 0. Use α = 0.05.

Sol: SSE(F) is given in the "X'X Inverse, Parameter Estimates, and SSE" matrix (the last diagonal element, 98.30593943), and so SSR(F) = SST − SSE(F), where SST = $Y'Y - n\bar{y}^2$. The reduced model is the simple linear regression model with the one remaining x only, for which SSR(R) = $b^2 S_{xx} = S_{xy}^2 / S_{xx}$, with $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$ and $S_{xx} = \sum x_i^2 - n\bar{x}^2$. These sums of squares and cross products are in the first matrix above, i.e. "Model Crossproducts X'X X'Y Y'Y".

9) A company designing and marketing lighting fixtures needed to develop forecasts of sales (SALES = total monthly sales in thousands of dollars). The company considered the following predictors:

ADEX = advertising expense in thousands of dollars
MTGRATE = mortgage rate for 30-year loans (%)
HSSTARTS = housing starts in thousands of units

The company collected data on these variables and the SAS outputs below were obtained from this study.

The SAS System
The CORR Procedure

4 Variables: SALES ADEX MTGRATE HSSTARTS

Simple Statistics

Variable    N     Mean       Std Dev      Sum
SALES       46    63         400.36378    7504
ADEX        46    54.9786    89.369       79
MTGRATE     46    8.485      .0403        390.5000
HSSTARTS    46    97.3696    0.978        4473

Pearson Correlation Coefficients, N = 46
Prob > |r| under H0: Rho=0

            SALES      ADEX        MTGRATE     HSSTARTS
SALES       1.00000    0.56098     -0.6773     0.875
                       <.0001      <.0001      <.0001
ADEX        0.56098    1.00000     -0.80389    0.673
            <.0001                 <.0001      0.075
MTGRATE     -0.6773    -0.80389    1.00000     -0.34973
            <.0001     <.0001                  0.0172
HSSTARTS    0.875      0.673       -0.34973    1.00000
            <.0001     0.075       0.0172

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: SALES

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Type I SS
Intercept     1    6.4647                43.3467            3.73      0.0006      4634
ADEX          1    0.3736                0.4399             0.75      0.460       69959
MTGRATE       1    -5.80                 39.74780          -3.80      0.0005      044647
HSSTARTS      1    .8636                 .865               0.84      <.0001      87464

a) [5] Test the null hypothesis $H_0: \beta_{ADEX} = \beta_{HSSTARTS} = 0$ against $H_a$: at least one of $\beta_{ADEX}$ or $\beta_{HSSTARTS}$ is not equal to 0, where $\beta_{ADEX}$ and $\beta_{HSSTARTS}$ are the population regression coefficients of ADEX and HSSTARTS respectively in the model $E[SALES] = \beta_0 + \beta_{ADEX}\,ADEX + \beta_{MTGRATE}\,MTGRATE + \beta_{HSSTARTS}\,HSSTARTS$. Use α = 0.05.

Sol: SSR(F) = sum of the Type I SS for the three predictors. The simple linear regression of SALES on MTGRATE is the reduced model, and for that model R-sq = 0.6773² = 0.4588 and SSR(R) = R-sq × SST, where SST = variance(Y) × (n − 1).

b) [4] Let us now consider the simple linear regression model $E[SALES] = \beta_0 + \beta_1\,MTGRATE$ for predicting SALES based on MTGRATE only. Calculate a 95% confidence interval for $\beta_1$ in this model.

Sol: $b_1$ = r(SALES, MTGRATE) × ($s_{SALES}$ / $s_{MTGRATE}$)
$s(b_1)$ = sqrt(MSE(MTGRATE) / $S_{xx}$)
$S_{xx}$ = (n − 1) × var(MTGRATE)

MSE = SSE/(n − 2), SSE = SST − SSR, SSR = R-sq × SST, SST = variance(Y) × (n − 1).
$b_1$ and $s(b_1)$ are in the MINITAB output for the solution for part (a) above.

10) An experiment was conducted to compare the amounts of tar (in milligrams) passing through three types of cigarette filters. Ten cigarettes were selected at random from each type and their tar contents were measured. The means and the standard deviations of the three samples are given below. Assume that there are no serious violations in the assumptions needed for the statistical methods involved.

Variable    Type    N     Mean      StDev
Tar         1       10    12.689    2.034
            2       10    18.282    2.614
            3       10    16.248    2.860

Consider the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, where y is the tar content. The variables $x_1$ and $x_2$ are indicator (dummy) variables identifying the type of cigarette filters and are defined as follows:

$x_1$ = 1 if type 1 and 0 otherwise
$x_2$ = 1 if type 2 and 0 otherwise

a) [] Calculate the value of the least squares estimate of β. Show your workings clearly.

Sol: $b_0 = \bar{y}_3 = 16.25$.
$\bar{y}_1 = b_0 + b_1$ and so $b_1$ = 12.689 − 16.248 = −3.559.
$\bar{y}_2 = b_0 + b_2$ and so $b_2$ = 18.282 − 16.248 = 2.034.

Here is the MINITAB output for checking your answers:

Descriptive Statistics: Tar

Variable    Type    N     Mean      StDev
Tar         1       10    12.689    2.034
            2       10    18.282    2.614
            3       10    16.248    2.860

Regression Analysis: Tar versus x1, x2

The regression equation is
Tar = 16.2 - 3.56 x1 + 2.03 x2

Predictor    Coef       SE Coef    T        P
Constant     16.2478    0.799      20.33    0.000
x1           -3.558     1.130      -3.15    0.004
x2            2.034     1.130       1.80    0.083

S = 2.52684   R-Sq = 48.2%   R-Sq(adj) = 44.3%

Analysis of Variance

Source            DF    SS         MS       F        P
Regression         2    160.245    80.12    12.55    0.000
Residual Error    27    172.393     6.385
Total             29    332.638

b) [5] Calculate the value of R-square for this model. Show your workings clearly.

Sol: SSE = (2.034² + 2.614² + 2.86²) × (10 − 1) = 19.14975 × 9 = 172.348.

SST can be obtained from $SST = \sum_{i=1}^{30} y_i^2 - n\bar{y}^2$, with $\sum y_i^2 = \sum_j [(n_j - 1) s_j^2 + n_j \bar{y}_j^2]$. Or, easier: $\bar{y}$ = (12.689 + 18.282 + 16.248)/3 = 15.7397 (since the sample sizes are equal) and

SSR = $10(\bar{y}_1 - \bar{y})^2 + 10(\bar{y}_2 - \bar{y})^2 + 10(\bar{y}_3 - \bar{y})^2$ = 10 × ((12.689 − 15.74)² + (18.282 − 15.74)² + (16.248 − 15.74)²) = 160.284

SST = 172.35 + 160.28 = 332.63
R-sq = SSR/SST = 160.284/332.63 = 0.4819

c) [4] Test the null hypothesis $H_0: \beta_2 = 0$ against the alternative $H_a: \beta_2 \ne 0$. Show your workings clearly.

Sol: t = ($b_2$ − 0)/SE($b_2$), where $b_2 = \bar{y}_2 - \bar{y}_3$ = 18.282 − 16.248 = 2.034 and SE($b_2$) = SE($\bar{y}_2 - \bar{y}_3$) = sqrt(172.35/(30 − 3)) × sqrt(1/10 + 1/10) = 1.130. So t = 2.034/1.130 = 1.80, which matches T = 1.80 with P = 0.083 in the MINITAB output, and $H_0$ is not rejected at α = 0.05.
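Parts a to c can all be recovered from the group summaries, because with two indicator variables the fitted values are just the three group means. The sketch below is a check under stated assumptions, not the original working: the means and standard deviations are assumed to be the values in the table above, with 10 observations per type, and type 3 is the baseline group.

```python
import math

# Group summaries as read from the question (assumed values); 10 cigarettes per type.
n_j = 10
means = {1: 12.689, 2: 18.282, 3: 16.248}
sds = {1: 2.034, 2: 2.614, 3: 2.860}

# Part a: type 3 is the baseline, so the coefficients are differences of group means.
b0 = means[3]
b1 = means[1] - b0
b2 = means[2] - b0

# Part b: SSE pools the within-group variation; SSR is the between-group variation.
sse = sum((n_j - 1) * sd ** 2 for sd in sds.values())
grand_mean = sum(means.values()) / len(means)      # valid because group sizes are equal
ssr = sum(n_j * (m - grand_mean) ** 2 for m in means.values())
r_sq = ssr / (ssr + sse)

# Part c: b2 is a difference of two independent group means.
mse = sse / (3 * n_j - 3)
se_b2 = math.sqrt(mse * (1 / n_j + 1 / n_j))
t_b2 = b2 / se_b2

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"SSE = {sse:.3f}, SSR = {ssr:.3f}, R-sq = {r_sq:.4f}")
print(f"se(b2) = {se_b2:.4f}, t = {t_b2:.2f}")
```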

11) The following SAS output was obtained from a study to identify the best set of predictors of sales for a company using data obtained from a random sample of n = 25 sales territories of the company. The variables in the SAS output below are defined as follows:

SALES = sales (in units) for the territory
TIME = length of time territory salesperson has been with the company
POTENT = industry sales (in units) for the territory
ADV = expenditures (in dollars) on advertising
SHARE = weighted average of past market share for the last four years

The REG Procedure
Model: MODEL1
Dependent Variable: SALES

R-Square Selection Method

Number in Model    R-Square    Variables in Model
1                  0.3880      TIME
1                  0.3574      POTENT
1                  0.3554      ADV
1                  0.338       SHARE
-----------------------------------------------
2                  0.746       POTENT SHARE
2                  0.607       POTENT ADV
2                  0.5953      TIME ADV
2                  0.564       TIME SHARE
2                  0.530       TIME POTENT
2                  0.4696      ADV SHARE
-----------------------------------------------
3                  0.8490      POTENT ADV SHARE
3                  0.8         TIME POTENT SHARE
3                  0.699       TIME POTENT ADV
3                  0.6959      TIME ADV SHARE
-----------------------------------------------
4                  0.8960      TIME POTENT ADV SHARE

Even though this output is from the R-square selection method, it has enough information that can be used in other selection methods.

a) [3] What variable (if any) will be selected at the first step if we use the stepwise selection method? Use α = 0.0 to show whether the variable will enter the model or not. Show your working clearly.

Sol: TIME, because it has the highest R-sq among all the single-variable models. Also, for TIME,

F = R-sq / [(1 − R-sq)/(25 − 2)] = 0.3880/[(1 − 0.3880)/23] = 14.58

and t = sqrt(14.58) = 3.82, which is significant at alpha = 0.0.
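The F-to-enter statistic used here depends only on R-square values, because dividing the numerator and the denominator of the partial F by SST leaves the ratio unchanged. The helper below is a hypothetical illustration (the function name and signature are not from the exam); it assumes n = 25 and counts the intercept among the parameters of the larger model.

```python
import math


def f_to_enter(r2_full, r2_reduced, n, p_full):
    """Partial F for adding one variable.

    r2_full, r2_reduced: R-square of the larger and the smaller model.
    p_full: number of parameters (including the intercept) in the larger model.
    """
    return (r2_full - r2_reduced) / ((1 - r2_full) / (n - p_full))


n = 25  # number of sales territories (assumed from the question)

# Step 1: TIME versus the intercept-only model, whose R-square is 0.
f_time = f_to_enter(0.3880, 0.0, n, p_full=2)
t_time = math.sqrt(f_time)

print(f"F = {f_time:.2f}, t = {t_time:.2f}")
```

The same function applies at later steps by passing the R-square of the current model as `r2_reduced` and increasing `p_full` by one.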

b) [5] What variable (if any) will be selected at the second step if we use the stepwise selection method? Use α = 0.0 to show whether the variable will enter the model or not. Show your working clearly.

Sol: Note that the alpha for leaving (or staying) is not required, as our questions ask only about the first two variables entering the model.

ADV, because the model TIME ADV has the highest t-ratio (or F_drop) among the two-variable models with TIME as one variable. Note that

F_drop = [SSR(TIME, x_k) − SSR(TIME)] / MSE(TIME, x_k) = [Rsq(TIME, x_k) − Rsq(TIME)] / {[1 − Rsq(TIME, x_k)]/(n − 3)}

(just divide the numerator and the denominator by SST to see this). For x_k = ADV:

F_drop = (0.5953 − 0.3880)/((1 − 0.5953)/(25 − 3)) = 11.269, and sqrt(11.269) = 3.357.

From this we see that for all models containing TIME, F_drop depends only on Rsq(TIME, x_k), and it (and so the t-value for x_k) increases as Rsq(TIME, x_k) increases. Rsq(TIME, x_k) is largest when x_k = ADV, and so ADV has the highest t-ratio among the models containing TIME.

Multiple-choice questions. Circle the most appropriate answer from the list of answers labeled A), B), C), D), and E). (2 points for each question below)

12) In the term test and assignment, we analyzed the regression model with no constant term for the case with a single predictor. Let us now consider the model with two predictors, $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$, i = 1, 2, ..., n (with no constant term, i.e. no $\beta_0$), with non-random X variables and the random errors $\varepsilon_i$ satisfying the usual assumptions. We estimate $\beta_1$ and $\beta_2$ using the method of least squares and calculate the least squares residuals $e_i = Y_i - \hat{Y}_i$. Which of the following statements regarding the residuals are necessarily true?

I) $\sum_{i=1}^{n} e_i X_{i1} = 0$ (i.e. the weighted sum of residuals, weighted by the values of the variable $X_1$, is equal to 0)

II) $\sum_{i=1}^{n} e_i X_{i2} = 0$ (i.e. the weighted sum of residuals, weighted by the values of the variable $X_2$, is equal to 0)

III) $\sum_{i=1}^{n} e_i = 0$ (i.e. the sum of the residuals is equal to 0)

A) only III is true
B) only I and II are true
C) only I and III are true
D) only II and III are true
E) all the three statements I, II and III are true.

Ans: B. III is not necessarily true for the no-constant model. I and II are true because $X'e = 0$.

Here is an example:

Regression Analysis: y versus x1, x2

The regression equation is
y = 1.62 x1 + 4.75 x2

Predictor     Coef      SE Coef    T        P
Noconstant
x1            1.6217    0.1549     10.47    0.000
x2            4.7504    0.583       8.14    0.000

S = 11.0986

Analysis of Variance

Source            DF    SS        MS        F         P
Regression         2    718732    359366    2917.4    0.000
Residual Error    19      2340       123
Total             21    721072

Descriptive Statistics: RESI

Variable    N     Sum
RESI        21    -.3

Data Display

Row    y        x1      x2     RESI
1      74.4     68.5    6.7    -6.08
2      64.4     45.     6.8    .899
3      44.      9.3     8.     9.6767
4      54.6     47.8    6.3    -0.354
5      8.6      46.9    7.3    3.3577
6      07.5     66.     8.     3.8448
7      5.8      49.5    5.9    -3.008
8      63.      5.0     7.     -.838
9      45.4     48.9    6.6    -.7605

10     37.      38.4    6.0    -.089
11     4.9      87.9    8.3    .456
12     9.       7.8     7.     -8.955
13     3.0      88.4    7.4    5.980
14     45.3     4.9     5.8    0.6704
15     6.       5.5     7.8    -8.5993
16     09.7     85.7    8.4    -6.696
17     46.4     4.3     6.5    .0399
18     44.0     5.7     6.3    -7.76
19     3.6      89.6    8.     .3087
20     4.       8.7     9.     -0.756
21     66.5     5.3     6.0    5.6758

13) A simple linear regression model $Y = \beta_0 + \beta_1 X + \varepsilon$ was fitted to a data set with 15 observations and the residuals were calculated for all 15 observations. The sum of 10 of these values (i.e. 10 residuals) was −6.08. What will be the sum of the remaining 5 residuals? Choose the interval that contains the answer.

A) (−10, −5)
B) (−5, 0)
C) (0, 5)
D) (5, 10)
E) none of the above intervals contains this value

Ans: D. The sum of all residuals (i.e. all 15 residuals) is 0. Since 10 of them have a sum of −6.08, the sum of the remaining 5 should be +6.08 (to make the sum of all equal to 0).

14) In a simple linear regression analysis of a dependent variable Y on an independent variable X, based on 18 observations, the 95% confidence interval for $\beta_1$ (i.e. the slope) was (1.8, 2.2). What is the value of the t-test statistic for testing the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_a: \beta_1 \ne 0$?

A) it will be less than 5.0
B) it will be greater than 5.0 but less than 8.0
C) it will be greater than 8.0 but less than 28.0
D) it will be greater than 28.0 but less than 35.0
E) it will be greater than 35.0

Ans: C.
ME = (2.2 − 1.8)/2 = 0.2

SE = ME/2.12 = 0.0943 (the t-table value with df = n − 2 = 18 − 2 = 16 is 2.12)
$b_1$ = (1.8 + 2.2)/2 = 2 (the midpoint of the interval is the estimate of $\beta_1$)
t = $b_1$/SE = 21.2
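The steps above invert a confidence interval back into a test statistic. A minimal sketch, assuming the interval (1.8, 2.2) and n = 18 as read from the question; the critical value 2.120 is hard-coded from a t-table for 16 degrees of freedom rather than computed.

```python
# Reported 95% confidence interval for the slope (assumed values from the question).
lower, upper = 1.8, 2.2
T_CRIT = 2.120                 # t(0.975; 16) from a t-table, df = n - 2 = 18 - 2

b1 = (lower + upper) / 2       # the interval is centred at the estimate
margin = (upper - lower) / 2   # margin of error = t_crit * SE
se_b1 = margin / T_CRIT
t_stat = b1 / se_b1            # test statistic for H0: beta1 = 0

print(f"b1 = {b1:.1f}, SE = {se_b1:.4f}, t = {t_stat:.1f}")
```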