UNIVERSITY OF TORONTO AT SCARBOROUGH. Sample Exam STAC67. Duration - 3 hours

AIDS ALLOWED: THIS EXAM IS OPEN BOOK (NOTES). Calculator (no phone calculators are allowed).

LAST NAME: ____________   FIRST NAME: ____________   STUDENT NUMBER: ____________

There are 7 pages including this page. Total marks: 95

PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.

1) The following SAS output (from PROC UNIVARIATE) was obtained from a study of the relationship between the boiling temperature of water (in degrees Fahrenheit) and the atmospheric pressure (in inches of mercury). In the SAS outputs below, the boiling temperature is denoted by BT and the atmospheric pressure by AP.

The UNIVARIATE Procedure
Variable: BT
N                 31            Sum Weights        31
Mean              191.6         Sum Observations   5939.6
Std Deviation     8.37321125    Variance           70.1106667
Skewness          0.586076      Kurtosis           -0.5660379
Uncorrected SS    1140130.68    Corrected SS       2103.32
Coeff Variation   4.370152      Std Error Mean     1.50387314

The UNIVARIATE Procedure
Variable: AP
N                 31            Sum Weights        31
Mean              20.0276452    Sum Observations   620.857
Std Deviation     3.8637188     Variance           14.928323
Skewness          0.96406479    Kurtosis           0.30908
Uncorrected SS    12882.1534    Corrected SS       447.84969
Coeff Variation   19.2919       Std Error Mean     0.69394438

The CORR Procedure
Variables: BT AP
Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0
        BT        AP
BT      1.00000   0.98455
                  <.0001
AP      0.98455   1.00000
        <.0001

a) [5 points] Assuming that a linear relationship exists between AP and BT and that the data satisfy the necessary assumptions, calculate the least squares regression equation of BT on AP.
Sol: b1 = r Sy/Sx and b0 = y_bar - b1 x_bar, with y = BT and x = AP. This gives b1 = 0.98455 x 8.37321125/3.8637188 = 2.134 and b0 = 191.6 - 2.134 x 20.0276452 = 148.87, so BT-hat = 148.87 + 2.134 AP (both values approximate).

b) [2 points] What proportion of the variability in the boiling temperature of water (i.e. BT) is explained by this simple linear regression model?
Sol: This is R-sq = r^2 = 0.98455^2 = 0.9693 (approximately).
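Parts a) to c) of question 1 use only the printed summary statistics. The Python sketch below computes them. Its inputs are our reading of the PROC UNIVARIATE and PROC CORR output: N = 31, means 191.6 and 20.0276452, standard deviations 8.37321125 and 3.8637188, and r = 0.98455. The critical value t(0.975; 29) = 2.045 comes from a t table, so the sketch needs no SciPy dependency. The function names are illustrative.

```python
import math


def slr_from_summary(r, sx, sy, xbar, ybar):
    """Least squares line of y on x from summary statistics.

    b1 = r * sy / sx, b0 = ybar - b1 * xbar, and R^2 = r^2 in simple regression.
    """
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1, r ** 2


def slope_ci(b1, r_sq, sst, sxx, n, t_crit):
    """CI for the slope: SSE = (1 - R^2) SST, MSE = SSE / (n - 2), s(b1) = sqrt(MSE / Sxx)."""
    mse = (1 - r_sq) * sst / (n - 2)
    se_b1 = math.sqrt(mse / sxx)
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


# y = BT (boiling temperature), x = AP (atmospheric pressure); values read from the output
n, r = 31, 0.98455
sd_bt, sd_ap = 8.37321125, 3.8637188
b0, b1, r_sq = slr_from_summary(r, sx=sd_ap, sy=sd_bt, xbar=20.0276452, ybar=191.6)
sst = (n - 1) * sd_bt ** 2   # corrected SS of BT (about 2103.32)
sxx = (n - 1) * sd_ap ** 2   # corrected SS of AP (about 447.85)
lo, hi = slope_ci(b1, r_sq, sst, sxx, n, t_crit=2.045)

print(f"BT-hat = {b0:.3f} + {b1:.4f} AP")
print(f"R^2 = {r_sq:.4f}")
print(f"95% CI for the slope: ({lo:.4f}, {hi:.4f})")
```

The interval is symmetric about b1 because it uses the t distribution. Replacing `t_crit` is the only change needed for a different confidence level.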

c) [5 points] Calculate a 95% confidence interval for the slope of the regression line.
Sol: Find MSE first. One way is to use R^2 = 1 - SSE/SST, so SSE = (1 - R^2) SST and MSE = SSE/(n - 2). Another way is to use SSR = b1^2 Sxx and SSE = SST - SSR. Then use the formula for the CI for the slope: b1 +/- t(0.975; n - 2) sqrt(MSE/Sxx). With n = 31, SST = 2103.32, Sxx = 447.84969 and t(0.975; 29) = 2.045, this gives approximately (1.99, 2.28).

2) A researcher wished to study the relation between patient satisfaction (Y) and the patient's age (X1), severity of illness (X2, an index) and anxiety level (X3). Some SAS outputs for the regression analysis of his data are given below. You may assume that the model is appropriate (i.e. satisfies the assumptions needed) for answering the questions below.

The REG Procedure
Model: MODEL1
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   x1      x2       x3       y
Intercept   46          1766    2320     105.2    2832
x1          1766        7378    9005     407.     038
x2          2320        9005    7846     5344.7   4084
x3          105.2       407.    5344.7   44.6     637
y           2832        038     4084     637      187722

The REG Procedure
Model: MODEL1
Dependent Variable: y
X'X Inverse, Parameter Estimates, and SSE
Variable    Intercept     x1            x2
Intercept   3.4776535     0.00939       -0.06793079
x1          0.00939       0.000456086   -0.00038596
x2          -0.06793079   -0.00038596   0.0039484
x3          -0.0679887    -0.004667     -0.0770085
y           158.49567     -1.1416847    -0.4420046

Variable    x3            y
Intercept   -0.0679887    158.49567
x1          -0.004667     -1.1416847
x2          -0.0770085    -0.4420046
x3          0.498577303   -13.470639
y           -13.470639    4248.840688

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Type I SS
Intercept   1    158.495              8.589            8.74      <.0001     174353
x1          1    -1.1416              0.480            -5.3      <.0001     8275.38885
x2          1    -0.44200             0.4997           -0.90     0.374      480.959
x3          1    -13.4706             omitted          omitted   omitted    364.595

i) [4 points] Test whether there is a regression relation between Y and the explanatory variables X1, X2 and X3. State the null and the alternative hypotheses. Use α = 0.05.
Sol: H0: β1 = β2 = β3 = 0 versus Ha: not all of β1, β2, β3 are zero. SSE = 4248.840688 and SST = 187722 - 2832^2/46 = 13369.30. So SSR = SST - SSE, and F = (SSR/3)/(SSE/42) = 30.05. This exceeds F(0.95; 3, 42) = 2.83, so reject H0.

ii) [4 points] Calculate a 95% confidence interval for β3 (the coefficient of X3 in the above model).
Sol: b3 = -13.4706. s^2(b3) = MSE x (diagonal element of (X'X)^-1 for x3) = (SSE/(46 - (3 + 1))) x 0.498577303. CI = b3 +/- t(0.975; 42) s(b3), which is approximately (-27.80, 0.86).

iii) [4 points] Calculate a 95% confidence interval for β2 - β3 (β2 and β3 are the coefficients of X2 and X3 respectively in the above model).
Sol: The estimate of β2 - β3 is b2 - b3 = -0.4420 - (-13.4706) = 13.0286. Its variance is s^2(b2 - b3) = s^2(b2) + s^2(b3) - 2 cov(b2, b3). Here cov(b2, b3) = MSE x the (x2, x3) element of (X'X)^-1. CI = (b2 - b3) +/- t(0.975; 42) s(b2 - b3).

iv) [4 points] Calculate and interpret the value of the coefficient of partial determination between Y and X2, given that X1 is in the model.
Sol: R^2(Y2|1) = SSR(X2|X1)/SSE(X1).
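Parts i) to iv) of question 2 reduce to arithmetic on printed quantities. The Python sketch below assumes n = 46, SSE = 4248.840688, sum of y = 2832 and sum of y^2 = 187722, all read from the output. It takes t(0.975; 42) = 2.018 from a table. The helper names are hypothetical. The interval for β2 - β3 needs the (x2, x3) entry of (X'X)^-1, so that part is shown only as a function.

```python
import math


def overall_f(sst, sse, n, p):
    """Global F for H0: beta_1 = ... = beta_p = 0 (p slopes plus an intercept)."""
    ssr = sst - sse
    return (ssr / p) / (sse / (n - p - 1))


def coef_ci(b, c_jj, mse, t_crit):
    """CI for one coefficient, using s^2(b_j) = MSE * [(X'X)^-1]_jj."""
    se = math.sqrt(mse * c_jj)
    return b - t_crit * se, b + t_crit * se


def diff_ci(b_j, b_k, c_jj, c_kk, c_jk, mse, t_crit):
    """CI for beta_j - beta_k, using Var(b_j - b_k) = MSE * (c_jj + c_kk - 2 * c_jk)."""
    se = math.sqrt(mse * (c_jj + c_kk - 2 * c_jk))
    d = b_j - b_k
    return d - t_crit * se, d + t_crit * se


def partial_r2(ssr_extra, sse_reduced):
    """Coefficient of partial determination, e.g. SSR(X2 | X1) / SSE(X1)."""
    return ssr_extra / sse_reduced


n, p = 46, 3
sst = 187722 - 2832 ** 2 / n      # sum of y^2 minus (sum of y)^2 / n
sse = 4248.840688
mse = sse / (n - p - 1)           # 42 error degrees of freedom

f_stat = overall_f(sst, sse, n, p)
ci_b3 = coef_ci(-13.4706, 0.498577303, mse, t_crit=2.018)
r2_y2_1 = partial_r2(480.9, sst - 8275.38885)   # Type I SS values, read approximately

print(f"F = {f_stat:.2f}")
print(f"95% CI for beta3: ({ci_b3[0]:.3f}, {ci_b3[1]:.3f})")
print(f"R^2(Y2|1) = {r2_y2_1:.4f}")
```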

For part iv): SST = 13369.30 and SSR(X1) = Type I SS for X1 = 8275.38885, so SSE(X1) = SST - SSR(X1). SSR(X2|X1) = Type I SS for X2 (about 480.9). Hence R^2(Y2|1) = 480.9/(13369.30 - 8275.39) = 0.094 (approximately). Interpretation: once X1 is in the model, adding X2 reduces the error sum of squares by about 9.4%.

v) [4 points] Test whether both X2 and X3 can be dropped from the model (i.e. keeping only X1 in the model). Use α = 0.05.
Sol: SSdrop = SSR(X2, X3 | X1) = Type I SS for X2 + Type I SS for X3 = 845.07. Then F = (845.07/2)/(4248.840688/42) = 4.18. This exceeds F(0.95; 2, 42) = 3.22, so reject H0: X2 and X3 cannot both be dropped.

vi) [4 points] Test whether both X1 and X2 can be dropped from the model (i.e. keeping only X3 in the model). Use α = 0.05.
Sol: To find SSR(reduced), calculate the slope b1 for the simple linear regression of Y on X3. Then SSR(reduced) = b1^2 x Sxx, with Sxx computed for X3. Use the drop test: F = [(SSR(full) - SSR(reduced))/2]/MSE(full).

vii) [4 points] Give the ANOVA table (with all entries calculated) for the regression model for Y with the two independent variables X1 and X2.

3) (Based on q6 p73 Terry; this is q5 of the STAB27 Final W08.) A company designing and marketing lighting fixtures needed to develop forecasts of sales (SALES = total monthly sales in thousands of dollars). The company considered the following predictors:
ADEX = advertising expense in thousands of dollars
MTGRATE = mortgage rate for 30-year loans (%)
HSSTARTS = housing starts in thousands of units
The company collected data on these variables, and the SAS outputs below were obtained from this study.

The REG Procedure
Model: MODEL1
Dependent Variable: SALES
Number of Observations Read 46
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             3    68707            06357         84.4      <.0001
Error             42   0603             449
Corrected Total   45   730

Root MSE         56.9883    R-Square   0.8578
Dependent Mean   63.3609    Adj R-Sq   0.8476

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   1    6.4647               43.3467          3.73      0.0006     0
ADEX        1    0.3736               0.4399           0.75      0.460      2.82856
MTGRATE     1    -5.80                39.74780         -3.80     0.0005     .9944
HSSTARTS    1    .8636                .865             0.84      <.0001     .4005

[Plot of Residuals vs Predicted Values (Response: SALES); axes: Residuals, Predicted Values]
[Plot of Residuals vs Normal Scores (Response: SALES); axes: Residuals, Normal Scores]

i) [3 points] Calculate the value of R-squared for the regression of ADEX on MTGRATE and HSSTARTS.

Ans: VIF = 1/(1 - R-sq(ADEX on MTGRATE, HSSTARTS)) = 2.82856, and so R-sq = 1 - 1/2.82856 = 0.6464638. Here is the complete output:

Regression Analysis: ADEX versus MTGRATE, HSSTARTS
The regression equation is
ADEX = 766 - 7. MTGRATE - 0.067 HSSTARTS
Predictor   Coef      SE Coef   T       P       VIF
Constant    765.54    94.38     8.      0.000
MTGRATE     -7.9      8.56      -8.36   0.000   .
HSSTARTS    -0.0670   0.48      -0.6    0.87    .
S = 54.7   R-Sq = 64.6%   R-Sq(adj) = 63.0%

ii) State whether the following statements are true or false. Circle your answer. [1 point for each part]
a) The residual plots above show that the distribution of residuals is left-skewed. (True / False)
Ans: F
b) The residual plots above show clear evidence of non-constant variance of errors. (True / False)
Ans: F
c) The small p-value (p < .0001 in the ANOVA table) for the global F-test for the model implies that all three variables should be retained in the model. (True / False)
Ans: F
d) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSE for that model (i.e. the model with 4 predictors) will be greater than 0603. (True / False)
Ans: F. SSE cannot increase when a predictor is added; it decreases (or stays the same) as k increases.
e) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSRegression for that model (i.e. the model with 4 predictors) will be less than 68707. (True / False)

Ans: F. SSRegression cannot decrease when a predictor is added; it increases (or stays the same) as k increases.
f) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSTotal for that model (i.e. the model with 4 predictors) will be less than 730. (True / False)
Ans: F. SST does not depend on the X's.
g) The value of the adjusted R-squared for the regression model for SALES on MTGRATE and HSSTARTS (i.e. with only two predictors) will be less than 0.8476. (True / False)
Ans: F. The output below gives R-Sq(adj) = 84.9% for the two-predictor model.

Regression Analysis: SALES versus MTGRATE, HSSTARTS
The regression equation is
SALES = 863 - 75 MTGRATE + .8 HSSTARTS
Predictor   Coef     SE Coef   T      P       VIF
Constant    863.     70.4      6.89   0.000
MTGRATE     -74.54   4.40      -7.5   0.000   .
HSSTARTS    .84      .80       0.88   0.000   .
S = 55.489   R-Sq = 85.6%   R-Sq(adj) = 84.9%

4) [5 points] A researcher suspected that the systolic blood pressure of individuals is related to weight. He calculated the least squares regression equation of systolic blood pressure on weight based on a sample of 14 individuals. The estimated slope of this simple linear regression model was 0.13173, with a standard error of 0.04625 (i.e. b1 = 0.13173 and s(b1) = 0.04625). Calculate the correlation between systolic blood pressure and weight for this sample of individuals.
Sol: t = 0.13173/0.04625 = 2.848. In simple linear regression, F = t^2 = R-sq/[(1 - R-sq)/(14 - 2)], so R-sq = t^2/(t^2 + 12) = 8.11/20.11 = 0.403. The slope is positive, so r = +sqrt(R-sq) = 0.635. This agrees with the Pearson correlation in the MINITAB output below.
This question is based on the data from the summer 06 B final (regression question). Some useful outputs are given below.
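The conversions used in questions 3 and 4 can be checked with a short Python sketch. The inputs are our reading of the outputs: the VIF of ADEX (2.82856), and the slope t statistic 0.13173/0.04625 with n = 14. The function names are illustrative.

```python
import math


def r2_from_vif(vif):
    """VIF_k = 1 / (1 - R_k^2), so R_k^2 = 1 - 1 / VIF_k."""
    return 1 - 1 / vif


def r_from_slope_t(t, n):
    """Simple regression: t^2 = F = R^2 / ((1 - R^2) / (n - 2)), so R^2 = t^2 / (t^2 + n - 2).

    The correlation carries the sign of the slope, i.e. of t.
    """
    r_sq = t ** 2 / (t ** 2 + n - 2)
    return math.copysign(math.sqrt(r_sq), t)


print(f"R^2(ADEX on MTGRATE, HSSTARTS) = {r2_from_vif(2.82856):.4f}")
t = 0.13173 / 0.04625
print(f"t = {t:.3f}, r = {r_from_slope_t(t, n=14):.3f}")
```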

Systolic blood pressure readings of individuals are thought to be related to weight. The following MINITAB output was obtained from a regression analysis of systolic blood pressure on weight (in pounds). The next five questions are based on this information.

Descriptive Statistics: Systolic, Weight
Variable   N    N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3
Systolic   14   0    54.50   .49       5.57    45.00     50.75   53.50    58.50
Weight     14   0    94.07   7.8       6.86    64.00     73.00   88.00    .00

Correlations: Systolic, Weight
Pearson correlation of Systolic and Weight = 0.635

The regression equation is
Systolic = 9 + 0.3 Weight
Predictor   Coef      StDev     T      P
Constant    8.935     9.055     4.4    0.000
Weight      0.13173   0.04625   2.85   0.015
R-Sq = (omitted)

Analysis of Variance
Source           DF   SS       MS     F    P
Regression       1    6.75     6.75   8.   0.015
Residual Error   12   40.75    0.06
Total            13   403.50

5) The data and some useful information on a response variable y and two explanatory variables x1 and x2 are given below:
y x1 x2
3 5 8 3 7 3 4 6 4 3 3
(X'X)^-1 =
3.67    -0.57   -0.
-0.57   0.33    -0.08
-0.     -0.08   0.
a) [4 points] Estimate the linear regression model for y on the two explanatory variables x1 and x2.

Sol: Use b = (X'X)^-1 X'Y.

b) [6 points] MSE for the simple linear regression model of y on x1 is 4.5. Test for the lack of fit of this model (i.e. the simple linear regression model of y on x1) using the pure error sum of squares.

Regression Analysis: y versus x1, x2
The regression equation is
y = - 0.75 + 4.41 x1 - 0.966 x2
Predictor   Coef     SE Coef   T       P
Constant    -0.746   2.64      -0.28   0.789
x1          4.407    1.18      3.73    0.014
x2          -0.966   0.7033    -1.37   0.228
S = 2.04193   R-Sq = 73.6%   R-Sq(adj) = 63.0%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression       2    58.028   29.014   6.96   0.036
Residual Error   5    20.847   4.169
Total            7    78.875

Source   DF   Seq SS
x1       1    50.161
x2       1    7.867

MTB > info
Information on the Worksheet
Column   Count   Name
C1       8       y
C2       8       x1
C3       8       x2
M3       3 x 3   XPXI3
MTB > print XPXI3
Data Display
Matrix XPXI3

1.67373    -0.57034   -0.069
-0.5703    0.334746   -0.0767
-0.07      -0.0767    0.118644

MTB > Regress 'y' 1 'x1';
SUBC> Constant;
Regression Analysis: y versus x1
The regression equation is
y = -1.64 + 3.79 x1
Predictor   Coef     SE Coef   T       P
Constant    -1.643   2.74      -0.60   0.57
x1          3.786    1.169     3.24    0.018
S = 2.18763   R-Sq = 63.6%   R-Sq(adj) = 57.5%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    50.161   50.161   10.48   0.018
Residual Error   6    28.714   4.786
Total            7    78.875

MTB > Regress 'y' 1 'x1';
SUBC> Constant;
SUBC> Pure;
SUBC> Brief 2.
Regression Analysis: y versus x1
The regression equation is
y = -1.64 + 3.79 x1
Predictor   Coef     SE Coef   T       P
Constant    -1.643   2.74      -0.60   0.57
x1          3.786    1.169     3.24    0.018
S = 2.18763   R-Sq = 63.6%   R-Sq(adj) = 57.5%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    50.161   50.161   10.48   0.018
Residual Error   6    28.714   4.786
  Lack of Fit    1    1.714    1.714    0.32    0.597
  Pure Error     5    27.000   5.400
Total            7    78.875
rows with no replicates
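The pure-error decomposition behind the Lack of Fit line can be reproduced from raw data. Below is a minimal Python sketch. The data values are hypothetical and were chosen only so that x has replicated levels. The function returns F = MSLF/MSPE, which is compared with F(1 - α; c - 2, n - c), where c is the number of distinct x levels. For the exam output, the reported lack-of-fit p-value is 0.597, so there is no evidence of lack of fit at α = 0.05.

```python
from collections import defaultdict


def lack_of_fit_test(x, y):
    """Lack-of-fit F test for simple linear regression with replicated x values.

    SSE = SSPE + SSLF, where SSPE sums squared deviations from each x-level mean,
    df_PE = n - c and df_LF = c - 2 (c = number of distinct x levels).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    groups = defaultdict(list)          # responses grouped by x level
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sspe = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())

    c = len(groups)
    sslf = sse - sspe
    df_lf, df_pe = c - 2, n - c
    f = (sslf / df_lf) / (sspe / df_pe)
    return sslf, sspe, df_lf, df_pe, f


# Hypothetical data: three x levels, each observed twice
x = [1, 1, 2, 2, 3, 3]
y = [1, 3, 2, 4, 6, 8]
sslf, sspe, df_lf, df_pe, f = lack_of_fit_test(x, y)
print(f"SSLF = {sslf:.3f} on {df_lf} df, SSPE = {sspe:.3f} on {df_pe} df, F = {f:.3f}")
```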

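5) [5 points] Consider the simple linear regression model Y_i = β0 + β1 X_i + ε_i, i = 1, 2, ..., n, with the usual assumptions: E(ε_i) = 0 for all i, V(ε_i) = σ^2 for all i, and Cov(ε_i, ε_j) = 0 whenever i ≠ j. (Normality of the ε_i's is not required for the results below.) Let b0 and b1 be the least squares estimators of β0 and β1 respectively. Prove that

$$\operatorname{Var}[e_i] = \sigma^2\left(1 - \frac{1}{n} - \frac{(X_i - \bar X)^2}{\sum_{j=1}^{n}(X_j - \bar X)^2}\right) = \sigma^2\left(1 - \frac{1}{n} - \frac{(X_i - \bar X)^2}{S_{XX}}\right), \qquad e_i = Y_i - \hat Y_i .$$

Sol:

$$\operatorname{Var}[e_i] = \operatorname{Var}[Y_i - \hat Y_i] = \operatorname{Var}[Y_i] + \operatorname{Var}[\hat Y_i] - 2\operatorname{Cov}[Y_i, \hat Y_i].$$

Write b1 = Σ_j k_j Y_j with k_j = (X_j - X̄)/S_XX, and b0 = Ȳ - b1 X̄ = Σ_j (1/n - X̄ k_j) Y_j. Then

$$\operatorname{Cov}[Y_i, \hat Y_i] = \operatorname{Cov}\Big[Y_i,\ \sum_{j=1}^{n}\Big(\frac{1}{n} - \bar X k_j\Big)Y_j + X_i\sum_{j=1}^{n}k_j Y_j\Big] = \sum_{j=1}^{n}\Big(\frac{1}{n} + (X_i - \bar X)k_j\Big)\operatorname{Cov}(Y_i, Y_j)$$

$$= \sigma^2\Big(\frac{1}{n} + (X_i - \bar X)k_i\Big) = \sigma^2\Big(\frac{1}{n} + \frac{(X_i - \bar X)^2}{S_{XX}}\Big),$$

since Cov(Y_i, Y_j) = 0 for j ≠ i and Cov(Y_i, Y_i) = σ^2. The same coefficients give Var[Ŷ_i] = σ^2 (1/n + (X_i - X̄)^2/S_XX), because Σ_j k_j = 0 and Σ_j k_j^2 = 1/S_XX. With Var[Y_i] = σ^2,

$$\operatorname{Var}[e_i] = \sigma^2 + \sigma^2\Big(\frac{1}{n} + \frac{(X_i - \bar X)^2}{S_{XX}}\Big) - 2\sigma^2\Big(\frac{1}{n} + \frac{(X_i - \bar X)^2}{S_{XX}}\Big) = \sigma^2\Big(1 - \frac{1}{n} - \frac{(X_i - \bar X)^2}{S_{XX}}\Big).$$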
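The result can also be checked numerically. The Python sketch below is a Monte Carlo check, not part of the proof. It compares 1 - 1/n - (X_i - X̄)^2/S_XX (the factor multiplying σ^2) with the simulated variance of each residual. The design points, the true coefficients and σ = 1 are arbitrary choices. Normal errors are used only for convenience, since the result holds for any errors with variance σ^2. As a by-product, the factors sum to n - 2, the residual degrees of freedom.

```python
import random


def residual_variance_factor(x):
    """1 - 1/n - (x_i - xbar)^2 / Sxx for each i, i.e. Var(e_i) / sigma^2."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 - 1 / n - (xi - xbar) ** 2 / sxx for xi in x]


def simulated_residual_variance(x, beta0=2.0, beta1=0.5, sigma=1.0, reps=20000, seed=1):
    """Monte Carlo estimate of Var(e_i): refit the least squares line to fresh data each time."""
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sums, sq_sums = [0.0] * n, [0.0] * n
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        b0 = ybar - b1 * xbar
        for i, (xi, yi) in enumerate(zip(x, y)):
            e = yi - (b0 + b1 * xi)
            sums[i] += e
            sq_sums[i] += e * e
    return [sq_sums[i] / reps - (sums[i] / reps) ** 2 for i in range(n)]


x = [1, 2, 3, 4, 10]                 # hypothetical design points
theory = residual_variance_factor(x)  # sigma = 1
sim = simulated_residual_variance(x)
for t_val, s_val in zip(theory, sim):
    print(f"theory {t_val:.3f}   simulated {s_val:.3f}")
```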

6) A psychologist conducted a study to examine the nature of the relation, if any, between an employee's emotional stability (X) and the employee's ability to perform in a task group (Y). Emotional stability was measured by a written test, for which the higher the score, the greater the emotional stability. Ability to perform in a task group (Y = 1 if able, Y = 0 if unable) was evaluated by the supervisor. The psychologist is considering a logistic regression model for the data. The SAS output below is based on the results for 27 employees.

The SAS System
The LOGISTIC Procedure
Model Information
Data Set                    WORK.A
Response Variable           Y
Number of Response Levels   2
Number of Observations      27
Model                       binary logit
Optimization Technique      Fisher's scoring

Response Profile
Ordered Value, Y, Total Frequency: 0 3 4
Probability modeled is Y=1.

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   8.15         1    0.0043
Score              7.33         1    0.0068
Wald               5.769        1    0.0163

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -10.3089   4.3770           5.547             0.0185
X           1    0.0189     0.00788          5.769             0.0163

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
X        omitted          omitted   omitted
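Parts i) and ii) below follow directly from the maximum likelihood estimates. The Python sketch assumes b0 = -10.3089, b1 = 0.0189 and s(b1) = 0.00788, our reading of the output. It computes π̂ = 1/(1 + exp(-(b0 + b1 X))) and the Wald interval exp(b1 ± z s(b1)) for the odds ratio exp(β1). The normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def logistic_prob(b0, b1, x):
    """Fitted probability pi-hat = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def odds_ratio_wald_ci(b1, se_b1, level=0.90):
    """Wald CI for the odds ratio exp(beta1): exp(b1 -/+ z * s(b1))."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.exp(b1 - z * se_b1), math.exp(b1), math.exp(b1 + z * se_b1)


b0, b1, se_b1 = -10.3089, 0.0189, 0.00788
p500 = logistic_prob(b0, b1, 500)
lo, point, hi = odds_ratio_wald_ci(b1, se_b1)
print(f"P(Y = 1 | X = 500) = {p500:.3f}")
print(f"odds ratio {point:.4f}, 90% CI ({lo:.4f}, {hi:.4f})")
```

Each extra point of emotional stability multiplies the estimated odds of being able to perform the task by exp(b1), which is slightly above 1.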

i) [2 points] Estimate the probability that an employee with an emotional stability score of 500 (i.e. X = 500) will be able to perform the task.
ii) [4 points] Calculate a 90 percent confidence interval for the odds ratio of X.

7) A personnel officer in a company administered four aptitude tests to each of 25 applicants for entry-level clerical positions. For the purpose of this study, all 25 applicants were accepted for positions irrespective of their test scores. After a period, each applicant was rated for proficiency (denoted by Y) on the job. The SAS output below is intended to identify the best subset of the four tests (denoted by X1, X2, X3 and X4).

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: Y
Adjusted R-Square Selection Method
Number of Observations Read 25
Number of Observations Used 25

Number in   Adjusted
Model       R-Square   R-Square   C(p)       Variables in Model
3           0.9560     0.9615     3.774      X1 X3 X4
4           0.9555     0.9629     5.0000     X1 X2 X3 X4
2           0.9269     0.9330     7.30       X1 X3
3           0.9247     0.9341     8.55       X1 X2 X3
2           0.8661     0.8773     47.1540    X3 X4
3           0.8617     0.8790     48.30      X2 X3 X4
3           0.8233     0.8454     66.3465    X1 X2 X4
2           0.7985     0.8153     80.5653    X X4
1           0.7962     0.8047     84.2465    X3
2           0.7884     0.8060     85.596     X2 X3
2           0.7636     0.7833     97.7978    X X4
1           0.7452     0.7558     110.5974   X4
2           A          0.4642     269.7800   X1 X2
1           0.2326     0.2646     375.3447   X
1           0.2143     0.2470     384.835    X

Even though this SAS output is for the R-square selection method, it has useful information that can be used in other selection methods.

a) [5 points] Identify the variable that will enter the model at the second step of the stepwise regression procedure. Explain clearly how you identified this variable.
Sol: X3 has the largest R-sq among the four single-variable models, so it enters the model at the first step. This assumes F = [Rsq/df_reg]/[(1 - Rsq)/df_error] is significant at the required significance level to enter the model. Among the three two-variable models containing X3, the model with X1 has the highest R-sq. So X1 has the highest t-ratio (equivalently, the largest partial F) among the three models containing X3. If a variable is selected at this step, it must be X1 (see the stepwise output below).

b) [2 points] Identify the variables that you will select if you want to use Mallows' C(p) criterion. Explain clearly the reason for your answer.
Sol: Choose the model with C(p) close to p = number of variables + 1. Exclude the model with all the variables, for which C(p) = p automatically. The choice is the model with X1 X3 X4, which has C(p) = 3.774, close to p = 4.

c) [3 points] Calculate the value of the adjusted R-square for the model with the predictors X1 and X2 only. (Note: this is the model whose adjusted R-square has been deleted from the above SAS output.)
Ans: 1 - (24/22)*(1 - 0.4642) = 0.415490909

Here are some useful outputs:

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: Y
Number of Observations Read 25
Number of Observations Used 25

Stepwise Selection: Step 1
Variable X3 Entered: R-Square = 0.8047 and C(p) = 84.2465
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    7285.9775        7285.9775     94.78     <.0001
Error             23   768.085          76.87056
Corrected Total   24   9054.00000

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -06.384              0.4479           2071.058     26.94     <.0001
X3          1.96759              0.20210          7285.9775    94.78     <.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2
Variable X1 Entered: R-Square = 0.9330 and C(p) = 7.30
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    8447.34255       4223.67128    153.17    <.0001
Error             22   606.65745        27.57534
Corrected Total   24   9054.00000

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -7.59569             .6856            789.9335     0.7       <.0001
X1          0.34846              0.05369          1161.36540   42.12     <.0001
X3          .83                  0.307            605.48790    9.45      <.0001
Bounds on condition number: 1.0338, 4.1352

Stepwise Selection: Step 3
Variable X4 Entered: R-Square = 0.9615 and C(p) = 3.774
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             3    8705.80299       2901.93433    175.02    <.0001
Error             21   348.19701        16.58081
Corrected Total   24   9054.00000

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -124.2000            9.87406          2623.3586    158.21    <.0001
X1          0.29633              0.04368          763.559      46.02     <.0001
X3          1.35697              0.15183          1324.3885    79.87     <.0001
X4          0.574                0.305            258.46044    15.59     0.0007
Bounds on condition number: .8335, 9.764

All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection
Step   Variable Entered   Variable Removed   Number Vars In   Partial R-Square   Model R-Square   C(p)      F Value   Pr > F
1      X3                                    1                0.8047             0.8047           84.2465   94.78     <.0001
2      X1                                    2                0.1283             0.9330           7.30      42.12     <.0001
3      X4                                    3                0.0285             0.9615           3.774     15.59     0.0007
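The two selection criteria used in question 7 are simple functions of R^2 and SSE. A minimal Python sketch, where p counts the parameters including the intercept. It assumes R^2 = 0.9615 for X1 X3 X4 and R^2 = 0.4642 for X1 X2, which is our reading of the table. The function names are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (n - 1) / (n - p) * (1 - R^2); p includes the intercept."""
    return 1 - (n - 1) / (n - p) * (1 - r2)


def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p = SSE_p / MSE(full model) - (n - 2p); C_p near p suggests little bias."""
    return sse_p / mse_full - (n - 2 * p)


n = 25
print(f"X1 X3 X4: adjusted R^2 = {adjusted_r2(0.9615, n, 4):.4f}")   # table shows 0.9560
print(f"X1 X2:    adjusted R^2 = {adjusted_r2(0.4642, n, 3):.4f}")   # the deleted entry, part c)
```

For the full model, SSE_p / MSE(full) = n - p, so C_p = p exactly. This is why the model with all four tests is excluded in part b).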

8) In a study of the larvae growing in a lake, the researchers collected data on the following variables:
Y = the number of larvae of the Chaoborus collected in a sample of the sediment from an area of approximately 5 cm of the lake bottom
X1 = the dissolved oxygen (mg/l) in the water at the bottom
X2 = the depth (m) of the lake at the sampling point
Some useful SAS outputs for fitting the regression model Y = β0 + β1x1 + β2x2 + β3x1x2 + ε, using the data from this study, are given below. Assume that the model given below is appropriate (i.e. satisfies all the necessary assumptions) to answer the questions below.

The REG Procedure
Model: MODEL1
Dependent Variable: Y
Number of Observations Read 14
Number of Observations Used 14

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             3    1311.15692       437.05231     28.23     <.0001
Error             10   154.84308        15.48431
Corrected Total   13   1466.00000

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Type I SS
Intercept   1    4.30070              7.9874           3.07      0.09       5054.00000
X1          1    -.3549               .087             -.3       0.0590     97.46
X2          1    1.22493              0.95978          1.28      0.2307     113.67566
X1X2        1    -0.00563             0.5660           -0.04     0.970      0.0003

State whether each of the following statements is true or false (based on the information given above). [1 point for each part]
i) The effect of the amount of oxygen dissolved in water (i.e. X1) on the number of larvae depends on the depth (at α = 0.10). (True / False)
Ans: F. The p-value for X1X2 is > 0.10.
ii) The terms X2 and X1X2 have no significant contribution to the model, and so both these terms can be dropped from the above model (at α = 0.10). (True / False)

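Ans: False. SSdrop = Type I SS for X2 + Type I SS for X1X2 = 113.68 + 0.00 = 113.7. So F = (113.7/2)/15.48 = 56.85/15.48 = 3.6724806, and the p-value is 1 - 0.936301 = 0.063699 < 0.10. Equivalently, F exceeds F(0.90; 2, 10) = 2.92, so reject H0.
iii) The value of the t-statistic for testing the null hypothesis H0: β2 = 1 against H1: β2 > 1 is greater than .0. (True / False)
Ans: F. t = (b2 - 1)/s(b2) = (1.22493 - 1)/0.95978 = 0.234, which is below the stated threshold.
iv) The p-value of the t-test for testing the null hypothesis H0: β1 = 0 against H1: β1 > 0 is less than 0.0. (True / False)
Ans: F. Since b1 < 0, the one-sided p-value is 1 - 0.0590/2 = 1 - 0.0295 = 0.9705.
v) The sum of squares of errors (SSE) for the simple linear regression model of Y on X is greater than 50.0. (True / False)
Ans: T. It is at least as large as the SSE for the bigger model above (i.e. 154.84), because SSE cannot decrease when terms are removed.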
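Parts ii) and iv) of question 8 use the general linear (drop) F test and a one-sided p-value obtained from SAS's two-sided one. A minimal Python sketch with the numbers quoted in the answers above: SSdrop = 113.7 on 2 df, MSE = 15.48, and a two-sided p-value of 0.0590 for a negative b1. The function names are illustrative.

```python
def partial_f(ss_extra, q, mse_full):
    """General linear (drop) test: F = [SS_extra / q] / MSE(full)."""
    return (ss_extra / q) / mse_full


def one_sided_p_upper(p_two_sided, t):
    """p-value for H1: beta > 0, from the two-sided p-value and the sign of t."""
    return p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2


f_stat = partial_f(113.7, q=2, mse_full=15.48)
print(f"F = {f_stat:.4f}")   # compare with F(0.90; 2, 10) = 2.92

# Only the sign of t matters here; b1 is negative in the output, so any negative t works.
p_upper = one_sided_p_upper(0.0590, t=-1.0)
print(f"one-sided p-value = {p_upper:.4f}")
```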

Multiple-choice questions (Miscellaneous) (2 points for each question)

9) If the slope of a least squares regression line of Y on X is negative, what else must be negative?
A) The correlation of X and Y
B) The slope of a least squares regression line of X on Y
C) The coefficient of determination (R-sq) for the regression of Y on X
D) More than one of the above must be negative
E) None of the above need be negative
Ans: D. The correlation of X and Y and the slope of X on Y must both be negative.

10) If there were no linear relationship between X and Y (i.e. correlation r = 0), what would be the predicted value of Y (predicted using the estimated least squares regression equation) at any given value of X?
A) 0
B) Mean of the Y values (i.e. Y_bar)
C) Mean of the X values (i.e. X_bar)
D) (Mean of Y values - Mean of X values) (i.e. Y_bar - X_bar)
E) It depends on the variance of Y
Ans: B. With r = 0, the slope b1 = r Sy/Sx = 0, so the fitted value is b0 = Y_bar for every X.
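Both multiple-choice answers follow from the fact that each slope equals Sxy divided by a positive sum of squares. The Python sketch below illustrates this on hypothetical data. The slopes of Y on X and of X on Y share the sign of r, and their product is r^2. When Sxy = 0, the fitted line is flat at Y_bar.

```python
import math


def slopes_and_r(x, y):
    """Least squares slopes of y on x and of x on y, plus the correlation r."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


# Hypothetical data with a negative trend (question 9)
b_yx, b_xy, r = slopes_and_r([1, 2, 3, 4, 5], [10, 8, 7, 5, 2])
print(b_yx, b_xy, r)

# Hypothetical data with r = 0 (question 10): slope 0, so every prediction equals y_bar
x0, y0 = [1, 2, 3], [1, 3, 1]
b1_0, _, r0 = slopes_and_r(x0, y0)
ybar0 = sum(y0) / len(y0)
b0_0 = ybar0 - b1_0 * sum(x0) / len(x0)
print(r0, b1_0, b0_0, ybar0)
```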

Total: 95 points