Statistics 5100 Spring 2018 Exam 1

Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all your responses. Partial SAS output and statistical tables are found in an accompanying handout. The point value of each question is given, and the points sum to 100. Good luck!

(Q1) (1 point) What is your name?

Possibly useful formulas for this exam:

  b1 = Corr(X, Y) (SD_Y / SD_X)
  b0 = Ȳ − b1 X̄
  e_i = Y_i − Ŷ_i
  conf/pred interval: Estimate ± (Critical Value) (Standard Error of Estimate)
  regression equation: Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_(p−1) X_i,(p−1) + ε_i
  F = [(SSE_reduced − SSE_full) / (p − q)] / [SSE_full / (n − p)], where
      p = # β's in full model (incl. intercept)
      q = # β's in reduced model (incl. intercept)
      n = sample size
  SSR(U | V) = SSE(V) − SSE(U, V)
  R² = SS_model / SS_total = 1 − SSE / SS_total
  R²_adj = 1 − [(n − 1) / (n − p)] (SSE / SS_total), where p = # predictors in model

Data: This exam uses a single data set involving intelligence quotient (IQ). Data were collected on 38 students at a certain university. The relevant variables are summarized in the table below. Of interest is how the other variables could be used to predict IQ.

  Variable     Description
  IQ           Full-scale IQ, from Wechsler subtests
  Sex          0 = Female, 1 = Male
  BrainSize    Total pixel count (in 1,000s) of brain in MRI scans
  Weight       Body weight, in pounds
  Height       Body height, in inches
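For reference, output like SAS Output 1 in the accompanying handout can be produced with a PROC REG call along the following lines (a sketch only; the data set name iqdata is an assumption, since the exam does not give one):

    ods graphics on;                        /* default ODS output includes residual diagnostic plots */
    proc reg data=iqdata;                   /* iqdata = assumed name of the 38-student data set */
      model IQ = Weight Height BrainSize;   /* regress IQ on the three quantitative predictors */
    run;
    quit;
    ods graphics off;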

(Q2) IQ is regressed on Weight, Height, and BrainSize; see the partial output from PROC REG in SAS Output 1.

(a) (7 points) What percentage of the variability in IQ can be explained by its linear relationship with Weight, Height, and BrainSize?

(b) (12 points) Give an appropriate interpretation of the number 0.06788, which is bolded and underlined in the last line of the Parameter Estimates table.

(c) (10 points) Give an appropriate interpretation of the number 0.0016, which is also bolded and underlined in the same table. (Note that you are not asked what you would do with this number, or what you might conclude from it, but rather what the value of the number itself represents.)

(d) (8 points) Comment briefly on what the graphical diagnostics in this output suggest about two model assumptions regarding the distribution of the error terms. For both assumptions, specify (i) the assumption, (ii) the name of the graphical diagnostic, and (iii) what the diagnostic suggests about the assumption here.

  1: (i) ______  (ii) ______  (iii) ______
  2: (i) ______  (ii) ______  (iii) ______
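As a worked illustration of part (a), using the R² formula from the formula list and the ANOVA table in SAS Output 1: R² = SS_model / SS_total = 711.15276 / 2351.71053 ≈ 0.302, so roughly 30% of the variability in IQ is explained by its linear relationship with Weight, Height, and BrainSize.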

(Q3) (8 points) SAS Output 2 reports some numerical diagnostics on the residuals from the model fit in SAS Output 1. For the same two assumptions you referred to in part (d) of Q2 above, specify (i) the name of the numerical diagnostic, (ii) the value of the threshold against which you compare its result (i.e., when would it be significant), and (iii) what the diagnostic suggests about the assumption here.

  1: (i) ______  (ii) ______  (iii) ______
  2: (i) ______  (ii) ______  (iii) ______

(Q4) A researcher wonders whether the effect of BrainSize on IQ is the same for men and women. (See SAS Output 3, where the plotting symbol is Sex.)

(a) (6 points) Write out the linear regression model (in terms of β's and the variable names) to address this question.

(b) (1 point) Based on this model, write (but do not test) the appropriate null hypothesis to test the researcher's question.

(Q5) A researcher wonders whether the average IQ is different for men and women (see SAS Output 4). One could use a t-test or ANOVA model to consider this potential difference. Instead, the researcher wants to also control for possible effects of Weight, Height, and BrainSize (see SAS Output 5), but with no interactions.

(a) (6 points) Write out the linear regression model (in terms of β's and the variable names) to address this question (including controlling for these other factors).

(b) (1 point) Based on this model, write (but do not test) the appropriate null hypothesis to test the researcher's question.
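For reference, models of the kind asked for in Q4 and Q5 could be fit in SAS along the following lines (a sketch only; the data set names iqdata and iqdata2 and the interaction variable name BrainSex are assumptions):

    data iqdata2;
      set iqdata;
      BrainSex = BrainSize * Sex;   /* product of BrainSize and the 0/1 Sex indicator */
    run;

    proc reg data=iqdata2;
      Interaction: model IQ = BrainSize Sex BrainSex;        /* Q4: allows the BrainSize effect to differ by Sex */
      Additive:    model IQ = Weight Height BrainSize Sex;   /* Q5: Sex effect, controlling for the other predictors */
    run;
    quit;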

(Q6) SAS Output 6 gives evidence that multicollinearity is not problematic in the model fit there. For one diagnostic method to assess multicollinearity, identify the following:

(a) (1 point) The name of the diagnostic method.

(b) (4 points) The numeric rule (or threshold) the method uses for determining multicollinearity (i.e., when would multicollinearity be called problematic?).

(c) (4 points) How the output in SAS Output 6 compares to this numeric rule (or threshold).

(Q7) A researcher regresses IQ on BrainSize, Sex, and the interaction term BrainSize*Sex. Assume that she has verified that the model assumptions are met. She reports 90% interval estimates for male IQ when BrainSize is 1000; partial results are given below. (No other output is needed to answer these questions.)

  Interval Type    Left Endpoint    Right Endpoint
  Confidence       108.285          115.455
  Prediction        99.5534         124.1866

(a) (6 points) What is the correct interpretation of the confidence interval?

(b) (6 points) What is the point estimate of male IQ when BrainSize is 1000?
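For reference, the multicollinearity diagnostics shown in SAS Output 6 and 90% intervals like those in Q7 can be requested through MODEL statement options in PROC REG (a sketch only, continuing the assumed names iqdata, iqdata2, and BrainSex from above):

    proc reg data=iqdata;
      model IQ = Weight Height BrainSize / vif collin;          /* variance inflation factors and collinearity diagnostics */
    run;

    proc reg data=iqdata2;
      model IQ = BrainSize Sex BrainSex / clm cli alpha=0.10;   /* 90% confidence (CLM) and prediction (CLI) intervals */
    run;
    quit;

In practice, the intervals at one particular setting (e.g., BrainSize = 1000 and Sex = 1) are often obtained by appending a row with those predictor values and a missing IQ before running PROC REG.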

(Q8) [This collection of questions does not involve any data or output.] Consider a multiple linear regression model where Y is regressed on X1, X2, and X3.

(a) (6 points) When asked to state the multiple linear regression model, a student wrote the model below. Do you agree? Why or why not?

  E[Y_i] = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3 + ε_i

(b) (6 points) Suppose there is multicollinearity involving X1 and X3. State clearly what this means. (You are not asked how you would check for it.)

(c) (6 points) Suppose there is interaction involving X1 and X3. State clearly what this means. (You are not asked how you would check for it.)

(Q9) (1 point) What topic(s) did you study most that did not appear on this exam?
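For comparison with the statement in (a), a standard way to write the multiple linear regression model is Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3 + ε_i, with the ε_i independent N(0, σ²) errors; taking expectations gives E[Y_i] = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3, with no error term on the right-hand side.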

Output and Tables for Spring 2018 STAT 5100 Exam 1

SAS Output 1

The REG Procedure
Dependent Variable: IQ

  Number of Observations Read    38
  Number of Observations Used    38

Analysis of Variance
  Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
  Model               3         711.15276      237.05092       4.91    0.0061
  Error              34        1640.55776       48.25170
  Corrected Total    37        2351.71053

Parameter Estimates
  Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
  Intercept     1              75.50800          22.09766       3.42      0.0017
  Weight        1               0.04012           0.06915       0.58      0.5656
  Height        1              -0.53976           0.43146      -1.25      0.2195
  BrainSize     1               0.06788           0.01977       3.43      0.0016

(SAS Output 1 continues)

SAS Output 1, continued
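A quick arithmetic check on the tables in SAS Output 1: F = MS_model / MS_error = 237.05092 / 48.25170 ≈ 4.91, matching the reported F Value, and for BrainSize, t = 0.06788 / 0.01977 ≈ 3.43, matching its reported t Value.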

SAS Output 2

P-value for Brown-Forsythe test of constant variance in residual vs. predicted:

  Obs    t_bf       BF_pvalue
    1    2.57013    0.014450

Output for correlation test of normality of residuals (check text Table B.6 for threshold):

The CORR Procedure
2 Variables: resid expectnorm

Simple Statistics
  Variable      N    Mean    Std Dev    Sum    Minimum      Maximum     Label
  resid        38       0    6.65878      0    -13.62938    13.19254    residual
  expectnorm   38       0    0.97720      0     -2.13600     2.13600

Pearson Correlation Coefficients, N = 38
Prob > |r| under H0: Rho=0

                       resid      expectnorm
  resid (residual)     1.00000    0.99417
                                  <.0001
  expectnorm           0.99417    1.00000
                       <.0001
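For reference, output like the correlation test of normality above can be generated by saving the residuals, computing their expected normal scores, and correlating the two (a sketch only; the data set names are assumptions, and the Brown-Forsythe portion is typically produced by a course-supplied macro not shown here):

    proc reg data=iqdata;
      model IQ = Weight Height BrainSize;
      output out=resids r=resid;            /* save residuals */
    run;
    quit;

    proc rank data=resids normal=blom out=normscores;
      var resid;
      ranks expectnorm;                     /* expected normal scores (normal quantiles) */
    run;

    proc corr data=normscores;
      var resid expectnorm;                 /* correlation near 1 supports the normality assumption */
    run;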

SAS Output 4

SAS Output 6

The REG Procedure
Model: MODEL1
Dependent Variable: IQ

Analysis of Variance
  Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
  Model               3         711.15276      237.05092       4.91    0.0061
  Error              34        1640.55776       48.25170
  Corrected Total    37        2351.71053

Parameter Estimates
  Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
  Intercept     1              75.50800          22.09766       3.42      0.0017                     0
  Weight        1               0.04012           0.06915       0.58      0.5656               2.02139
  Height        1              -0.53976           0.43146      -1.25      0.2195               2.27686
  BrainSize     1               0.06788           0.01977       3.43      0.0016               1.57844

Collinearity Diagnostics
                                               ----------- Proportion of Variation -----------
  Number    Eigenvalue    Condition Index    Intercept      Weight        Height        BrainSize
       1       3.98301            1.00000    0.00016327    0.00071000    0.00009143    0.00024682
       2       0.01307           17.45495    0.04428       0.61650       0.00288       0.01280
       3       0.00291           36.97864    0.21566       0.05134       0.02574       0.93488
       4       0.00100           62.97774    0.73990       0.33145       0.97129       0.05207
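For reference on the Variance Inflation column above: for predictor k, VIF_k = 1 / (1 − R²_k), where R²_k is the coefficient of determination from regressing that predictor on the other predictors in the model; larger values indicate stronger multicollinearity.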