STAT 3900/4950 MIDTERM TWO                                  Spring 2015
Name: ____________________ (print: first last)
Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

Instructions: You may use your books, notes, and SPSS/SAS. NO internet access is allowed. Write your answers in the space provided.

Question 1 (4950 students must submit the SAS code for this question via Blackboard)

A survey was taken to see if a person's purchases based on infomercials on television differed by the level of several different factors. One study considered the two factors household income and marital status. Household income was categorized into 4 categories: (1) under $30K, (2) $30K-$50K, (3) $50K-$100K, and (4) over $100K. Marital status was categorized into 3 levels: A, single (never married); B, married; and C, divorced/separated/widowed. For each of the 12 cells, 2 people were surveyed and reported their estimated past purchases per year that were based on infomercials on television. The goal of the study is to see whether a person's purchase depends on his/her marital status. Here are the purchase data (in dollars) for the different combinations:

                              Household income
Marital status         1            2            3            4
A                  350; 270     390; 530     370; 230     430; 530
B                  430; 390     450; 510     330; 370     570; 430
C                  390; 450     510; 450     350; 490     590; 390

(a) Which model of the following is the most appropriate for the data? Circle your answer.
A. Multiple linear regression   B. One-way ANOVA   C. One-way ANCOVA   D. Two-way ANOVA   E. Two-way ANCOVA
Key: D

(b) If the exact household income (rather than the category) is recorded, such as $10K, which model of the following is the most appropriate? Circle your answer.
A. Multiple linear regression   B. One-way ANOVA   C. One-way ANCOVA   D. Two-way ANOVA   E. Two-way ANCOVA
Key: C
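For reference, here is a minimal SAS sketch of how the data could be entered and the two-way ANOVA of part (a) fitted with PROC GLM. The dataset and variable names (infomercial, income, marital, purchase) are assumptions made for illustration; they are not specified on the exam.

data infomercial;
   input income marital $ purchase @@;   /* income = 1-4, marital = A/B/C */
   datalines;
1 A 350  1 A 270  2 A 390  2 A 530  3 A 370  3 A 230  4 A 430  4 A 530
1 B 430  1 B 390  2 B 450  2 B 510  3 B 330  3 B 370  4 B 570  4 B 430
1 C 390  1 C 450  2 C 510  2 C 450  3 C 350  3 C 490  4 C 590  4 C 390
;
run;

proc glm data=infomercial;
   class income marital;
   /* full two-way ANOVA with interaction */
   model purchase = income marital income*marital;
run;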

(c) [3 points] For the given dataset, perform backward model selection. What is your final model? Summarize your model selection procedure.

Step 1: drop the insignificant interaction (p-value is .91 >> .05).
Step 2: drop the insignificant main effect of marital_status (p-value is .18 > .05).
Step 3: since all remaining terms are significant, we cannot drop anything further. This is our final model.
Our final model is a one-way ANOVA with the single factor (household) income.
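A hedged sketch of how this backward selection could be carried out in SAS, refitting PROC GLM after each drop (same assumed dataset and variable names as above):

/* Step 1 result: additive model after dropping the interaction */
proc glm data=infomercial;
   class income marital;
   model purchase = income marital;
run;

/* Steps 2-3 result: final one-way ANOVA after dropping marital */
proc glm data=infomercial;
   class income;
   model purchase = income;
run;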

(d) [2 points] Check the NORMALITY assumption of the final model by graphs and tests. Sketch your plot and report the p-values of your tests.

The plot shows that all points fall more or less on a straight line, and both tests of normality are insignificant (p-values .20 and .946, both bigger than .05), so the normality assumption is valid.
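One possible way (a sketch, not the only one) to produce these normality checks in SAS is to save the residuals of the final model and pass them to PROC UNIVARIATE; the output dataset and residual variable names below are assumed:

proc glm data=infomercial;
   class income;
   model purchase = income;
   output out=resids r=resid;                  /* save raw residuals */
run;

proc univariate data=resids normal;            /* NORMAL requests Shapiro-Wilk, K-S, etc. */
   var resid;
   qqplot resid / normal(mu=est sigma=est);    /* normal Q-Q plot */
run;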

(e) [2 points] Draw the interaction plot between household income and marital status. Does it show a significant interaction effect to you? Justify your answer.

The interaction plot shows several roughly parallel lines, so it does not indicate a significant interaction effect.

(f) Can we conduct pairwise comparisons for household income only in this case? If we can, explain why and carry out (and report the results of) a proper procedure. If we cannot, explain why not.

Yes, we can, because the interaction term is not significant. Here we use the Tukey procedure. Only the pair of income groups 3 and 4 is significantly different; all other pairs are not significantly different.
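In SAS, an interaction plot and the Tukey comparisons could be requested roughly as below. The PLOTS=INTPLOT request assumes ODS Graphics is available; treat this as a sketch rather than the exam's required code, and the Tukey step uses the final one-way model:

ods graphics on;
proc glm data=infomercial plots=intplot;   /* interaction plot of cell means */
   class income marital;
   model purchase = income marital income*marital;
run;

proc glm data=infomercial;
   class income;
   model purchase = income;
   means income / tukey cldiff;   /* Tukey pairwise comparisons among income groups */
run;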

(g) Which household income group spends the LEAST due to infomercials on television? Justify your answer.

Household income group 3, because it has the smallest sample mean purchase (as listed below).

(h) Is a person's purchase increasing as his/her household income increases? Justify your answer.

NO, because the mean purchases of the 4 income groups are not increasing:

Household income       1        2        3        4
Mean purchase        380    473.3    356.7      490
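These means can be verified directly from the data; for example, income group 1 has mean (350 + 270 + 430 + 390 + 390 + 450)/6 = 380. A small SAS sketch (same assumed names as above) that reproduces the whole table:

proc means data=infomercial n mean maxdec=1;
   class income;
   var purchase;
run;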

Question 2

Soil and sediment adsorption, the extent to which chemicals collect in a condensed form on the surface, is an important characteristic because it influences the effectiveness of pesticides and various agricultural chemicals. We are interested in how the adsorption (Y) changes as the amount of extractable iron (X1) and the amount of extractable aluminum (X2) change. A multiple linear regression (MLR) model Y = b0 + b1*X1 + b2*X2 + error is fitted.

Correlations (Pearson correlation; N = 13 for each pair)
          X1        X2         Y
X1        1       .794**     .908**
X2      .794**      1        .935**
Y       .908**    .935**       1
Sig. (2-tailed): X1 vs X2 = .001; X1 vs Y = .000; X2 vs Y = .000
**. Correlation is significant at the 0.01 level (2-tailed).

(a) [2 points] Based on the above scatter plots and correlation coefficients:
a. Are the two variables X1, X2 associated with Y significantly? Justify your answer.
b. Do you think the MLR model would fit the data very well? Justify your answer.

a. Yes, because the p-values of their correlations with Y are both .000.
b. Yes, because the scatterplots all show a strong linear trend and the correlation coefficients between X1 and Y, and between X2 and Y, are both very large (bigger than .90).

(b) Using the attached output:
a. Write down the fitted regression equation.
b. Interpret the slopes of the regression equation in practical terms.
c. How well does this model fit the data? Use a statistic to justify your answer.

a. Y = -7.35 + .113X1 + .349X2.
b. Holding aluminum fixed, as the amount of extractable iron increases by one unit, the adsorption increases by .113 on average; holding iron fixed, as the amount of extractable aluminum increases by one unit, the adsorption increases by .349 on average.
c. Adj R-Sq = 0.9382, very close to 100%, indicates an excellent goodness of fit.
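A minimal SAS sketch of the correlation and regression analyses for this question; the dataset name adsorption and variable names x1, x2, y are assumptions, since the raw data are not printed on the exam:

proc corr data=adsorption pearson;
   var x1 x2 y;                 /* pairwise Pearson correlations and p-values */
run;

proc reg data=adsorption;
   model y = x1 x2;             /* fits Y = b0 + b1*X1 + b2*X2 + error */
run;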

(c) [2 points] One assumption is doubtful based on the residual plot below. What is this assumption?

[Residual plot: regression standardized residual versus regression standardized predicted value; dependent variable: Y]

The plot shows a horn (funnel) shape: the variance appears to increase as the predicted value increases, so the assumption of equal variances seems violated.

(d) [2 points] To address this problem, we use the natural log of Y, LY, as the new response to fit the MLR model. The following plot is the residual plot of this new model. Do you think the problem identified in the previous part is fixed? Is there another problem in this plot? If yes, what is it?

[Residual plot: regression standardized residual versus regression standardized predicted value; dependent variable: LY]

The horn shape is gone. However, there is a mild outlier with a standardized residual between -2 and -3.
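A sketch of the log-transformed fit in SAS under the same assumed names; LOG() is the natural logarithm, and the OUTPUT/SGPLOT step is one of several ways to obtain a residual-versus-predicted plot:

data adsorption_log;
   set adsorption;
   ly = log(y);                 /* natural log of the response */
run;

proc reg data=adsorption_log;
   model ly = x1 x2;
   output out=fitted r=resid p=pred;
run;

proc sgplot data=fitted;        /* residuals vs. predicted values */
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;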

APPENDIX: OUTPUT FOR QUESTION TWO

SPSS Output

Model Summary(b)
R = .974(a)   R Square = .948   Adjusted R Square = .938   Std. Error of the Estimate = 4.37937
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients(a)
               Unstandardized Coefficients    Standardized Coefficients
               B            Std. Error        Beta             t        Sig.
(Constant)    -7.351        3.485                             -2.109    .061
X1              .113         .030              .449            3.797    .004
X2              .349         .071              .578            4.894    .001
a. Dependent Variable: Y

SAS Output

Root MSE            4.37937    R-Square    0.9485
Dependent Mean     29.8465     Adj R-Sq    0.9382
Coeff Var          14.6736

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1          -7.35066             3.48467        -2.11      0.0611
X1             1           0.11273             0.02969         3.80      0.0035
X2             1           0.34900             0.07131         4.89      0.0006