Unit 2 Regression and Correlation WEEK 3 - Practice Problems. Due: Wednesday February 10, 2016

BIOSTATS 640 Spring 2016 2. Regression and Correlation

1. A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X1) and hostile suspiciousness (X2).

(a) The least squares estimation equation involving both independent variables is given by

    Y = -0.628 + 23.639(X1) - 7.147(X2)

Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of 2.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 25 observed for this patient?

(b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X1 and X2, X1 alone, and X2 alone.

    Source                    df    Sum of Squares
    Regression on X1           1              1546
    Residual                  51             12246

    Source                    df    Sum of Squares
    Regression on X2           1               160
    Residual                  51             13632

    Source                    df    Sum of Squares
    Regression on X1, X2       2              2784
    Residual                  50             11008

(c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y?

(d) What are the R^2 values for the three regressions referred to in part (b)?

(e) What is the best model involving either one or both of the two independent variables?

2. In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log10(dose) and log10(larva weight) to log10(survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (i.e., time until death), are given in the table. Also given are relevant computer results and the analysis of variance tables.

    Larva    Y = log10(survival time)    X1 = log10(dose)    X2 = log10(weight)
      1              2.836                    0.150                0.425
      2              2.966                    0.214                0.439
      3              2.687                    0.487                0.301
      4              2.679                    0.509                0.325
      5              2.827                    0.570                0.371
      6              2.442                    0.593                0.093
      7              2.421                    0.640                0.140
      8              2.602                    0.781                0.406
      9              2.556                    0.739                0.364
     10              2.441                    0.832                0.156
     11              2.420                    0.865                0.247
     12              2.439                    0.904                0.278
     13              2.385                    0.942                0.141
     14              2.452                    1.090                0.289
     15              2.351                    1.194                0.193

Fitted equations:

    Y = 2.952 - 0.550(X1)
    Y = 2.187 + 1.370(X2)
    Y = 2.593 - 0.381(X1) + 0.871(X2)

Analysis of variance tables:

    Source                    df    Sum of Squares
    Regression on X1           1            0.3633
    Residual                  13            0.1480

    Source                    df    Sum of Squares
    Regression on X2           1            0.3367
    Residual                  13            0.1746

    Source                    df    Sum of Squares
    Regression on X1, X2       2            0.4642
    Residual                  12            0.0471

(a) Test for the significance of the overall regression involving both independent variables X1 and X2.

(b) Test to see whether using X1 alone significantly helps in predicting survival time.

(c) Test to see whether using X2 alone significantly helps in predicting survival time.

(d) Compute R^2 for each of the three models.

(e) Which independent predictor do you consider to be the best single predictor of survival time?

(f) Which model involving one or both of the independent predictors do you prefer and why?

3. Using R or Stata (your choice), try your hand at reproducing the analysis of variance tables you worked with in problem #2.

Stata Users: How to access the data

I've created the STATA data set for you and uploaded it to our course website. It is larvae.dta. Accessing larvae.dta in STATA requires 2 steps:

Step 1. Download larvae.dta to your computer.
    From the course website page THIS WEEK, right click on larvae.dta.
    Alternatively, from the course website page REGRESSION, right click on larvae.dta.
    Tip: Make sure you know where larvae.dta is located on your computer.

Step 2. In STATA, open larvae.dta.
    Launch STATA. From the main menu bar: FILE > OPEN. Browse to find larvae.dta on your computer.

R Users: How to access the data

Issue the following commands:

    library(foreign)
    stata <- "http://people.umass.edu/biep640w/datasets/larvae.dta"
    larvae <- read.dta(file=stata)
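For R users, one possible way to produce the three analysis of variance tables is sketched below. To keep the sketch self-contained it types in the 15 observations from the data table in problem 2 rather than downloading the file. The column names y, x1, and x2 and the helper overall_table() are assumptions made for illustration; the variables inside larvae.dta may be named differently.

```r
# Observations transcribed from the data table in problem 2.
# The column names y, x1, x2 are assumptions made for this sketch;
# the variables inside larvae.dta may be named differently.
larvae <- data.frame(
  y  = c(2.836, 2.966, 2.687, 2.679, 2.827, 2.442, 2.421, 2.602,
         2.556, 2.441, 2.420, 2.439, 2.385, 2.452, 2.351),
  x1 = c(0.150, 0.214, 0.487, 0.509, 0.570, 0.593, 0.640, 0.781,
         0.739, 0.832, 0.865, 0.904, 0.942, 1.090, 1.194),
  x2 = c(0.425, 0.439, 0.301, 0.325, 0.371, 0.093, 0.140, 0.406,
         0.364, 0.156, 0.247, 0.278, 0.141, 0.289, 0.193)
)

# The three models referred to in problem 2
fit_x1   <- lm(y ~ x1, data = larvae)
fit_x2   <- lm(y ~ x2, data = larvae)
fit_both <- lm(y ~ x1 + x2, data = larvae)

# Hypothetical helper: anova() lists one row per predictor plus a
# "Residuals" row; this collapses the predictor rows into a single
# "Regression" line so the layout matches the tables in problem 2.
overall_table <- function(fit) {
  a <- anova(fit)
  k <- nrow(a) - 1                       # number of predictor rows
  data.frame(
    source = c("Regression", "Residual"),
    df     = c(sum(a[["Df"]][seq_len(k)]), a[["Df"]][k + 1]),
    ss     = c(sum(a[["Sum Sq"]][seq_len(k)]), a[["Sum Sq"]][k + 1])
  )
}

print(overall_table(fit_x1))
print(overall_table(fit_x2))
print(overall_table(fit_both))
```

If the data are read from larvae.dta instead, the same lm() calls should apply once the formula uses whatever variable names the file actually contains.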

4. An educator examined the relationship between number of hours devoted to reading each week (Y) and the independent variables social class (X1), number of years of school completed (X2), and reading speed measured by pages read per hour (X3). The analysis of variance table obtained from a stepwise regression analysis on data for a sample of 19 women over the age of 60 is shown.

    Source                       df    Sum of Squares
    Regression (X3)               1          1058.628
               (X2 | X3)          1           183.743
               (X1 | X2, X3)      1            37.982
    Residual                     15           363.300
    Total, corrected             18          1643.653

IMPORTANT!! Please read the following before you continue. The table above makes use of a new notation. The vertical line in this notation stands for "conditional on".

Regression (X3) Sum of Squares = 1058.628: This is the regression sum of squares for the model containing the one predictor X3.

Regression (X2 | X3) Sum of Squares = 183.743: This is the extra regression sum of squares obtained by adding the predictor X2 to a model that already has the predictor X3. You can also think of this as the change in the model sum of squares accompanying the addition of the predictor X2, controlling for (conditional on) X3.

Regression (X1 | X2, X3) Sum of Squares = 37.982: This is the extra regression sum of squares obtained by adding the predictor X1 to a model that already has the predictors X2 and X3. You can also think of this as the change in the model sum of squares accompanying the addition of the predictor X1, controlling for (conditional on) X2 and X3.

Tip! Since the total sum of squares is a fixed total, you can use the information provided in this table to obtain the analysis of variance tables for the following models.

    Model 1: Predictor = X3
    Model 2: Predictors = X3, X2
    Model 3: Predictors = X3, X2, X1

(a) Test the significance of each variable as it enters the model.

(b) Test H0: β1 = β2 = 0 in the model Y = β0 + β1 X1 + β2 X2 + β3 X3 + E.

(c) Why can't we test H0: β1 = β3 = 0 using the ANOVA table given? What formula would you use for this test?

(d) What is your overall evaluation concerning the appropriate model to use given the results in parts (a) and (b)?
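The Tip in problem 4 can be sketched in R as follows: the regression sum of squares for each nested model is the running total of the sequential sums of squares, and the residual is whatever remains of the fixed total. The function name nested_tables() and the vector labels are assumptions introduced for this illustration only.

```r
# Sequential (extra) sums of squares, in the order the predictors enter,
# copied from the table in problem 4
seq_ss   <- c("X3" = 1058.628, "X2|X3" = 183.743, "X1|X2,X3" = 37.982)
ss_total <- 1643.653    # Total, corrected
df_total <- 18

# Hypothetical helper: builds one ANOVA table per nested model.
# Model k contains the first k predictors, so its regression SS is the
# sum of the first k sequential SS, each carrying 1 degree of freedom.
nested_tables <- function(seq_ss, ss_total, df_total) {
  lapply(seq_along(seq_ss), function(k) {
    ss_reg <- sum(seq_ss[seq_len(k)])
    data.frame(
      source = c("Regression", "Residual", "Total, corrected"),
      df     = c(k, df_total - k, df_total),
      ss     = c(ss_reg, ss_total - ss_reg, ss_total)
    )
  })
}

tables <- nested_tables(seq_ss, ss_total, df_total)
names(tables) <- c("Model 1", "Model 2", "Model 3")
print(tables)
```

The residual line for Model 3 should reproduce the Residual row of the table given in the problem, which is a convenient check on the arithmetic.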

5. Consider the following analysis of variance table.

    Source                       df    Sum of Squares
    Regression (X1)               1         18,953.04
               (X3 | X1)          1          7,010.03
               (X2 | X1, X3)      1             10.93
    Residual                     16          2,248.23
    Total, corrected             19         28,222.23

Using a type I error of 0.05,

(a) Provide a test to compare the following two models. In 1-2 sentences, interpret.

    Y = β0 + β1 X1 + β2 X2 + β3 X3 + E
    VERSUS
    Y = β0 + β1 X1 + E

(b) Provide a test to compare the following two models. In 1-2 sentences, interpret.

    Y = β0 + β1 X1 + β3 X3 + E
    VERSUS
    Y = β0 + E

(c) State which two models are being compared in computing:

    F = [ (18,953.04 + 7,010.03 + 10.93) / 3 ] / [ 2,248.23 / 16 ]
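As a rough sketch, an F ratio of the kind shown in part (c) can be evaluated in R with the base functions pf() and qf(). The helper f_ratio() and its argument names are assumptions made for this illustration. It takes a numerator sum of squares with its degrees of freedom and a residual sum of squares with its degrees of freedom, and leaves the choice of which models those quantities represent to the reader.

```r
# Hypothetical helper: F = (numerator SS / numerator df) /
#                          (residual SS / residual df),
# with the critical value and p-value from the F distribution.
f_ratio <- function(num_ss, num_df, resid_ss, resid_df, alpha = 0.05) {
  f <- (num_ss / num_df) / (resid_ss / resid_df)
  list(
    F    = f,
    crit = qf(1 - alpha, num_df, resid_df),              # upper-tail cutoff
    p    = pf(f, num_df, resid_df, lower.tail = FALSE)   # upper-tail area
  )
}

# The ratio written out in part (c), using the numbers from the table
res <- f_ratio(num_ss   = 18953.04 + 7010.03 + 10.93,
               num_df   = 3,
               resid_ss = 2248.23,
               resid_df = 16)
print(res)
```

The same function can be reused for parts (a) and (b) by supplying the sums of squares and degrees of freedom appropriate to the two models being compared.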