Data Set 8: Laysan Finch Beak Widths


Statistical Setting

This handout describes an analysis of covariance (ANCOVA) involving one categorical independent variable (with only two levels) and one quantitative covariate.

Background and Data

The data are from Sheila Conant and Marie Morin's study of Laysan Finch beak morphology, described in handouts in ZOOL 631. In this handout the beak widths of adult female birds are compared between two island populations, controlling for overall size. The data are for adult female finches captured and measured in 1987: 62 birds on one island and 10 on the other. The two variables used in the following analysis are beak width (the width of the upper mandible, in cm) and, as a measure of body size, sternum length (in cm).

Preliminary Data Exploration

Scatter Plot

The most useful preliminary description of the data is a scatter plot of the response variable (beak width) plotted against the covariate (sternum length), with the different levels of the categorical variable (island) shown with different symbols. This plot shows that there is a slight relationship between beak width and sternum length, and that birds from the n = 10 island tend both to have wider beaks and to have longer sterna.

[Figure: scatter plot of beak width vs. sternum length, with the two islands shown by different symbols]

Descriptive Statistics

These statistics support the impressions drawn from the scatter plot: both beak widths and sternum lengths tend to be greater in birds from the n = 10 island, and there is at least some relationship between beak width and sternum length. The standard deviations of both variables are quite similar between islands.

Beak width (cm):

            N    MEAN      MEDIAN    STDEV     MIN      MAX       Q1        Q3
            62   …031      …150      0.02264   …200     0.81600   …550      0.772
            10   0.79950   0.79350   0.02401   …600     0.83600   0.77950   …3
    both    72   …575      …600      0.02645   …200     0.83600   0.75225   0.777

Sternum length (cm):

            N    MEAN      MEDIAN    STDEV     MIN      MAX       Q1        Q3
            62   1.6699    1.6715    0.0614    …00      1.8120    1.6415    1.70
            10   1.7045    1.7080    0.1041    1.5760   1.8410    1.5870
    both    72   1.6747    1.6735    0.0690    …00      1.8410    1.6400    1.70

Correlations between beak width and sternum length: 0.228, 0.569

Box Plots

[Figure: box plots of beak width and sternum length for each island]

These box plots again show the greater beak widths and (to a lesser extent) sternum lengths of the birds from the n = 10 island. They also show that all four distributions are fairly symmetrical, with no major outliers. (A number of observations in the n = 62 population exceed Minitab's 1.5 x IQR rule for identifying possible outliers, but none really seem exceptional, given the fairly large size of this sample.)
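The combined mean sternum length reported in the descriptive statistics (1.6747) is the value later used to center the covariate. As a quick arithmetic check, it should equal the sample-size-weighted average of the two island means. The sketch below uses only the N and MEAN values printed in the table; the variable names are illustrative and not part of the original handout.

```python
# Sample sizes and mean sternum lengths (cm), taken from the descriptive statistics.
n_large, mean_large = 62, 1.6699   # island with 62 birds
n_small, mean_small = 10, 1.7045   # island with 10 birds

# The combined mean is the sample-size-weighted average of the group means.
overall_mean = (n_large * mean_large + n_small * mean_small) / (n_large + n_small)

print(f"combined mean sternum length = {overall_mean:.4f} cm")
```

The result agrees, to four decimal places, with the "both" row of the sternum-length table.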

Analysis of Covariance

For the following analyses the covariate, sternum length, was centered by subtracting the overall mean value (1.6747).

Test of Parallelism

Prior to conducting the ANCOVA it is necessary to determine whether it is reasonable to fit parallel regressions to the two samples. This is done by testing for an island x sternum interaction in a general linear model:

    Source           DF   Seq SS      Adj SS      Adj MS      F       P
    island            1   0.0132278   0.0097071   0.0097071   19.91   0.000
    sternum           1   0.0031429   0.0031669   0.0031669    6.49   0.013
    island*sternum    1   0.0001529   0.0001529   0.0001529    0.31   0.577
    Error            68   0.0331598   0.0331598   0.0004876
    Total            71   0.0496835

This model fits separate regressions to the two groups, as in the following scatter plot. The slopes of these regressions are not very different, and the analysis above indicates that the difference is not at all significant statistically (P = 0.577).

[Figure: scatter plot of beak width vs. sternum length with a separate regression line for each island]

There is no evidence from this analysis that the parallelism assumption is unreasonable. We therefore can proceed with the ANCOVA.
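Each F ratio in the parallelism table is the term's adjusted mean square divided by the error mean square (error SS / error DF). The following sketch recomputes the three ratios from the printed sums of squares. The label "island" for the categorical factor is an assumption, since the factor name did not survive in the transcription, and p-values are not recomputed here.

```python
# Error line of the interaction model: Adj SS and DF as printed in the table.
error_ss, error_df = 0.0331598, 68
mse = error_ss / error_df  # error mean square, about 0.0004876

# Adjusted mean squares for each 1-df term ("island" is an assumed label).
adj_ms = {
    "island": 0.0097071,
    "sternum": 0.0031669,
    "island*sternum": 0.0001529,
}

# F = Adj MS (term) / MS (error), each on 1 and 68 degrees of freedom.
f_ratios = {term: ms / mse for term, ms in adj_ms.items()}

for term, f in f_ratios.items():
    print(f"{term:15s} F = {f:.2f}")
```

The interaction F is well below 1, which is why the slopes are treated as parallel.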

ANCOVA

The ANCOVA can be conducted as a general linear model as above, but without the interaction term:

    Source     DF   Seq SS      Adj SS      Adj MS      F       P
    island      1   0.0132278   0.0107006   0.0107006   22.16   0.000
    sternum     1   0.0031429   0.0031429   0.0031429    6.51   0.013
    Error      69   0.0333127   0.0333127   0.0004828
    Total      71   0.0496835

These results show that there is a highly significant difference between beak widths on the two islands, after adjusting for sternum length. The relationship of beak width to sternum length also is significant (P = 0.013), though this is not really interesting to us (what matters is the effect on the ANOVA conclusions of including the covariate).

Analysis of Effect

Estimation of Adjusted Means

To determine the magnitude of the (adjusted) difference between the islands we need the parameter estimates. (Notice that the t-value for the island coefficient is, apart from sign, the square root of the F for the island effect: these are equivalent tests.)

    Term       Coef        SE Coef    T        P
    Constant    0.778679   0.003775   206.30   0.000
    island     -0.017901   0.003802    -4.71   0.000
    sternum     0.09793    0.03838      2.55   0.013

Minitab's GLM uses an indicator-variable coding for the island variable in which one island (the one with n = 62) is +1 and the other island (n = 10) is -1. Expressed in terms of the uncentered sternum length x_i, the intercept becomes 0.778679 - 0.09793 x 1.6747 = 0.61468, and the fitted regression relationships are

    Island coded +1 (n = 62):  \hat{Y}_i = (0.61468 - 0.017901) + 0.09793 x_i = 0.596779 + 0.09793 x_i
    Island coded -1 (n = 10):  \hat{Y}_i = (0.61468 + 0.017901) + 0.09793 x_i = 0.632581 + 0.09793 x_i

With this coding, as these equations show, the coefficient for the island variable is half the difference between the islands. Thus mean beak widths, for a given sternum length, are 2 x 0.017901 = 0.035802 cm larger on the n = 10 island.

The adjusted means for the islands (means at x_i = \bar{x}) are the LS means, since the covariate was centered for these analyses. They are:

    Least Squares Means for beak width

                 Mean     SE Mean
    (n = 62)     …08      0.002797
    (n = 10)     0.7966   0.007042
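The fitted lines and adjusted means above follow directly from the three parameter estimates and the centering constant. The sketch below reproduces that arithmetic, assuming the +1/-1 coding described in the text; the variable names are illustrative only.

```python
# Parameter estimates from the ANCOVA (covariate centered at its overall mean).
constant = 0.778679       # fitted beak width at the overall mean sternum length
island_coef = -0.017901   # effect for the island coded +1
slope = 0.09793           # common slope on sternum length
overall_mean = 1.6747     # centering constant (cm)

# Convert the centered intercept to an intercept for raw sternum length.
intercept_raw = constant - slope * overall_mean

# Separate intercepts under +1 / -1 coding (parallel lines, same slope).
intercept_plus = intercept_raw + island_coef    # island coded +1
intercept_minus = intercept_raw - island_coef   # island coded -1

# Adjusted (least squares) means: fitted values at the overall mean sternum length.
adjusted_plus = constant + island_coef
adjusted_minus = constant - island_coef

# The adjusted difference is twice the magnitude of the island coefficient.
adjusted_difference = adjusted_minus - adjusted_plus

print(f"raw intercept       = {intercept_raw:.5f}")
print(f"intercepts (+1, -1) = {intercept_plus:.6f}, {intercept_minus:.6f}")
print(f"adjusted means      = {adjusted_plus:.4f}, {adjusted_minus:.4f}")
print(f"adjusted difference = {adjusted_difference:.6f} cm")
```

Small discrepancies in the last decimal place relative to the handout arise because the handout rounds the raw intercept to 0.61468 before adding the island effect.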

To obtain a confidence interval for the difference in adjusted means, note that this difference is twice the coefficient for island, given above. Using the standard error of this coefficient (0.003802), a 99% CI for the difference would be

\[
2\hat{\tau}_1 \pm t_{0.995,\,69} \cdot 2\,\mathrm{se}(\hat{\tau}_1)
= 2(-0.017901) \pm (2.649 \times 2 \times 0.003802)
= -0.035802 \pm 0.020143
= (-0.055945,\ -0.015659)
\]

The interval is negative because \hat{\tau}_1 is the coefficient for the island coded +1, which has the narrower beaks.

Aptness of Covariance Model

It was noted in the preliminary exploration that the variances are similar in the two populations and there are no severe outliers. The very non-significant test for a difference in regression slopes justifies the parallelism assumption. The remaining assumptions to be considered are normality and linearity.

Normality

[Figure: normal probability plot of the residuals (normal score vs. residual)]

This plot is very slightly sigmoid rather than perfectly straight: the tails of the distribution are slightly longer than those of a normal distribution. The correlation between residuals and normal scores, however, is 0.995, which is more than good enough given the sample size.

Linearity

Linearity is best assessed by the usual plot of residuals vs. fitted values, with different levels of the categorical variable indicated by different symbols. This plot shows a few large positive and negative residuals, but there is no clear indication of nonlinearity in either population.

[Figure: residuals vs. fitted values, with the two islands shown by different symbols]

Conclusion

Birds from the n = 10 island have wider beaks than do birds from the n = 62 island. Since the n = 10 island birds also are larger (as measured by sternum length), and larger birds tend to have wider beaks, some of the difference in beak widths between the islands could be attributed to the difference in overall size. The analysis of covariance, however, shows that there is a statistically significant difference in beak widths even after adjusting for the difference in sternum lengths (P < 0.001); the 99% confidence interval for the adjusted difference in mean beak widths is (0.0156, 0.0559) (in cm, n = 10 island minus n = 62 island). The analysis also indicates that the relationship between sternum length and beak width is roughly linear, is statistically significant (P = 0.013), and does not differ significantly between the two populations (P = 0.577).

These results are shown in the following plot, with the fitted regression relationships superimposed on the data, and the adjusted means shown by the (green) diamond and (blue) triangle at x = 1.675.
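The 99% confidence interval quoted in the conclusion can be reproduced from the island coefficient (-0.017901) and its standard error (0.003802) in the ANCOVA parameter estimates. This sketch takes the critical value t(0.995, 69) = 2.649 as quoted in the handout instead of computing it, and assumes the +1/-1 coding, so that the difference between islands is twice the coefficient.

```python
# Island coefficient and standard error from the ANCOVA parameter estimates.
island_coef = -0.017901
se_coef = 0.003802
t_crit = 2.649  # t quantile (0.995, 69 df) as quoted in the handout, not recomputed

# Under +1/-1 coding the difference between islands is twice the coefficient,
# and its standard error is twice the coefficient's standard error.
difference = 2 * island_coef
half_width = t_crit * 2 * se_coef

ci_coded = (difference - half_width, difference + half_width)

# Reverse the sign to express the interval as a positive difference in cm.
ci_positive = (-ci_coded[1], -ci_coded[0])

print(f"difference  = {difference:.6f} +/- {half_width:.6f}")
print(f"99% CI      = ({ci_coded[0]:.6f}, {ci_coded[1]:.6f})")
print(f"as positive = ({ci_positive[0]:.4f}, {ci_positive[1]:.4f}) cm")
```

The interval excludes zero, which is consistent with the P < 0.001 reported for the island effect.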

[Figure: beak width vs. sternum length, with the fitted parallel regression lines and the adjusted means for each island]

Comparison with ANOVA

To understand the effect (if any) of including the covariate in this analysis, it is interesting to examine the results of a simple ANOVA (or equivalently, a two-sample t-test with pooled standard error).

    Source     DF   Seq SS     Adj SS     Adj MS     F       P
    island      1   0.013228   0.013228   0.013228   25.40   0.000
    Error      70   0.036456   0.036456   0.000521
    Total      71   0.049684

    Term       Coeff       Stdev      t-value   P
    Constant    0.779903   0.003888   200.57    0.000
    island     -0.019597   0.003888    -5.04    0.000

The unadjusted difference in mean beak widths is somewhat (about 9%) larger than the adjusted difference: since birds on the n = 10 island had longer sterna as well as wider beaks, and the two variables were positively related, the adjustment to a standard sternum length reduced the difference between islands. On the other hand, removing the proportion of within-island variability explainable by sternum length increased the precision of the analysis: the ANCOVA MSE was smaller than that for the ANOVA (0.0004828 vs. 0.000521, a 7% decrease). As a result, the standard error of the adjusted difference is slightly (about 2%) smaller than that of the unadjusted difference. The net result of these somewhat offsetting differences is that the island effect is more significant in the ANOVA (the F is larger), though it is still highly significant in the ANCOVA.
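The percentages quoted in this comparison can be checked from the two sets of printed estimates. The sketch below uses only numbers that appear in the ANOVA and ANCOVA output; the variable names are illustrative.

```python
# Island coefficients and their standard errors (difference = 2 x |coefficient|).
ancova_coef, ancova_se = 0.017901, 0.003802   # adjusted for sternum length
anova_coef, anova_se = 0.019597, 0.003888     # unadjusted

# Error mean squares from the two analyses.
ancova_mse, anova_mse = 0.0004828, 0.000521

# How much larger the unadjusted difference is than the adjusted one.
pct_diff_larger = (2 * anova_coef / (2 * ancova_coef) - 1) * 100

# Relative decrease in MSE and in the standard error when the covariate is included.
pct_mse_decrease = (anova_mse - ancova_mse) / anova_mse * 100
pct_se_decrease = (anova_se - ancova_se) / anova_se * 100

# For a 1-df effect, the squared t-value equals the F ratio.
anova_f_from_t = (-5.04) ** 2
ancova_f_from_t = (-4.71) ** 2

print(f"unadjusted difference larger by {pct_diff_larger:.1f}%")
print(f"MSE decrease {pct_mse_decrease:.1f}%, SE decrease {pct_se_decrease:.1f}%")
print(f"t^2: ANOVA {anova_f_from_t:.2f}, ANCOVA {ancova_f_from_t:.2f}")
```

The squared t-values match the printed F ratios (25.40 and 22.16) up to rounding of the t-values.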