STAT 501 Assignment 2: Chapter 5 and Sections 6.1-6.3 in Johnson & Wichern


STAT 501 Assignment 2        NAME ____________        Spring

Reading Assignment: Chapter 5, and Sections 6.1-6.3 in Johnson & Wichern.

Written Assignment: Due Monday, February 1, in class. You should be able to do the first four problems without a computer, but use computer packages in any way you desire to answer these questions.

1. The following data consist of measurements made on the levels of three liver enzymes (U/L): aspartate aminotransferase ($X_1$), alanine aminotransferase ($X_2$), and glutamate dehydrogenase ($X_3$) in $n = 10$ patients diagnosed with aggressive chronic hepatitis.

Patient   $X_1$   $X_2$   $X_3$
1 31 63 3 6 6 3 0 9 9 6 7 7 39 87 9 6 6 9 8 7 9 7 8 0 0 3 9 9 10 3

These are part of a larger set of data reported by Plomteux (1980, Clin. Chem. 26, 1897-1899). Assuming these data were sampled from a bivariate normal population, evaluate the following quantities.

(a) Compute the maximum likelihood estimates for the mean vector $\mu$ and the covariance matrix $\Sigma$.
    $\hat{\mu} = \bar{X} =$ ____        $\hat{\Sigma} =$ ____

(b) Compute the unbiased estimate of the covariance matrix.
    $S =$ ____

(c) The maximum likelihood estimate of the correlation between $X_1$ and $X_2$ and the t-statistic for testing $H_0: \rho = 0$ are
    r = ____        t = ____        df = ____

(d) Use the Fisher z-transformation to construct an approximate 95% confidence interval for $\rho$.

(e) The generalized sample variance is $|S| =$ ____

(f) The total sample variance is ____

(g) Use the Fisher z-transformation to construct an approximate 95% confidence interval for the correlation between $X_1$ and $X_3$.
    lower limit ____        upper limit ____

(h) Use the Fisher z-transformation to construct an approximate 95% confidence interval for $\rho_{13}$.
    lower limit ____        upper limit ____

(i) In one sentence, how would you interpret the estimate of $\rho_{13}$ for these data?

(j) Test the null hypothesis $H_0: \rho_{13} = 0$ against the alternative $H_A: \rho_{13} > 0$. Report
    t = ____        d.f. = ____        p-value = ____

2. Consider a random sample $X_1, X_2, \ldots, X_n$ from a p-dimensional normal population with mean vector $\mu$ and covariance matrix $\Sigma$. The purpose of the problem is to construct the likelihood ratio test of the null hypothesis that the p attributes (or components of X) have the same variance $\sigma^2$ and are independent, that is, $\Sigma = \sigma^2 I_{p \times p}$.

(a) Write down the formula for the natural logarithm of the joint likelihood function, when the null hypothesis is true, by substituting $\sigma^2 I$ for $\Sigma$, as a function of $(\mu, \sigma^2)$.

(b) Give formulas for the m.l.e.'s for $\mu$ and $\sigma^2$.

(c) For testing the null hypothesis $H_0: \Sigma = \sigma^2 I$ against the alternative that $\Sigma$ is any covariance matrix, find a formula for $-2\log(\lambda)$ in terms of p, n, X, and the elements of the sample covariance matrix S.

(d) For large n, the statistic in part (c) can be compared to the quantiles of the chi-square distribution with d.f. = ____

(e) Use the statistic from part (c) to test $H_0: \Sigma = \sigma^2 I$ for the data in Problem 1. In this case, ignore the possibility that n may be too small to accurately use the large sample chi-square approximation for the null distribution of $-2\log(\lambda)$.
    $-2\log(\lambda) =$ ____        d.f. = ____        p-value = ____

(f) Note that a test statistic that more nearly has a central chi-squared distribution when $H_0: \Sigma = \sigma^2 I$ is true is obtained by multiplying the statistic in part (c) by $[1 - (2p^2 + p + 2)/(6pn)]$. Such multipliers are called Bartlett corrections. See Anderson, An Introduction to Multivariate Statistical Analysis, 2nd edition, Section 10.7. Does using this correction factor change the conclusion you reached in part (e)?

3. Neither the generalized variance $|\Sigma|$ nor the total variance trace($\Sigma$) retains all of the information in a covariance matrix. Different covariance matrices can have the same value of the generalized variance or the same value of the total variance. To illustrate this, compute the correlation coefficient, generalized variance, and total variance for each of the following covariance matrices.

1 1 3 0 0 3 8 3. 3. 8

Correlation            ____
Generalized variance   ____
Total variance         ____
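The computations behind Problems 1 and 2 can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the course's solution: the function names are hypothetical, the inputs at the bottom are made-up numbers rather than the liver enzyme data, and the Bartlett-type multiplier is taken to be $1 - (2p^2 + p + 2)/(6pn)$, which is an assumed reading of the factor quoted in Problem 2. The sphericity statistic is written in terms of the total variance and the generalized variance of the maximum likelihood estimate of $\Sigma$ (divisor n), which also shows how the two summaries from Problem 3 enter the test.

```python
import math
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def fisher_z_interval(r, n, level=0.95):
    """Approximate confidence interval for rho using Fisher's z = atanh(r)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def sphericity_lrt(sigma_hat, n):
    """-2 log(lambda) for H0: Sigma = sigma^2 I.

    sigma_hat is the maximum likelihood estimate of Sigma (divisor n).
    Returns the statistic, its degrees of freedom, and the statistic
    multiplied by the assumed Bartlett-type factor.
    """
    p = len(sigma_hat)
    total_variance = sum(sigma_hat[j][j] for j in range(p))
    generalized_variance = determinant(sigma_hat)
    statistic = n * (p * math.log(total_variance / p)
                     - math.log(generalized_variance))
    df = p * (p + 1) // 2 - 1
    factor = 1.0 - (2 * p * p + p + 2) / (6.0 * p * n)
    return statistic, df, factor * statistic


# Hypothetical inputs, chosen only to exercise the functions.
t_stat, df_t = correlation_t_statistic(r=0.6, n=10)
lower, upper = fisher_z_interval(r=0.6, n=10)
lrt, lrt_df, lrt_corrected = sphericity_lrt([[4.0, 0.0], [0.0, 1.0]], n=20)
```

A p-value for the sphericity statistic requires the chi-square distribution, which the standard library does not provide; a statistics package would be needed for that step.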

4. Independent samples of sizes $n_1 = 45$ and $n_2 = 55$ were taken from populations of Wisconsin homeowners with and without air conditioning, respectively. Two measurements of electrical usage (in kilowatt hours) were made on each home: $X_1$, a measure of total on-peak consumption during July 1977, and $X_2$, a measure of total off-peak consumption during July 1977. The resulting summary statistics are:

Homes with air conditioning:
    $\bar{X}_1 = \begin{bmatrix} 204.4 \\ 556.6 \end{bmatrix}$,   $S_1 = \begin{bmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{bmatrix}$,   $n_1 = 45$

Homes without air conditioning:
    $\bar{X}_2 = \begin{bmatrix} 130.0 \\ 355.0 \end{bmatrix}$,   $S_2 = \begin{bmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{bmatrix}$,   $n_2 = 55$

(a) Assuming that the population covariance matrices, $\Sigma_1$ (for homes with air conditioning) and $\Sigma_2$ (for homes without air conditioning), are the same, obtain the pooled estimate of the common covariance matrix. Report
    $S =$ ____   with ____ d.f.

(b) Evaluate Bartlett's test of the null hypothesis $H_0: \Sigma_1 = \Sigma_2$ against the alternative $H_A: \Sigma_1 \neq \Sigma_2$. Report
    $M =$ ____   $C^{-1} =$ ____   $MC^{-1} =$ ____   d.f. = ____   ____ < p-value < ____

(c) Test the null hypothesis that the correlations between total on-peak and off-peak usage are the same for homes with and without air conditioning. Present the formula for your test statistic and state your conclusion.

(d) Using the pooled covariance matrix, compute the estimate of the squared Mahalanobis distance between $\bar{X}_1$ and $\bar{X}_2$:
    $T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{X}_1 - \bar{X}_2)' S^{-1} (\bar{X}_1 - \bar{X}_2) =$ ____

(e) Test the hypothesis that homes with air conditioning have the same vector of means for on-peak and off-peak consumption as homes without air conditioning (use $T^2$ from part (d)).
    F = ____        d.f. = ____        p-value = ____

(f) Compute the value of a test statistic that would be more appropriate than the statistic in parts (d) and (e) when the covariance matrices are not homogeneous. Present a formula for your test statistic and report the value of your test statistic, degrees of freedom, and a p-value. Do you reach the same conclusion as you did in part (e)?

5. Systolic blood pressure measures were made on a sample of 85 subjects (Bland and Altman, 1999). For each subject, simultaneous measurements were made by each of two experienced observers using a sphygmomanometer, and a third measurement was made by a semi-automatic blood pressure monitor. The data are posted on the course web page as systolic.obrien.dat. The data file has one line for each subject and four numbers on each line arranged in the following order.

    Subject   Identification number
    X1        Systolic blood pressure measurement made by observer 1
    X2        Systolic blood pressure measurement made by observer 2
    X3        Systolic blood pressure measurement made by semi-automatic monitor

(a) Construct a scatterplot matrix of the sample data for $X_1$, $X_2$, $X_3$. Are there any obvious outliers?

(b) Report the values of the Shapiro-Wilk statistic for each variable.

                  $X_1$    $X_2$    $X_3$
    Value of W    ____     ____     ____
    p-value       ____     ____     ____

Also examine corresponding univariate normal probability plots. State your conclusions. Keep in mind that the data are discrete because of the limited accuracy of the measuring instruments. Consequently, the joint probability distribution of the three systolic blood pressure measurements cannot exactly be a multivariate normal distribution. We only want to know if the multivariate normal distribution is a good approximation. Examine the chi-square probability plot. What does it indicate about the fit of a three-dimensional normal model?

(c) If you conclude that the distribution of any measurement is not reasonably well modeled by a normal distribution, look for a transformation to improve the fit of the normal model. List the transformations you think should be used for each measurement. Report 'none' if no transformation is required.

                       $X_1$    $X_2$    $X_3$
    Transformation:    ____     ____     ____

State any additional comments you would like to make on selecting transformations.

(d) Regardless of your results in part (c), use the natural logarithm of each variable to complete the rest of this problem. (This transformation is imposed simply to make it easier to grade this question.) Compute the sample covariance matrix S and the sample correlation matrix R and report your results.

(e) Test the null hypothesis that correlations among the natural logarithms of systolic blood pressure measurements for the two expert observers and the semi-automatic monitor are all zero (without assuming homogeneous variances). State your conclusions. What do your results imply about agreement between measurements made by the two experts and the semi-automatic monitor?

(f) Test the null hypothesis that all of the correlations are equal and all of the variances are equal for the natural logarithms of systolic blood pressure measurements for the two expert observers and the semi-automatic monitor. Report the value of your test statistic and the corresponding p-value. State your conclusions.

(g) Examine the estimates of the correlation coefficients. Which population correlations are different? Give some statistical justification for your conclusions. What do your results imply about the reliability of the two experts and the semi-automated monitor?

(h) Examine the estimated variances. Which population variances are different? Give some statistical justification for your conclusions. What do your results imply about the reliability of the two experts and the semi-automated monitor? If the variance for $\log(X_3)$ is significantly larger than the variance for $\log(X_2)$, for example, does that imply that the semi-automatic monitor is less accurate than the second expert?

(i) Use the Hotelling $T^2$ statistic to test the null hypothesis that the mean systolic blood pressure measurements are the same for the two experts and the semi-automatic monitor. Report the value of your test statistic, degrees of freedom, and a p-value.

(j) Do pairwise tests to determine which means are significantly different. Identify the tests you used and report values of your test statistics and degrees of freedom. State your conclusions.

Some additional problems you might consider are problems.1,.,.,.7,.9, 6., 6., 6., 3.6, 3.8, 3.10, 3.11, 3.1, in the fifth edition of the text. Do not submit answers to these problems. Answers may be distributed with answers to this assignment.
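The Hotelling $T^2$ computations that Problems 4 and 5 call for can be sketched as follows. This is a minimal illustration under stated assumptions, not the intended solution: all function names are hypothetical, the numbers at the bottom are made up rather than taken from the electrical usage or blood pressure data, the two-sample routine is limited to p = 2 variables, and the repeated-measures routine assumes q = 3 measurements compared through successive-difference contrasts. The F conversions used are the standard ones: $F = \frac{n_1 + n_2 - p - 1}{p(n_1 + n_2 - 2)} T^2$ with $(p, n_1 + n_2 - p - 1)$ d.f. for two samples, and $F = \frac{n - q + 1}{(n - 1)(q - 1)} T^2$ with $(q - 1, n - q + 1)$ d.f. for the contrast test.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def inverse_2x2(matrix):
    """Inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quadratic_form(vector, matrix_inverse):
    """Compute v' A^{-1} v given A^{-1}."""
    return sum(v * w for v, w in zip(vector, mat_vec(matrix_inverse, vector)))


def pooled_covariance(s1, n1, s2, n2):
    """Pooled estimate ((n1-1) S1 + (n2-1) S2) / (n1 + n2 - 2)."""
    df = n1 + n2 - 2
    size = len(s1)
    return [[((n1 - 1) * s1[j][k] + (n2 - 1) * s2[j][k]) / df
             for k in range(size)] for j in range(size)]


def two_sample_t2(xbar1, xbar2, s_pooled, n1, n2):
    """Two-sample Hotelling T^2 for p = 2, with its F statistic and d.f."""
    p = len(xbar1)
    diff = [a - b for a, b in zip(xbar1, xbar2)]
    t2 = n1 * n2 / (n1 + n2) * quadratic_form(diff, inverse_2x2(s_pooled))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


def equal_means_t2(xbar, s, n):
    """T^2 for H0: equal means of q = 3 repeated measurements.

    Uses the contrasts (1, -1, 0) and (0, 1, -1), so C S C' is 2 x 2.
    """
    contrasts = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
    q = len(xbar)
    c_xbar = mat_vec(contrasts, xbar)
    cs = [[sum(row[k] * s[k][j] for k in range(q)) for j in range(q)]
          for row in contrasts]
    csc = [[sum(cs_row[k] * c_row[k] for k in range(q)) for c_row in contrasts]
           for cs_row in cs]
    t2 = n * quadratic_form(c_xbar, inverse_2x2(csc))
    f_stat = (n - q + 1) / ((n - 1) * (q - 1)) * t2
    return t2, f_stat, (q - 1, n - q + 1)


# Hypothetical summary statistics, chosen only to exercise the functions.
s_a = [[4.0, 2.0], [2.0, 3.0]]
s_b = [[4.0, 2.0], [2.0, 3.0]]
s_p = pooled_covariance(s_a, 12, s_b, 18)
t2_two, f_two, df_two = two_sample_t2([10.0, 20.0], [8.0, 17.0], s_p, 12, 18)

s_rep = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
t2_rep, f_rep, df_rep = equal_means_t2([3.0, 2.0, 1.0], s_rep, n=10)
```

Turning either F statistic into a p-value requires the F distribution from a statistics package. When the two covariance matrices are not assumed equal, one common large-sample alternative replaces $\frac{n_1 + n_2}{n_1 n_2} S$ by $S_1/n_1 + S_2/n_2$ in the quadratic form and refers the result to a chi-square distribution with p d.f.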