Statistics 135 Fall 2008 Final Exam

Name: _______________  SID: _______________

Statistics 135 Fall 2008 Final Exam

Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems.

1. [2] The normal equations for the least squares estimates are $X^T(Y - X\hat{\beta}) = 0$. Show that if the design matrix $X$ contains a column of 1's, then the average value of the residuals is 0.

Solution: The residuals are $Y - X\hat{\beta}$. If a column of $X$ is all 1's, then the row of the normal equations corresponding to that column gives $\sum_i (Y_i - x_i^T\hat{\beta}) = 0$, so the average residual is 0.

2. Recall that the probability mass function of a binomial random variable with $m$ trials and success probability $\theta$ is
$$p(k) = \binom{m}{k}\theta^k(1-\theta)^{m-k}, \quad k = 0, 1, \ldots, m.$$
Suppose that $X_1, X_2, \ldots, X_n$ are independent binomial random variables with common success probability $\theta$, but with differing numbers of trials $m_1, m_2, \ldots, m_n$.

(a) [4] Find the maximum likelihood estimate (MLE) of $\theta$.

Solution: $\mathrm{Lik}(\theta) = \prod_{i=1}^n \binom{m_i}{x_i}\theta^{x_i}(1-\theta)^{m_i-x_i} = C\theta^T(1-\theta)^{N-T}$, where $C$ is the product of the binomial coefficients, $N = \sum m_i$, and $T = \sum x_i$. Maximizing gives $\hat{\theta} = T/N$.

(b) [2] Is the MLE unbiased?

Solution: Since $E(X_i) = m_i\theta$, it follows that $E(T) = \sum m_i\theta = N\theta$, so $E(\hat{\theta}) = E(T)/N = \theta$ and the MLE is unbiased.

(c) [2] What is the standard deviation of the MLE?

Solution: Since $\mathrm{Var}(X_i) = m_i\theta(1-\theta)$, $\mathrm{Var}(T) = \theta(1-\theta)\sum m_i = N\theta(1-\theta)$, so $\mathrm{Var}(\hat{\theta}) = \theta(1-\theta)/N$ and the standard deviation is $\sqrt{\theta(1-\theta)/N}$. The large-sample approximation gives the same result.

(d) [2] What is the probability distribution of the MLE?

Solution: $T$ is binomial$(N, \theta)$, so $P(\hat{\theta} = t/N) = P(T = t)$ for $t = 0, 1, \ldots, N$, where $P(T = t)$ is a binomial probability. A normal approximation could also be used.

(e) [2] If the prior distribution of $\theta$ is uniform, what is the posterior distribution?

Solution: The posterior distribution is proportional to the likelihood times the prior, and the prior equals 1, so the posterior is proportional to the likelihood. That distribution is Beta$(T+1, N-T+1)$.

3. This problem arises in particle physics. An experiment is run to estimate the background rate at which particles are produced.
The particle count $X_1$ is measured, and it is assumed that $X_1 \sim \mathrm{Poisson}(\lambda_b)$, where $\lambda_b$, the background rate, is unknown. With the experimental apparatus turned on, a count $X_2$ is measured, and it is assumed that $X_2 \sim \mathrm{Poisson}(\lambda_b + \lambda_s)$, where the experimental apparatus produces source counts which are $\mathrm{Poisson}(\lambda_s)$, independent of

the background particles; what is observed is background counts plus source counts. The counts $X_1$ and $X_2$ are thus observed. Explain carefully and clearly how to construct a generalized likelihood ratio test of the null hypothesis that $\lambda_s = 0$ versus the alternative that $\lambda_s > 0$; that is, under the null hypothesis, the experimental source does not produce any particles.

(a) [4] Derive the test statistic.

Solution: The likelihood is
$$\mathrm{Lik}(\lambda_b, \lambda_s) = \frac{\lambda_b^{x_1} e^{-\lambda_b}}{x_1!} \cdot \frac{(\lambda_b + \lambda_s)^{x_2} e^{-(\lambda_b + \lambda_s)}}{x_2!}.$$
This has to be maximized under $H_0: \lambda_s = 0$ and under $H_A: \lambda_s > 0$. Doing so, we find that under $H_0$, $\hat{\lambda}_b = \bar{x}$, and under $H_A$, $\hat{\lambda}_b = x_1$ and $\hat{\lambda}_s = \max(0, x_2 - x_1)$. Substituting these into the likelihood ratio, and assuming for simplicity that $x_2 \geq x_1$, we find
$$\log \Lambda = x_1 \log(\bar{x}/x_1) + x_2 \log(\bar{x}/x_2).$$

(b) [2] Identify the approximate null distribution.

Solution: Under $H_0$, $-2\log\Lambda$ approximately follows a chi-square distribution with 1 df.

(c) [2] Explain how to form the rejection region for a level $\alpha$ test.

Solution: Reject if $-2\log\Lambda$ is larger than the upper $\alpha$ quantile of the chi-square distribution with 1 df.

4. In a test of sensory perception, each of $n$ panelists is presented with three items to taste, in randomized order. Two of the items are identical (e.g. Coke) and one is different (e.g. Pepsi). Each panelist is asked to identify the one that is different. Explain carefully and clearly how you could test the null hypothesis that the items are indistinguishable.

(a) [4] Propose a test statistic.

Solution: Let $T$ be the number of successful identifications. (Many students set this up as a chi-square test for a multinomial with either two or three cells, and I gave credit for such solutions.)

(b) [2] What is the null distribution of the test statistic?

Solution: Binomial$(n, 1/3)$.

(c) [2] Identify the rejection region of the test with significance level $\alpha$.

Solution: The upper tail of the binomial distribution.

5. Consider the standard linear model, $Y = X\beta + \epsilon$.
(a) [2] Give expressions for the fitted values and the residuals.

Solution: $\hat{Y} = X\hat{\beta}$, where $\hat{\beta} = (X^T X)^{-1} X^T Y$, and $\hat{e} = Y - \hat{Y}$.
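The expressions in part (a), together with Problem 1's claim that the residuals average to zero when the design matrix contains a column of 1's, can be checked numerically. This is a minimal sketch with made-up data; the coefficient values and sample size are illustrative, not from the exam.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a design matrix with an intercept column of 1's
n = 20
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
Y = 2.0 + 3.0 * x + rng.normal(size=n)

# Least squares estimate: beta_hat solves the normal equations X^T X b = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ beta_hat   # fitted values, Y_hat = X beta_hat
e_hat = Y - Y_hat      # residuals, e_hat = Y - Y_hat

# Because X contains a column of 1's, the residuals average to zero
print(np.isclose(e_hat.mean(), 0.0))  # True
```

The normal equations force $X^T\hat{e} = 0$; the intercept column's row of that system is exactly the statement that the residuals sum to zero.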

(b) [2] Derive the covariance matrix of the residuals.

Solution: Let $P = X(X^T X)^{-1} X^T$. Then $P = P^2 = P^T$, and
$$\Sigma_{\hat{e}\hat{e}} = (I-P)\,\sigma^2 I\,(I-P)^T = \sigma^2(I-P).$$

(c) [2] Derive the cross-covariance matrix of $Y$ and the residuals. Are they uncorrelated?

Solution: $\Sigma_{Y\hat{e}} = \sigma^2 I(I-P)^T = \sigma^2(I-P)$, so they are correlated.

6. SAT scores ($x$) and freshman year GPAs ($Y$) are recorded for a group of male and female students. A model of the form $Y = \beta_1 I_M + \beta_2 I_F + \beta_3 x$ is to be fit to the data, where $I_M = 1$ if the student is male and 0 if female, and $I_F = 1$ if the student is female and 0 if male.

(a) [2] Explain clearly how to write this as a linear model of the form $Y = X\beta + \epsilon$. What is $X$?

Solution: $Y$ is the vector of observations; the $i$-th entry is the GPA of the $i$-th person. $X$ is a matrix whose $i$-th entry in the first column is 1 if the $i$-th person is male and 0 if female; similarly, the second column is 0 if male and 1 if female. The third column is the vector of SAT scores.

(b) [2] Explain how to estimate the standard error of $\hat{\beta}_1$.

Solution: $s^2 = \mathrm{RSS}/(n-3)$. The estimated standard error is $s$ times the square root of the (1,1) entry of $(X^T X)^{-1}$.

(c) [2] Explain how to estimate the variance of $\hat{\beta}_1 - \hat{\beta}_2$.

Solution: $\hat{\beta}_1 - \hat{\beta}_2 = [1\ {-1}\ 0]\,\hat{\beta}$, so the variance of this quantity is $\sigma^2\,[1\ {-1}\ 0](X^T X)^{-1}[1\ {-1}\ 0]^T$. This variance is estimated by using $s^2$ in place of $\sigma^2$.

(d) [2] Explain how to test $H_0: \beta_1 - \beta_2 = 0$.

Solution: Refer the statistic $(\hat{\beta}_1 - \hat{\beta}_2)/s_{\hat{\beta}_1 - \hat{\beta}_2}$ to a $t$ distribution with $n-3$ df.

7. [4] A group of patients who smoked were seen by family practice physicians at a clinic. They were then classified by gender and by whether they had been advised to stop smoking, with results shown in the table below. Is there a relationship between gender and being advised to cease smoking?

          Advised   Not Advised
Male         48          47
Female       80         136

Solution: Using the chi-square test of independence, the expected counts are

          Advised   Not Advised
Male      39.09968     55.90032
Female    88.90032    127.09968

and the test statistic is 4.96 with 1 df. The p-value satisfies $.01 < p < .05$.
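Problem 7's chi-square test of independence can be reproduced with SciPy; this is a sketch using `scipy.stats.chi2_contingency`, with `correction=False` so that the plain (uncorrected) Pearson statistic in the solution is obtained.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from Problem 7: rows are Male/Female,
# columns are Advised/Not Advised
observed = np.array([[48, 47],
                     [80, 136]])

# correction=False gives the uncorrected Pearson chi-square statistic;
# with Yates' continuity correction the statistic would be smaller
chi2, p, df, expected = chi2_contingency(observed, correction=False)

print(round(chi2, 2))    # 4.96
print(df)                # 1
print(0.01 < p < 0.05)   # True
```

The `expected` array returned here matches the expected counts in the solution (row total times column total divided by the grand total).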

8. Suppose that under the null hypothesis a random variable $X$ has a uniform distribution on $[0, 1]$, and under the alternative it has density $f(x) = 2x$.

(a) [2] What is the most powerful test at level $\alpha = 0.10$?

Solution: By the Neyman-Pearson Lemma, the likelihood ratio test is most powerful. $\Lambda = 1/(2x)$, so the test rejects for large $x$. For $\alpha = .10$, the critical value is $x_c = 0.9$.

(b) [2] What is the power of the test?

Solution: The cdf under the alternative is $F(x) = x^2$, so the power is $1 - F(.9) = 0.19$.

(c) [2] Is this test uniformly most powerful for testing against the alternative $H_A: f(x) = \lambda x^{\lambda-1}$, for $\lambda > 1$? Why or why not?

Solution: The test is UMP, since the rejection region is the same for every $\lambda > 1$.

9. Suppose independent samples are taken under each of two conditions: $x_1 = 32$, $x_2 = 11$, $x_3 = 4$ and $y_1 = 44$, $y_2 = 15$. Model the $X$'s as independent realizations from a continuous distribution $F_X$ and the $Y$'s as independent realizations from a continuous distribution $F_Y$.

(a) [2] Give an estimate for $P(X < Y)$.

Solution: $\hat{P} = (1/6)\sum_i \sum_j 1(x_i < y_j) = 5/6$.

(b) [2] Explain how to test $H_0: P(X < Y) = 1/2$ versus $H_A: P(X < Y) > 1/2$. Identify the test statistic.

Solution: This hypothesis can be tested by the Mann-Whitney test based on the sum of the ranks of the $Y$'s in the combined sample, which is $5 + 3 = 8$.

(c) [2] What is the null distribution of the test statistic?

Solution: The null distribution can be computed by considering all possible ways to assign ranks to the $Y$'s. There are $\binom{5}{2} = 10$ possible ways. In this way, the null distribution is found to be: $P(T = 9) = P(T = 8) = 1/10$; $P(T = 7) = P(T = 6) = P(T = 5) = 2/10$; $P(T = 4) = P(T = 3) = 1/10$.

(d) [2] What is the p-value of the test for these data?

Solution: From the probabilities above, $P(T \geq 8) = 2/10$, which is the p-value.
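The exact null distribution in Problem 9(c) can be enumerated by brute force, since only $\binom{5}{2} = 10$ rank assignments are possible. A sketch, using the sample values given in the problem:

```python
from itertools import combinations

x = [32, 11, 4]
y = [44, 15]

# Rank-sum statistic: sum of the ranks of the y's in the combined sample
combined = sorted(x + y)
ranks = {v: i + 1 for i, v in enumerate(combined)}
T_obs = sum(ranks[v] for v in y)
print(T_obs)  # 8  (ranks 3 and 5)

# Under H0, every assignment of 2 of the 5 ranks to the y's is
# equally likely, giving C(5, 2) = 10 equally probable cases
null_sums = [sum(c) for c in combinations(range(1, 6), 2)]

# One-sided p-value: proportion of the null distribution at or above T_obs
p_value = sum(t >= T_obs for t in null_sums) / len(null_sums)
print(p_value)  # 0.2
```

Tabulating `null_sums` reproduces the probabilities listed in the solution: sums 3, 4, 8, 9 each occur once, and sums 5, 6, 7 each occur twice, out of 10.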

10. A nonparametric test for the slope of a regression line. Suppose that one observes pairs $(Y_i, x_i)$, $i = 1, 2, \ldots, n$, and that $x_1 < x_2 < \cdots < x_n$. For simplicity, assume that the $Y_i$ are distinct, that is, there are no ties. To test for positive association between the $x$'s and $Y$'s, consider the test statistic
$$T = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(Y_j - Y_i),$$
where $I(z)$ is 1 if $z > 0$ and equals $-1$ if $z < 0$.

(a) [2] Consider the null hypothesis that the slope is 0. Explain why large values of $T$ are evidence against the null hypothesis.

Solution: If the slope is positive, one would expect that $Y_j - Y_i$ would tend to be positive when $x_j - x_i > 0$. This should give rise to large values of $T$.

(b) [2] How could the null distribution of $T$ be determined?

Solution: Under the null hypothesis, the $Y$'s would not be expected to systematically increase or decrease. The null distribution can thus be constructed by forming all possible permutations of the $Y$'s, calculating the corresponding values of $T$, and using the resulting distribution. The test would reject if the observed value of $T$ was in the right tail of this distribution, the critical value being determined by $\alpha$.
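The permutation construction in part (b) can be sketched directly: with a small $n$, all $n!$ orderings of the $Y$'s can be enumerated exactly. The data below are made up for illustration; the statistic counts concordant minus discordant pairs of the $Y$'s, with the $x$'s taken as already sorted.

```python
from itertools import permutations

def T_stat(y):
    """Sum over pairs i < j of sign(y[j] - y[i]); assumes no ties."""
    n = len(y)
    return sum(1 if y[j] > y[i] else -1
               for i in range(n - 1) for j in range(i + 1, n))

# Hypothetical responses, listed in order of increasing x
y_obs = [1.2, 0.8, 2.1, 2.5, 3.0]
t_obs = T_stat(y_obs)
print(t_obs)  # 8  (9 concordant pairs, 1 discordant)

# Exact permutation null distribution: under H0 all n! orderings of
# the Y's are equally likely, so the p-value is the right-tail proportion
null = [T_stat(p) for p in permutations(y_obs)]
p_value = sum(t >= t_obs for t in null) / len(null)
print(round(p_value, 3))  # 0.042  (= 5/120)
```

For larger $n$, one would sample permutations at random instead of enumerating all $n!$ of them; the right-tail proportion is then a Monte Carlo estimate of the same p-value.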