Master's Written Examination - Solution

Spring 2014

Problem 1 (Stat 401). Suppose $X_1$ and $X_2$ have the joint pdf $f_{X_1,X_2}(x_1,x_2) = 2e^{-(x_1+x_2)}$, $0 < x_1 < x_2 < \infty$, zero elsewhere.

(a) Find the marginal pdf of $X_2$.
(b) Find the conditional expectation $E(X_1 \mid X_2 = 2)$.
(c) Find the distribution of $Y_1 = X_1 + X_2$.

Solution. (a) $f_{X_2}(x_2) = \int_0^{x_2} 2e^{-(x_1+x_2)}\,dx_1 = 2e^{-x_2}(1 - e^{-x_2})$, $0 < x_2 < \infty$.

(b) The conditional density of $X_1$ given $X_2 = 2$ is
$$\frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_2}(x_2)}\bigg|_{x_2=2} = \frac{2e^{-x_1-2}}{2e^{-2}(1-e^{-2})} = \frac{e^{-x_1}}{1-e^{-2}}, \quad 0 < x_1 < 2.$$
Thus the conditional expectation is
$$E(X_1 \mid X_2 = 2) = \int_0^2 \frac{x_1 e^{-x_1}}{1-e^{-2}}\,dx_1 = \frac{\big(-x_1 e^{-x_1} - e^{-x_1}\big)\big|_0^2}{1-e^{-2}} = \frac{1 - 3e^{-2}}{1-e^{-2}}.$$

(c) Let $Y_2 = X_1$. We first derive the joint distribution of $(Y_1, Y_2)$. Clearly $|J| = 1$. Notice that $0 < X_1 < X_2$, so we have $Y_1 > 2Y_2 > 0$. Thus the joint density of $Y_1$ and $Y_2$ is $2e^{-y_1}$, $0 < 2y_2 < y_1 < \infty$. The density of $Y_1$ can be obtained as
$$\int_0^{y_1/2} 2e^{-y_1}\,dy_2 = y_1 e^{-y_1}, \quad y_1 > 0.$$
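These closed forms are easy to sanity-check by simulation: the ordered pair of two iid Exp(1) draws has exactly the joint density $2e^{-(x_1+x_2)}$ on $0 < x_1 < x_2$. The sketch below (assuming NumPy is available; the seed, sample size, and conditioning band width are arbitrary choices, not part of the exam solution) compares Monte Carlo estimates with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# (X1, X2) with density 2*exp(-(x1+x2)) on 0 < x1 < x2 are the order
# statistics of two iid Exp(1) draws.
u = rng.exponential(1.0, size=(n, 2))
x1, x2 = np.sort(u, axis=1).T

# Check E(X1 | X2 = 2) against (1 - 3e^{-2}) / (1 - e^{-2}) ~ 0.686.
band = np.abs(x2 - 2.0) < 0.01          # condition on X2 near 2
print(x1[band].mean(), (1 - 3 * np.exp(-2)) / (1 - np.exp(-2)))

# Y1 = X1 + X2 should have density y * exp(-y), i.e. Gamma(2, 1) with mean 2.
y1 = x1 + x2
print(y1.mean())
```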

Problem 2 (Stat 401). Suppose that the number of customers visiting a bank follows a Poisson process with an average of 3 persons per time unit. Let $X$ be the length of time from the bank opening until the first customer visits the bank; let $Y$ be the length of time from the bank opening until the second customer visits the bank.

(a) Derive the probability density function of $X$.
(b) Derive the probability density function of $Y$.
(c) Find the joint distribution of $X$ and $Y$.

Solution. (a) Consider the event $\{X > x\}$. It implies that there is no customer in the interval $[0, x]$. Let $Z_x$ be the number of customers visiting the bank in the time interval $[0, x]$. Then $Z_x$ has a Poisson distribution with parameter $3x$. Thus $P(X > x) = P(Z_x = 0) = e^{-3x}$, and consequently the pdf of $X$ is $3e^{-3x}$, $x > 0$.

(b) Similarly, we can derive the distribution of $Y$, i.e., $P(Y > y) = P(Z_y < 2)$, where $Z_y$ is the number of customers visiting the bank in the time interval $[0, y]$. Thus $P(Y > y) = e^{-3y}(1 + 3y)$, and the pdf of $Y$ is $9ye^{-3y}$, $y > 0$.

(c) For $0 < x < y$,
$$P(X > x, Y > y) = P(x < X \le y,\ Y > y) + P(X > y,\ Y > y) = P(Z_x = 0,\ Z_y = 1) + P(Z_y = 0)$$
$$= P(Z_y = 1 \mid Z_x = 0)\,P(Z_x = 0) + P(Z_y = 0) = 3(y-x)e^{-3(y-x)}\,e^{-3x} + e^{-3y} = (3y - 3x + 1)e^{-3y}.$$
Taking $\partial^2/\partial x\,\partial y$ of this joint survival function gives the joint pdf of $(X, Y)$: $9e^{-3y}$, $0 < x < y < \infty$.
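The two marginal results can be checked by simulating the arrival process directly (a sketch assuming NumPy; the rate 3 comes from the problem, the seed and sample size are arbitrary): the first arrival time is Exponential with rate 3 and the second is Gamma(2) with rate 3.

```python
import numpy as np

rng = np.random.default_rng(1)
rate, n = 3.0, 500_000

# Interarrival times of a Poisson(rate) process are iid Exp(rate).
gaps = rng.exponential(1.0 / rate, size=(n, 2))
x = gaps[:, 0]              # time of first arrival
y = gaps.sum(axis=1)        # time of second arrival

# Means implied by the derived densities: E X = 1/3, E Y = 2/3.
print(x.mean(), 1 / rate)
print(y.mean(), 2 / rate)

# P(Y > y0) should equal e^{-3 y0} (1 + 3 y0).
y0 = 0.5
print((y > y0).mean(), np.exp(-rate * y0) * (1 + rate * y0))
```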

Problem 3 (Stat 411). Consider a distribution with density $f_\theta(x) = \theta x^{\theta-1}$, for $x \in (0,1)$ and $\theta > 0$. This is a beta distribution with parameters $\theta$ and $1$.

(a) Let $X_1, \ldots, X_n$ be independent and identically distributed according to $f_\theta(x)$. Find the maximum likelihood estimator, $\hat\theta$, of $\theta$.
(b) Let $X \sim f_\theta(x)$. Find the distribution of $Y = -\log X$. In particular, what is the mean $E_\theta(Y)$? (Hint: the distribution of $Y$ is one you know.)
(c) Use part (b), the law of large numbers, and the continuous mapping theorem to show that $\hat\theta$ is a consistent estimator of $\theta$.

Solution. (a) The log-likelihood function is
$$\ell(\theta) = \sum_{i=1}^n \log f_\theta(X_i) = n\log\theta + (\theta - 1)\sum_{i=1}^n \log X_i.$$
The derivative of $\ell(\theta)$ with respect to $\theta$ is
$$\ell'(\theta) = \frac{n}{\theta} + \sum_{i=1}^n \log X_i,$$
and setting this equal to zero and solving for $\theta$ gives the estimator
$$\hat\theta = -\frac{n}{\sum_{i=1}^n \log X_i}.$$
That this is indeed a maximizer of the log-likelihood is easy to check with the second derivative test.

(b) Let $X \sim f_\theta(x)$ and define $Y = -\log X$. Then $X = e^{-Y}$ and the Jacobian of the transformation is $|J| = e^{-y}$. Therefore,
$$f_{Y\mid\theta}(y) = f_{X\mid\theta}(e^{-y})\,e^{-y} = \theta (e^{-y})^{\theta-1} e^{-y} = \theta e^{-\theta y}, \quad y > 0.$$
This is clearly the density of an exponential distribution with mean $1/\theta$, so $E_\theta(Y) = 1/\theta$.

(c) By the law of large numbers, $-\frac{1}{n}\sum_{i=1}^n \log X_i$ converges in probability to $E_\theta(-\log X)$ and, according to part (b), the limit equals $1/\theta$. Consider the function $g(z) = 1/z$ for $z > 0$, a continuous function. By the continuous mapping theorem,
$$\hat\theta = g\Big(-\tfrac{1}{n}\textstyle\sum_{i=1}^n \log X_i\Big) \;\to\; g\big(E_\theta(-\log X)\big) = g(1/\theta) = \theta,$$
where the convergence is in probability. Therefore, $\hat\theta$ is a consistent estimator of $\theta$.
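A small simulation illustrates the consistency of $\hat\theta = -n/\sum_i \log X_i$ (a sketch assuming NumPy; it uses the fact that $U^{1/\theta}$ with $U \sim \mathrm{Unif}(0,1)$ has the Beta$(\theta,1)$ density, and the value $\theta = 2.5$ is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 2.5

def mle(x):
    """MLE of theta for the Beta(theta, 1) density theta * x^(theta - 1)."""
    return -len(x) / np.log(x).sum()

for n in (10, 100, 10_000):
    x = rng.uniform(size=n) ** (1 / theta)   # X = U^(1/theta) ~ Beta(theta, 1)
    print(n, mle(x))                          # approaches theta as n grows
```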

Problem 4 (Stat 411). Let $X$ be a random variable with probability mass function $f_\theta(x) = \theta(1-\theta)^x$, where $x = 0, 1, \ldots$ and $\theta \in (0,1)$.

(a) For fixed $\theta_0$, suppose the goal is to test $H_0: \theta = \theta_0$ versus $H_1: \theta < \theta_0$. Show that the uniformly most powerful test is of the form "reject if $X > c$".
(b) Find the constant $c$ above so that the corresponding test has size $\alpha$. Without loss of generality, you can assume $c$ is an integer.

Solution. (a) Take $\theta_1 < \theta_0$. For testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$, the most powerful test obtained from the Neyman-Pearson lemma is to reject $H_0$, in favor of $H_1$, if and only if the likelihood ratio $L(\theta_0)/L(\theta_1)$ is too small. Since
$$\frac{L(\theta_0)}{L(\theta_1)} = \frac{\theta_0(1-\theta_0)^X}{\theta_1(1-\theta_1)^X} = \frac{\theta_0}{\theta_1}\left(\frac{1-\theta_0}{1-\theta_1}\right)^X$$
is a non-increasing function of $X$, we can conclude that the most powerful test rejects $H_0$, in favor of $H_1$, if and only if $X > c$ for some constant $c$. Since the choice of cutoff $c$ does not depend on $\theta_1$, we can extend the optimality conclusion to hold uniformly for all $\theta < \theta_0$. (This also follows since the distribution in question has the monotone likelihood ratio property in $X$.) Therefore, the stated test is uniformly most powerful for testing $H_0$ versus $H_1$.

(b) The size of the test is $P_{\theta_0}(X > c)$. Following the suggestion, we assume $c$ is an integer. Then
$$P_{\theta_0}(X > c) = \sum_{x=c+1}^{\infty} \theta_0(1-\theta_0)^x = 1 - \sum_{x=0}^{c} \theta_0(1-\theta_0)^x = (1-\theta_0)^{c+1}.$$
So, to make the size of the test equal $\alpha$, we take
$$c = \left\lceil \frac{\log\alpha}{\log(1-\theta_0)} - 1 \right\rceil,$$
the smallest integer greater than or equal to the quantity inside.
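As a numerical illustration of part (b), the cutoff and the achieved size follow directly from the geometric tail formula above (a minimal sketch; the values $\theta_0 = 0.3$ and $\alpha = 0.05$ are arbitrary, not from the exam).

```python
import math

theta0, alpha = 0.3, 0.05          # illustrative values

# c is the smallest integer with (1 - theta0)^(c + 1) <= alpha.
c = math.ceil(math.log(alpha) / math.log(1 - theta0) - 1)
size = (1 - theta0) ** (c + 1)     # actual size of "reject if X > c"
print(c, size)                     # here c = 8 and size ~ 0.040 <= alpha
```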

Problem 5 (Stat 411). Let $X_1$ and $X_2$ be independent random variables, each uniformly distributed on the interval $(\theta - \frac12, \theta + \frac12)$, where $\theta$ is an unknown real number. Let $Y_1 = \frac12(X_1 + X_2)$ be the sample mean and $Y_2 = \frac12(X_1 - X_2)$ a scaled difference.

(a) Find the conditional density of $Y_1$, given $Y_2 = u$. (Hint: the joint distribution of $Y_1$ and $Y_2$ is easy to get, and the conditional density is proportional to the joint density with $u$ fixed at the given value.)
(b) Find the conditional variance of $Y_1$, given $Y_2 = u$, and the unconditional variance of $Y_1$. For what values of $u$ is the conditional variance smaller than the unconditional variance? (Hint: the variance of a $\mathrm{Unif}(a,b)$ distribution is $(b-a)^2/12$.)

Solution. (a) The joint density of $(Y_1, Y_2)$ is obtained from the transformation formula. Since $X_1 = Y_1 + Y_2$ and $X_2 = Y_1 - Y_2$, the Jacobian of the transformation is $2$, so
$$f_{Y_1,Y_2}(y_1,y_2) = 2\, f_{X_1,X_2}(y_1+y_2,\, y_1-y_2) = 2\, I_{[\theta-\frac12,\,\theta+\frac12]}(y_1+y_2)\, I_{[\theta-\frac12,\,\theta+\frac12]}(y_1-y_2),$$
where $I$ is the indicator function. Following the hint, the conditional density of $Y_1$, given $Y_2 = u$, is proportional to $f_{Y_1,Y_2}(y_1, u)$ as a function of $y_1$, i.e.,
$$f_{Y_1 \mid Y_2}(y_1 \mid u) \propto I_{[\theta-\frac12-u,\,\theta+\frac12-u]}(y_1)\, I_{[\theta-\frac12+u,\,\theta+\frac12+u]}(y_1).$$
Combining the two indicators, it is clear that the conditional distribution must also be uniform; in particular,
$$Y_1 \mid (Y_2 = u) \sim \mathrm{Unif}\big(\theta - \tfrac12 + |u|,\ \theta + \tfrac12 - |u|\big).$$

(b) From the hint, we have
$$V(Y_1) = \frac{\text{variance of } \mathrm{Unif}(\theta-\frac12,\, \theta+\frac12)}{2} = \frac{1}{24}.$$
Similarly,
$$V(Y_1 \mid Y_2 = u) = \frac{\big[(\theta + \frac12 - |u|) - (\theta - \frac12 + |u|)\big]^2}{12} = \frac{(1 - 2|u|)^2}{12}.$$
The conditional variance will be smaller if $(1 - 2|u|)^2 < \frac12$ or, equivalently, if $|u| > \frac12\big(1 - \frac{1}{\sqrt2}\big)$. Basically, if the distance between the observations $X_1$ and $X_2$ is sufficiently large, then the conditional variance of the sample mean is less than its unconditional variance.

Problem 6 (Stat 416). Pretest anxiety is investigated in a study in order to see whether the scores are different for two groups of students in two different sections of an introductory course in probability theory. Five students enrolled in the first section, and six students enrolled in the second section. Their scores are

Section I: 19, 26, 22, 21, 27
Section II: 34, 24, 30, 28, 25, 23

Use the Wilcoxon rank-sum test to check whether there is a significant difference between the median scores of the two groups at the 5% level.

Solution. Sort the observations in Section I (denoted by $X$) into 19, 21, 22, 26, 27. Sort the observations in Section II (denoted by $Y$) into 23, 24, 25, 28, 30, 34. Combine $X$ and $Y$ into $Z$ and get

Value:   19  21  22  23  24  25  26  27  28  30  34
Group:    X   X   X   Y   Y   Y   X   X   Y   Y   Y
Rank i:   1   2   3   4   5   6   7   8   9  10  11
Z_i:      1   1   1   0   0   0   1   1   0   0   0

The Wilcoxon rank-sum statistic is $W_n = \sum_i i Z_i = 1 + 2 + 3 + 7 + 8 = 21$. Using Table J (p. 576) with $m = 5$, $n = 6$, we get $P(W_n \le 21 \mid H_0) = 0.063$. Therefore, the p-value is $2 \times 0.063 = 0.126$. We do not reject the null hypothesis; that is, there is no significant difference between the median scores of the two groups at the 5% level.
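The exact tail probability read from Table J can be reproduced by brute-force enumeration of all $\binom{11}{5}$ possible rank assignments to Section I (a small sketch using only the Python standard library).

```python
from itertools import combinations

ranks = range(1, 12)            # combined ranks 1..11
w_obs = 1 + 2 + 3 + 7 + 8       # observed rank sum for Section I, W = 21

# All C(11, 5) = 462 equally likely rank sets under H0.
sums = [sum(c) for c in combinations(ranks, 5)]
p_lower = sum(s <= w_obs for s in sums) / len(sums)
print(p_lower, 2 * p_lower)     # ~0.063 and two-sided p-value ~0.126
```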

Problem 7 (Stat 431). The size $N$ of a finite population is unknown to start with. Ten (10) units are drawn, marked, and then released into the population. Next, a simple random sample of 30 is drawn, only to find that three (3) of the 30 units bear the mark. From the above information, suggest a reasonable estimate for $N$. Also indicate what statistical procedure you used in your estimation methodology.

Solution. The sample proportion of marked units is $3/30$; the population proportion of marked units is $10/N$. Using the method of moments, we equate the sample proportion to the unknown population proportion and obtain $3/30 = 10/N$, whence $\hat N = 100$.

Problem 8 (Stat 451). The table below shows survival times (days) of patients with advanced terminal cancer of the stomach and breast. The goal is to use a permutation test to examine the hypothesis that there is no difference in mean survival times between the two groups (stomach and breast). Describe your algorithm in detail.

Stomach: 25, 42, 45, 46, 51, 103, 124, 146, 340, 396, 412, 876, 1112
Breast:  24, 40, 719, 727, 791, 1166, 1235, 1581, 1804, 3460, 3808

Solution. There are $n = 13$ observations in the stomach group, denoted by $x_1, \ldots, x_n$. There are $m = 11$ observations in the breast group, denoted by $y_1, \ldots, y_m$. Define the statistic
$$T = T(z_1, \ldots, z_n, z_{n+1}, \ldots, z_{n+m}) = \frac{1}{n}\sum_{i=1}^n z_i - \frac{1}{m}\sum_{j=1}^m z_{n+j}.$$

1. Calculate $t_0 = T(x_1, \ldots, x_n, y_1, \ldots, y_m)$, which is the mean difference of the two groups.
2. For $k = 1, \ldots, B$, permute the original data $(x_1, \ldots, x_n, y_1, \ldots, y_m)$ into a new dataset $Z^{(k)} = (z_1^{(k)}, \ldots, z_n^{(k)}, z_{n+1}^{(k)}, \ldots, z_{n+m}^{(k)})$ and calculate $t_k = T(Z^{(k)})$.
3. Let $L$ be the number of $k$'s such that $|t_k| \ge |t_0|$. Then $L/B$ serves as an estimated p-value. We reject the hypothesis that there is no difference in group means if $L/B$ is less than a certain significance level, say 0.05.
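The algorithm above translates almost line for line into code. The sketch below (assuming NumPy; the choice of $B = 10{,}000$ resamples and the random seed are arbitrary) estimates the permutation p-value for the mean difference between the two groups.

```python
import numpy as np

stomach = np.array([25, 42, 45, 46, 51, 103, 124, 146, 340, 396, 412, 876, 1112])
breast = np.array([24, 40, 719, 727, 791, 1166, 1235, 1581, 1804, 3460, 3808])

rng = np.random.default_rng(3)
n, B = len(stomach), 10_000
pooled = np.concatenate([stomach, breast])

t0 = stomach.mean() - breast.mean()        # observed mean difference

count = 0
for _ in range(B):
    z = rng.permutation(pooled)            # relabel the pooled observations
    tk = z[:n].mean() - z[n:].mean()       # statistic T on the permuted data
    count += abs(tk) >= abs(t0)

print(t0, count / B)                       # estimated two-sided p-value L/B
```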

Problem 9 (Stat 461). A die is rolled repeatedly until either two successive 1's appear or one 6 appears. Suppose the first roll is a 3. Find the probability that the game ends with two successive 1's.

Solution. We use $X_n$ to record the number of successive 1's in the following way:
outcome of the $n$-th roll is 2, 3, 4 or 5: $X_n = 0$;
number of successive 1's is one after the $n$-th roll: $X_n = 1$;
number of successive 1's is two after the $n$-th roll: $X_n = 2$;
outcome of the $n$-th roll is 6: $X_n = 3$.

Then $X_n$ is a Markov chain with transition probability matrix
$$P = \begin{pmatrix} 4/6 & 1/6 & 0 & 1/6 \\ 4/6 & 0 & 1/6 & 1/6 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
States 2 and 3 are absorbing states. The transition probability matrix restricted to the non-absorbing states is
$$Q = \begin{pmatrix} 4/6 & 1/6 \\ 4/6 & 0 \end{pmatrix}, \qquad W = (I - Q)^{-1} = \begin{pmatrix} 2/6 & -1/6 \\ -4/6 & 1 \end{pmatrix}^{-1} = \frac{9}{2}\begin{pmatrix} 1 & 1/6 \\ 4/6 & 2/6 \end{pmatrix}.$$
Note that we have
$$R = \begin{pmatrix} 0 & 1/6 \\ 1/6 & 1/6 \end{pmatrix}, \qquad U = W R = \frac{9}{2}\begin{pmatrix} 1 & 1/6 \\ 4/6 & 2/6 \end{pmatrix}\begin{pmatrix} 0 & 1/6 \\ 1/6 & 1/6 \end{pmatrix} = \begin{pmatrix} 1/8 & 7/8 \\ 1/4 & 3/4 \end{pmatrix}.$$
Since the first roll is a 3 (state 0), the probability that the game ends with two successive 1's is given by $U_{02}$, the entry of $U$ for starting state 0 and absorbing state 2, which equals $1/8$.
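The absorption probabilities can be double-checked numerically (a sketch assuming NumPy; states are ordered 0, 1 for the transient block and 2, 3 for the absorbing block, as in the solution).

```python
import numpy as np

Q = np.array([[4/6, 1/6],
              [4/6, 0.0]])      # transitions among transient states 0, 1
R = np.array([[0.0, 1/6],
              [1/6, 1/6]])      # transitions into absorbing states 2, 3

U = np.linalg.inv(np.eye(2) - Q) @ R   # absorption probability matrix W R
print(U)    # row 0 is [1/8, 7/8]: starting from state 0,
            # P(game ends with two successive 1's) = 0.125
```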

Problem 10 (Stat 461). We toss a coin repeatedly. For each toss, we get 1 if the outcome is a head and 2 if the outcome is a tail. Let $Y_n$ be the sum of the outcomes of the first $n$ tosses. Denote by $X_n$ the remainder of $Y_n$ divided by 3.

(a) Is there a limiting distribution for the Markov chain $X_n$? If yes, determine the limiting distribution.
(b) Suppose a visit to state $j$ incurs a cost $c_j$ for $j = 0, 1$ and $2$. Moreover, we know that $c_0 = 1$, $c_1 = 2$ and $c_2 = 3$. What is the long-run mean cost per unit time?

Solution. The transition matrix of $X_n$ is given by
$$P = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}.$$

(a) Yes, there is a limiting distribution for the Markov chain, since $P^2$ has all entries strictly positive and hence the chain is regular. Moreover, since $P$ is doubly stochastic, the limiting distribution is $\pi = (\pi_0, \pi_1, \pi_2) = (1/3, 1/3, 1/3)$.

(b) Since $\pi_j$ is also the long-run mean fraction of time that the process $X_n$ spends in state $j$, we have
$$\text{long-run mean cost per unit time} = \sum_{j=0}^{2} \pi_j c_j = \frac{1}{3}(1 + 2 + 3) = 2.$$
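A quick numerical confirmation (a sketch assuming NumPy; the number of iterations and the initial state are arbitrary): iterate the chain's distribution until it stabilizes and average the cost under the limiting distribution.

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
c = np.array([1.0, 2.0, 3.0])          # costs for states 0, 1, 2

pi = np.array([1.0, 0.0, 0.0])         # start in state 0 (Y_0 = 0)
for _ in range(200):
    pi = pi @ P                        # distribution after one more toss
print(pi)                              # ~ [1/3, 1/3, 1/3]
print(pi @ c)                          # long-run mean cost ~ 2.0
```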

Problem 11 (Stat 481). We fit data $\{(x_i, Y_i),\ i = 1, \ldots, n\}$ with a simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the iid errors $\varepsilon_i \sim N(0, \sigma^2)$.

(a) Based on the least squares criterion (loss function), calculate the least squares estimates of the intercept and slope, $\hat\beta_0, \hat\beta_1$. What is the least squares estimate of $\beta_0$ under the restriction $\beta_1 = 0$? Is it different from the unrestricted estimator?
(b) Show that $SSR = \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar x)^2$, and derive its distribution under $\beta_1 = 0$. [Given: $\mathrm{Var}(\hat\beta_1) = \sigma^2\big[\sum_{i=1}^n (x_i - \bar x)^2\big]^{-1}$.]
(c) Show that the coefficient of determination $R^2 = r^2$, where $r$ is the linear correlation coefficient of $x = (x_1, \ldots, x_n)$ and $Y = (Y_1, \ldots, Y_n)$.

Solution. (a) The least squares estimators minimize
$$Q(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i)^2.$$
Setting $\partial Q/\partial\beta_0 = 0$ and $\partial Q/\partial\beta_1 = 0$ gives the normal equations
$$\sum_{i=1}^n Y_i = n\beta_0 + \beta_1 \sum_{i=1}^n x_i, \qquad \sum_{i=1}^n Y_i x_i = \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2,$$
whose solution is
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$
When $\beta_1 = 0$, the least squares estimator of $\beta_0$ minimizes $Q(\beta_0) = \sum_{i=1}^n (Y_i - \beta_0)^2$; setting $dQ/d\beta_0 = 0$ gives $\sum_{i=1}^n Y_i = n\beta_0$, so $\hat\beta_0 = \bar Y$, the response average. This differs from the unrestricted estimator $\bar y - \hat\beta_1 \bar x$ unless $\hat\beta_1 \bar x = 0$.

(b) Note that $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$ and $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, so
$$SSR = \sum_{i=1}^n (\hat y_i - \bar y)^2 = \sum_{i=1}^n \big(\hat\beta_0 + \hat\beta_1 x_i - \hat\beta_0 - \hat\beta_1 \bar x\big)^2 = \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar x)^2,$$
where
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{s_{xx}} = \sum_{i=1}^n c_i Y_i, \qquad c_i = \frac{x_i - \bar x}{s_{xx}}.$$
It can be shown that $\hat\beta_1 \sim N(\beta_1, \sigma^2/s_{xx})$. Under the hypothesis $\beta_1 = 0$,
$$\frac{\hat\beta_1}{\sqrt{\mathrm{Var}(\hat\beta_1)}} \sim N(0,1) \quad\Longrightarrow\quad \frac{\hat\beta_1^2}{\mathrm{Var}(\hat\beta_1)} = \frac{\hat\beta_1^2 s_{xx}}{\sigma^2} \sim \chi^2(1),$$
i.e., $SSR/\sigma^2 \sim \chi^2(1)$.

(c) The coefficient of determination is
$$R^2 = \frac{SSR}{SSTO} = \frac{\hat\beta_1^2 s_{xx}}{s_{yy}} = \left(\frac{s_{xy}}{s_{xx}}\right)^2 \frac{s_{xx}}{s_{yy}} = \frac{s_{xy}^2}{s_{xx} s_{yy}} = \left[\frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2 \sum_{i=1}^n (y_i - \bar y)^2}}\right]^2 = r^2.$$
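The identity $R^2 = r^2$ and the SSR decomposition are easy to verify numerically on simulated data (a sketch assuming NumPy; the data-generating values of $\beta_0$, $\beta_1$, and $\sigma$ are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 3, n)   # arbitrary beta0, beta1, sigma

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
syy = np.sum((y - y.mean()) ** 2)

b1 = sxy / sxx                            # least squares slope
b0 = y.mean() - b1 * x.mean()             # least squares intercept
yhat = b0 + b1 * x

ssr = np.sum((yhat - y.mean()) ** 2)
print(ssr, b1 ** 2 * sxx)                 # SSR = b1^2 * Sxx
print(ssr / syy, np.corrcoef(x, y)[0, 1] ** 2)   # R^2 = r^2
```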

Problem 12 (Stat 481). A researcher studied the sodium content in beer by selecting six brands from the large number of brands of US and Canadian beers. The researcher then chose eight 12-ounce cans or bottles of each selected brand at random and measured the sodium content $Y$ (in mg).

(a) Write down an appropriate statistical model and the necessary assumptions for the model. What are the hypotheses for this study?
(b) Complete the following ANOVA table and then draw a conclusion at level $\alpha = 0.05$. [$F_{0.05}(5, 42) = 2.44$, $F_{0.05}(6, 41) = 2.33$.]

Source   DF   Sum of Squares   Mean Square     F
Brand     5            650.0         130.0   178
Error    42             30.8          0.73
Total    47            680.8

(c) Estimate the variance components in the model given in (a).
(d) Find $\mathrm{Corr}(Y_{ij}, Y_{i'j'})$, $i, i' = 1, \ldots, k$; $j, j' = 1, \ldots, n$, the correlation coefficient between any two responses.

Solution. (a) This is a random-effects one-way ANOVA model,
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, 6;\ j = 1, \ldots, 8,$$
where the iid errors $\varepsilon_{ij} \sim N(0, \sigma^2)$ are independent of the random effects $\tau_i \sim N(0, \sigma_\tau^2)$. Hypotheses: $H_0: \sigma_\tau^2 = 0$ versus $H_1: \sigma_\tau^2 > 0$.

(b) See the completed table above. Since $F = 178 > F_{0.05}(5, 42) = 2.44$, the p-value is less than 0.05 and we reject the null hypothesis.

(c) $\hat\sigma^2 = MSE = 0.73$; $\hat\sigma_\tau^2 = (MSTR - MSE)/n = (130 - 0.73)/8 \approx 16.2$.

(d) Calculate the covariance first:
$$\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \mathrm{Cov}(\mu + \tau_i + \varepsilon_{ij},\ \mu + \tau_{i'} + \varepsilon_{i'j'}) = \begin{cases} 0, & i \ne i', \\ \sigma_\tau^2, & i = i',\ j \ne j', \\ \sigma^2 + \sigma_\tau^2, & i = i',\ j = j'. \end{cases}$$
Therefore,
$$\mathrm{Corr}(Y_{ij}, Y_{i'j'}) = \begin{cases} 0, & i \ne i', \\ \dfrac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2}, & i = i',\ j \ne j', \\ 1, & i = i',\ j = j'. \end{cases}$$
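The variance-component estimates in (c) and the within-brand correlation in (d) follow directly from the mean squares in the table (a minimal sketch of the arithmetic; the numbers are the reconstructed table entries above).

```python
# Method-of-moments estimates for the one-way random-effects model:
# E(MSE) = sigma^2 and E(MSTR) = sigma^2 + n * sigma_tau^2, with n = 8 cans per brand.
mstr, mse, n_per_brand = 130.0, 0.73, 8

sigma2 = mse
sigma_tau2 = (mstr - mse) / n_per_brand
icc = sigma_tau2 / (sigma2 + sigma_tau2)   # Corr(Y_ij, Y_ij') within a brand

print(sigma2, sigma_tau2, icc)             # 0.73, ~16.2, ~0.96
```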