Master's Written Examination


Option: Statistics and Probability, Spring 2015

Full points may be obtained for correct answers to eight questions. Each numbered question (which may have several parts) is worth the same number of points. All answers will be graded, but the score for the examination will be the sum of the scores of your best eight solutions. Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into the envelope provided, then seal it.

Problem 1 (Stat 40). $X$ and $Y$ are independent random variables with $X \sim \text{exponential}(\lambda)$ and $Y \sim \text{exponential}(\mu)$. It is impossible to obtain direct observations of $X$ and $Y$. Instead, we observe the random variables $Z$ and $W$, where
\[
Z = \min\{X, Y\} \quad\text{and}\quad W = \begin{cases} 1, & Z = X; \\ 0, & Z = Y. \end{cases}
\]
(a) Find the joint distribution of $Z$ and $W$. (b) Prove that $Z$ and $W$ are independent.

Solution to Problem 1. (a) We first compute $P(Z > z, W = 1)$:
\[
P(Z > z, W = 1) = P(Z > z, Z = X) = P(X > z, X \le Y)
= \int_z^\infty \int_x^\infty \lambda\mu\, e^{-\lambda x} e^{-\mu y}\,dy\,dx
= \int_z^\infty \lambda e^{-\lambda x} e^{-\mu x}\,dx
= \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)z}.
\]
Clearly, $P(W = 1) = P(Z > 0, W = 1) = \frac{\lambda}{\lambda+\mu}$. Thus
\[
P(Z \le z, W = 1) = \frac{\lambda}{\lambda+\mu}\left(1 - e^{-(\lambda+\mu)z}\right).
\]
Similarly, we have
\[
P(Z \le z, W = 0) = \frac{\mu}{\lambda+\mu}\left(1 - e^{-(\lambda+\mu)z}\right).
\]
(b) It is sufficient to show that $P(Z \le z \mid W = i) = P(Z \le z)$. Clearly, we have $P(Z \le z) = 1 - e^{-(\lambda+\mu)z}$. On the other hand,
\[
P(Z \le z \mid W = i) = \frac{P(Z \le z, W = i)}{P(W = i)}.
\]
Utilizing the results in (a), we can directly verify that $P(Z \le z \mid W = i) = P(Z \le z)$.

Problem 2 (Stat 40). Suppose you have a wooden stick of length 1. You first break it into two pieces at a randomly selected point on the stick. Then you again break the longer piece at random into two pieces. Let $X_1$ and $X_2$ be the places where you break the stick the first and second time, respectively. Find the joint probability density function of $X_1$ and $X_2$.

Find the probability that you can make a triangle using the three pieces.

Solution to Problem 2. Let $X_1$ and $X_2$ be the places where you break the stick the first and second time. We have $X_1 \sim \text{Unif}(0,1)$, and since the second break is uniform on the longer piece,
\[
f(x_2 \mid x_1) = \begin{cases} \dfrac{1}{x_1}, & x_1 > 1/2 \text{ and } x_2 \in (0, x_1); \\[6pt] \dfrac{1}{1-x_1}, & x_1 < 1/2 \text{ and } x_2 \in (x_1, 1). \end{cases}
\]
Since $X_1 \sim \text{Unif}(0,1)$, the joint density is
\[
f(x_1, x_2) = f(x_1)\, f(x_2 \mid x_1) = \begin{cases} \dfrac{1}{x_1}, & x_1 > 1/2 \text{ and } x_2 \in (0, x_1); \\[6pt] \dfrac{1}{1-x_1}, & x_1 < 1/2 \text{ and } x_2 \in (x_1, 1). \end{cases}
\]
Now we have two cases.

1. $x_1 > 1/2$. In this case the three pieces have lengths $x_2$, $x_1 - x_2$ and $1 - x_1$. In order to make a triangle, each piece must be shorter than the sum of the other two, i.e., each piece must have length less than $1/2$. This reduces to $x_1 - 1/2 < x_2 < 1/2$.

2. $x_1 < 1/2$. Similarly, the three pieces have lengths $x_1$, $x_2 - x_1$ and $1 - x_2$, and solving the triangle inequalities gives $1/2 < x_2 < x_1 + 1/2$.

Therefore the probability that a triangle can be made using the three pieces is
\[
P = \int_{1/2}^{1} \int_{x_1 - 1/2}^{1/2} \frac{1}{x_1}\,dx_2\,dx_1 + \int_{0}^{1/2} \int_{1/2}^{x_1 + 1/2} \frac{1}{1-x_1}\,dx_2\,dx_1
= \left(\log 2 - \tfrac{1}{2}\right) + \left(\log 2 - \tfrac{1}{2}\right) = \log 4 - 1.
\]

Problem 3 (Stat 41). Let $X_1, \dots, X_n$ be a random sample from $X \sim \text{Normal}(\theta, \sigma^2)$, where $\theta \in \mathbb{R}$ is unknown and $\sigma^2 > 0$ is known. (1) Find the maximum likelihood estimator $\hat\theta_{\mathrm{mle}}$ of the mean $\theta$. (2) Is $\hat\theta_{\mathrm{mle}}$ an efficient estimator of $\theta$? (3) Let the parameter $\eta = \theta^2$; determine its MLE $\hat\eta_{\mathrm{mle}}$ and the asymptotic distribution of $\hat\eta_{\mathrm{mle}}$.

Solution to Problem 3. The log-likelihood function of the sample is
\[
l(\theta) = -n \log\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \theta)^2.
\]
(1) The likelihood equation is
\[
\frac{\partial l}{\partial \theta} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \theta) = 0,
\]

and its solution is $\hat\theta = \bar x = \frac{1}{n}\sum_{i=1}^n x_i$. Since $\partial^2 l/\partial\theta^2 = -n/\sigma^2 < 0$, we conclude $\hat\theta_{\mathrm{mle}} = \bar x$.

(2) $E(\hat\theta_{\mathrm{mle}}) = E(\bar X) = \theta$, so it is an unbiased estimator of $\theta$, and
\[
\mathrm{Var}(\hat\theta_{\mathrm{mle}}) = \mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n} = \frac{\sigma^2}{n} = \frac{1}{nI(\theta)},
\]
since the Fisher information of the sample is
\[
nI(\theta) = nE\!\left[\left(\frac{\partial \log f(X,\theta)}{\partial\theta}\right)^2\right] = \frac{n}{\sigma^2}.
\]
The MLE attains the Cramér–Rao lower bound, so it is an efficient estimator of $\theta$.

(3) By the functional invariance property of the MLE, $\hat\eta_{\mathrm{mle}} = (\hat\theta_{\mathrm{mle}})^2 = \bar X^2$. By the asymptotic normality of MLEs,
\[
\sqrt{n}\,(\hat\eta_{\mathrm{mle}} - \eta) \to N\!\left(0,\ \frac{(\eta'(\theta))^2}{I(\theta)}\right), \quad\text{i.e.}\quad \sqrt{n}\,(\hat\eta_{\mathrm{mle}} - \eta) \to N(0,\ 4\theta^2\sigma^2).
\]

Problem 4 (Stat 41). Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ be independent random samples from two normal distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively, where $\sigma^2$ is the common but unknown variance. (a) Find the likelihood ratio $\Lambda$ for testing $H_0: \mu_1 = \mu_2 = 0$ against all alternatives. (b) Rewrite $\Lambda$ so that it is a function of a statistic $Z$ which has a well-known distribution under the null hypothesis. (Hint: $\sum X_i^2 = \sum (X_i - \bar X + \bar X)^2$.) (c) Give the distribution of $Z$ under the null hypothesis.

Solution to Problem 4. The likelihood function is
\[
(2\pi\sigma^2)^{-n} \exp\left\{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n (X_i - \mu_1)^2 + \sum_{j=1}^n (Y_j - \mu_2)^2\right)\right\}.
\]
(a) Under the null hypothesis, the MLE of $\sigma^2$ is
\[
\hat\sigma^2 = \frac{\sum_{i=1}^n X_i^2 + \sum_{j=1}^n Y_j^2}{2n},
\]

and
\[
L(\hat\omega) = \left(2\pi\,\frac{\sum_{i=1}^n X_i^2 + \sum_{j=1}^n Y_j^2}{2n}\right)^{-n} e^{-n}.
\]
Under $\Omega$, we have $\hat\mu_1 = \bar X$, $\hat\mu_2 = \bar Y$,
\[
\hat\sigma^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}{2n},
\]
and
\[
L(\hat\Omega) = \left(2\pi\,\frac{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}{2n}\right)^{-n} e^{-n}.
\]
Thus we have
\[
\Lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} = \left(\frac{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}{\sum_{i=1}^n X_i^2 + \sum_{j=1}^n Y_j^2}\right)^{n}.
\]
(b) Notice that
\[
\sum_{i=1}^n X_i^2 + \sum_{j=1}^n Y_j^2 = \sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2 + n\bar X^2 + n\bar Y^2.
\]
Thus $\Lambda = (1 + Z)^{-n}$, where
\[
Z = \frac{n\bar X^2 + n\bar Y^2}{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}.
\]
(c) Under the null hypothesis, $n\bar X^2/\sigma^2 \sim \chi^2(1)$, $n\bar Y^2/\sigma^2 \sim \chi^2(1)$, $\sum_{i=1}^n (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n-1)$, and $\sum_{j=1}^n (Y_j - \bar Y)^2/\sigma^2 \sim \chi^2(n-1)$, and these four quantities are mutually independent. Thus $(n\bar X^2 + n\bar Y^2)/\sigma^2 \sim \chi^2(2)$ and $\big(\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2\big)/\sigma^2 \sim \chi^2(2n-2)$. Consequently, $(n-1)Z$ has an $F$ distribution with first df $= 2$ and second df $= 2n-2$.

Problem 5 (Stat 41). Consider a bivariate observation $X = (X_1, X_2)$ with density
\[
f_\theta(x_1, x_2) = \exp\{-\theta x_1 - \theta^{-1} x_2\}, \qquad x_1, x_2 > 0,\ \theta > 0.
\]
Note: under this model, $X_1$ and $X_2$ are independent (but not iid) exponential random variables with means $1/\theta$ and $\theta$, respectively.

1. Use the factorization theorem to find the minimal sufficient statistic; you don't have to prove the minimal part.
2. Find the maximum likelihood estimator of $\theta$. Is it sufficient?
3. Show that the minimal sufficient statistic identified in Part 1 is not complete.
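Before the formal argument, the model's scale structure can be probed numerically. The sketch below (a minimal check, assuming `numpy` is available; the sample size and the $\theta$ values are illustrative choices, not part of the problem) verifies by simulation that $E(X_1 X_2) = (1/\theta)\cdot\theta = 1$ for every $\theta$, which is the identity that drives the non-completeness argument in Part 3:

```python
import numpy as np

# Simulation check that E[X1 * X2] = (1/theta) * theta = 1 for every theta.
# X1 is exponential with mean 1/theta, X2 is exponential with mean theta;
# numpy's `scale` argument is the mean of the exponential distribution.
rng = np.random.default_rng(0)
n = 200_000
for theta in (0.5, 1.0, 3.0):
    x1 = rng.exponential(scale=1.0 / theta, size=n)  # mean 1/theta
    x2 = rng.exponential(scale=theta, size=n)        # mean theta
    print(theta, np.mean(x1 * x2))                   # close to 1 each time
```

Because the product of the two means is exactly 1 regardless of $\theta$, the empirical mean of $X_1 X_2$ stays near 1 as $\theta$ varies.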

Solution to Problem 5.

1. The likelihood for $\theta$ is just the joint density of the observations, and it is clear that the shape of the likelihood depends on all of the data $(X_1, X_2)$. So, by the factorization theorem, there is no non-trivial sufficient statistic: the minimal sufficient statistic is $T = (X_1, X_2)$ itself.

2. The derivative of the log-density with respect to $\theta$ is $-X_1 + X_2/\theta^2$. Setting this equal to zero and solving for $\theta$ gives the MLE, i.e., $\hat\theta = (X_2/X_1)^{1/2}$. This is, of course, a function of the sufficient statistic, but since it is not one-to-one, it cannot itself be a sufficient statistic.

3. To show that $T = (X_1, X_2)$ is not complete, we need to find a non-zero function $f(T)$ such that $E_\theta\{f(T)\} = 0$ for all $\theta$. Since the exponential distribution is a scale parameter family, we expect that some multiplication or division might cancel out the dependence on $\theta$. Indeed, if we take
\[
f(t) = f(t_1, t_2) = t_1 t_2 - 1,
\]
then we get
\[
E_\theta\{f(T)\} = E_\theta(X_1 X_2) - 1 = E_\theta(X_1)\,E_\theta(X_2) - 1 = \frac{1}{\theta}\cdot\theta - 1 = 0 \quad\text{for all } \theta.
\]
Since $f$ is not identically zero, we conclude that $T$ is not complete.

Problem 6 (Stat 46). The table below shows the yearly sales (in \$10,000) of a bookstore in six consecutive years. Suppose we choose significance level 0.05.

Time    Yr-1  Yr-2  Yr-3  Yr-4  Yr-5  Yr-6
Sales     1     2     4     7     6    10

(a) Test whether there exists a trend using the total number of runs above and below the sample median.
(b) Use the rank von Neumann (RVN) test to see if there exists a trend.
(c) Are your conclusions in (a) and (b) consistent? What is your final conclusion?

Solution to Problem 6. (a) Arranging the yearly sales in increasing order gives 1, 2, 4, 6, 7, 10, so the sample median is $(4+6)/2 = 5$. Using $+$ or $-$ to indicate yearly sales above and below the sample median, we get $-, -, -, +, +, +$. The total number of runs is therefore $R = 2$. According to Table D, the p-value is $P(R \le 2) = 0.1 > 0.05$. We conclude that there is no significant trend at the 5% level.
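The counting in part (a) can be reproduced in a few lines of plain Python (a minimal sketch; the sales figures are those in the table, and no observation ties the median here):

```python
# Runs above and below the sample median for the bookstore sales series.
sales = [1, 2, 4, 7, 6, 10]

ordered = sorted(sales)
median = (ordered[2] + ordered[3]) / 2                # (4 + 6) / 2 = 5
signs = ['+' if s > median else '-' for s in sales]   # no ties with the median

# A new run starts at every sign change.
runs = 1 + sum(signs[i] != signs[i - 1] for i in range(1, len(signs)))
print(''.join(signs), runs)  # ---+++ 2
```

The sign sequence has a single change, giving the two runs used in the test.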

(b) The ranks of the yearly sales, $\mathrm{rank}(X_i)$, are 1, 2, 3, 5, 4, 6. Thus the test statistic is
\[
NM = \sum_{i=1}^{5} \left[\mathrm{rank}(X_i) - \mathrm{rank}(X_{i+1})\right]^2 = 1 + 1 + 4 + 1 + 4 = 11.
\]
According to Table S, the p-value is $P(NM \le 11) = 0.047 < 0.05$. We conclude that there is a significant trend at the 5% level.

(c) The conclusions in (a) and (b) are not consistent. The reason is that the RVN test is more powerful than the test based on the total number of runs. We follow the RVN test and conclude that there is a significant trend.

Problem 7 (Stat 43). Consider a without-replacement sample of size 2 from a population of size 5, with joint inclusion probabilities $\pi_{12} = \pi_{45} = 0.15$, $\pi_{23} = \pi_{34} = 0.10$, $\pi_{13} = \pi_{24} = 0.05$, $\pi_{14} = \pi_{35} = 0.12$, and $\pi_{15} = \pi_{25} = 0.08$.

(a) Calculate the inclusion probabilities $\pi_i$ for this design.
(b) The variance of the Horvitz–Thompson estimator can be written as
\[
V(\hat t_{HT}) = \sum_{i=1}^{N} (1 - \pi_i)\frac{t_i^2}{\pi_i} + \sum_{i=1}^{N}\sum_{k \ne i} (\pi_{ik} - \pi_i\pi_k)\,\frac{t_i}{\pi_i}\frac{t_k}{\pi_k}.
\]
Show that it is equivalent to
\[
V(\hat t_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{k \ne i} (\pi_i\pi_k - \pi_{ik})\left(\frac{t_i}{\pi_i} - \frac{t_k}{\pi_k}\right)^2.
\]
(Hint: $\sum_{k \ne i} \pi_{ik} = (n-1)\pi_i$.)
(c) Suppose that $t_i = i$, $i = 1, \dots, 5$. Find $\hat V_{HT}(\hat t_{HT})$ and $\hat V_{SYG}(\hat t_{HT})$ for the sample $\{1, 2\}$. Which one is more stable in general?

Solution to Problem 7. (a)
\begin{align*}
\pi_1 &= \pi_{12} + \pi_{13} + \pi_{14} + \pi_{15} = 0.40, \\
\pi_2 &= \pi_{12} + \pi_{23} + \pi_{24} + \pi_{25} = 0.38, \\
\pi_3 &= \pi_{13} + \pi_{23} + \pi_{34} + \pi_{35} = 0.37, \\
\pi_4 &= \pi_{14} + \pi_{24} + \pi_{34} + \pi_{45} = 0.42, \\
\pi_5 &= \pi_{15} + \pi_{25} + \pi_{35} + \pi_{45} = 0.43.
\end{align*}
(b) Utilizing the fact that $\sum_{k \ne i} \pi_{ik} = (n-1)\pi_i$, we can draw the conclusion after some simple algebra through the expansion of the term $\left(\frac{t_i}{\pi_i} - \frac{t_k}{\pi_k}\right)^2$.
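The equivalence asserted in part (b) can also be confirmed numerically for this particular design. The sketch below (assuming `numpy`; the joint inclusion probabilities and $t_i = i$ are those of the problem) evaluates both forms of $V(\hat t_{HT})$ and checks that they agree:

```python
import numpy as np

# Both forms of V(t_HT) for the size-2 design in the statement, with t_i = i.
# They must coincide because sum_{k != i} pi_ik = (n - 1) pi_i here.
N, n = 5, 2
t = np.arange(1.0, 6.0)                      # t_i = i, i = 1..5
pi2 = np.zeros((N, N))                       # joint inclusion probabilities
pairs = {(0, 1): 0.15, (3, 4): 0.15, (1, 2): 0.10, (2, 3): 0.10,
         (0, 2): 0.05, (1, 3): 0.05, (0, 3): 0.12, (2, 4): 0.12,
         (0, 4): 0.08, (1, 4): 0.08}
for (i, k), p in pairs.items():
    pi2[i, k] = pi2[k, i] = p
pi = pi2.sum(axis=1) / (n - 1)               # part (a): 0.40 0.38 0.37 0.42 0.43

V_ht = float(np.sum((1 - pi) * t**2 / pi))   # diagonal part of the first form
V_syg = 0.0
for i in range(N):
    for k in range(N):
        if i != k:
            V_ht += (pi2[i, k] - pi[i] * pi[k]) * (t[i] / pi[i]) * (t[k] / pi[k])
            V_syg += 0.5 * (pi[i] * pi[k] - pi2[i, k]) * (t[i] / pi[i] - t[k] / pi[k]) ** 2
print(pi)
print(V_ht, V_syg)  # the two values coincide
```

The agreement is exact (up to floating-point error), since each $\pi_i$ is computed from the $\pi_{ik}$ precisely so that the hint's identity holds.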

(c) For $S = \{1, 2\}$, we have
\[
\hat V_{HT}(\hat t_{HT}) = \sum_{i \in S} (1 - \pi_i)\frac{t_i^2}{\pi_i^2} + \sum_{i \in S}\sum_{k \in S,\,k \ne i} \frac{\pi_{ik} - \pi_i\pi_k}{\pi_{ik}}\,\frac{t_i}{\pi_i}\frac{t_k}{\pi_k},
\]
so that
\[
\hat V_{HT}(\hat t_{HT}) = (1 - \pi_1)\frac{t_1^2}{\pi_1^2} + (1 - \pi_2)\frac{t_2^2}{\pi_2^2} + 2\,\frac{\pi_{12} - \pi_1\pi_2}{\pi_{12}}\,\frac{t_1}{\pi_1}\frac{t_2}{\pi_2} \approx 20.57.
\]
Similarly,
\[
\hat V_{SYG}(\hat t_{HT}) = \frac{1}{2}\sum_{i \in S}\sum_{k \in S,\,k \ne i} \frac{\pi_i\pi_k - \pi_{ik}}{\pi_{ik}}\left(\frac{t_i}{\pi_i} - \frac{t_k}{\pi_k}\right)^2 = \frac{\pi_1\pi_2 - \pi_{12}}{\pi_{12}}\left(\frac{t_1}{\pi_1} - \frac{t_2}{\pi_2}\right)^2 \approx 0.10.
\]
The SYG estimator is generally the more stable of the two.

Problem 8 (Stat 45). Suppose that $X = (X_1, \dots, X_n)$ are iid samples from a two-component normal mixture model with density
\[
f(x) = \alpha N(x \mid \mu, 1) + (1 - \alpha) N(x \mid \nu, 1).
\]
The parameter of interest is $\theta = (\alpha, \mu, \nu)$, where $\alpha \in [0, 1]$. Derive the EM algorithm for computing the maximum likelihood estimator of $\theta$.

Solution to Problem 8. Introduce missing data $Z = (Z_1, \dots, Z_n)$, where $Z_i$ is a label for the mixture component that $X_i$ was sampled from. That is,
\[
Z_i = \begin{cases} 1 & \text{if } X_i \sim N(\mu, 1), \\ 0 & \text{if } X_i \sim N(\nu, 1), \end{cases} \qquad i = 1, \dots, n.
\]
The complete data is $Y = (X, Z)$, and the complete-data log-likelihood is (up to an additive constant)
\[
\log L_Y(\theta) = \sum_{i=1}^n Z_i\left\{\log\alpha - \frac{(X_i - \mu)^2}{2}\right\} + (1 - Z_i)\left\{\log(1 - \alpha) - \frac{(X_i - \nu)^2}{2}\right\}.
\]
Now we need the conditional distribution of $Z$ given $X$. This is effectively an application of Bayes' formula, and we can write $Z_i \mid (X_i, \theta^{(t)}) \sim \mathrm{Ber}(\omega_i^{(t)})$, where
\[
\omega_i^{(t)} = \frac{\alpha^{(t)} N(X_i \mid \mu^{(t)}, 1)}{\alpha^{(t)} N(X_i \mid \mu^{(t)}, 1) + (1 - \alpha^{(t)}) N(X_i \mid \nu^{(t)}, 1)}, \qquad i = 1, \dots, n.
\]
Then we have the following E- and M-steps.

E-step. We evaluate the Q-function as follows:
\[
Q(\theta \mid \theta^{(t)}) = E[\log L_Y(\theta) \mid X, \theta^{(t)}] = \sum_{i=1}^n \omega_i^{(t)}\left\{\log\alpha - \frac{(X_i - \mu)^2}{2}\right\} + (1 - \omega_i^{(t)})\left\{\log(1 - \alpha) - \frac{(X_i - \nu)^2}{2}\right\}.
\]
M-step. Differentiating $Q$ with respect to each element of $\theta$, setting the equations to zero, and solving, we get the following updates:
\[
\alpha^{(t+1)} = \frac{1}{n}\sum_{i=1}^n \omega_i^{(t)}, \qquad
\mu^{(t+1)} = \frac{\sum_{i=1}^n \omega_i^{(t)} X_i}{\sum_{i=1}^n \omega_i^{(t)}}, \qquad
\nu^{(t+1)} = \frac{\sum_{i=1}^n (1 - \omega_i^{(t)}) X_i}{\sum_{i=1}^n (1 - \omega_i^{(t)})}.
\]

Problem 9 (Stat 46). A Markov chain $X_0, X_1, \dots$ has the transition probability matrix
\[
P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
(a) Find the period of state 0. (b) Is this a regular Markov chain? If yes, determine the limiting distribution. If no, explain why.

Solution to Problem 9. (a) Since there is only one communicating class, the periods of all states are the same. Hence we only need to compute the period of state 3. But it is easy to see that the period of state 3 is 2.

(b) It is not a regular Markov chain, because $P^n_{33}$ will not be strictly positive for all large $n$. (Note that if the chain starts from state 3, the probability that it comes back to state 3 in an odd number of steps is zero.)

Problem 10 (Stat 46). A Markov chain has state space $S = \{0, 1, 2\}$ and the following transition probability matrix:
\[
P = \begin{pmatrix} 1 & 0 & 0 \\ 1/4 & 1/4 & 1/2 \\ 1/3 & 2/3 & 0 \end{pmatrix}.
\]
If the chain starts from state 1, find the probability that it does not visit state 2 prior to its absorption at state 0.
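Before solving analytically, the requested probability can be estimated by simulation. The sketch below (a minimal check using only the standard library; it assumes the one-step probabilities from state 1 stated in the problem, namely $1/4$ to state 0, $1/4$ back to state 1 and $1/2$ to state 2) stops each path as soon as it leaves state 1:

```python
import random

# Estimate P(absorbed at 0 without ever entering state 2 | X0 = 1).
# From state 1: prob 1/4 -> state 0, prob 1/4 -> stay at 1, prob 1/2 -> state 2.
random.seed(42)

def avoids_state_2():
    while True:
        u = random.random()
        if u < 0.25:
            return True    # absorbed at 0; state 2 never visited
        if u < 0.50:
            continue       # stayed at state 1; keep going
        return False       # entered state 2 first

n_paths = 100_000
est = sum(avoids_state_2() for _ in range(n_paths)) / n_paths
print(est)  # close to 1/3
```

Only the row of the transition matrix for state 1 matters here, since a path ends the moment it reaches state 0 or state 2.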

Solution to Problem 10. The question is equivalent to assuming that state 2 is also an absorbing state and asking for the probability that the chain is stopped at state 0 instead of state 2. More precisely, we have a Markov chain with state space $S = \{0, 1, 2\}$ and transition probability matrix
\[
\tilde P = \begin{pmatrix} 1 & 0 & 0 \\ 1/4 & 1/4 & 1/2 \\ 0 & 0 & 1 \end{pmatrix},
\]
and we want to compute the probability $u$ that the chain is eventually stopped at state 0. There are several ways to compute this probability. For example, first-step analysis gives $u = \frac{1}{4} + \frac{1}{4}u$, so that
\[
u = \frac{1/4}{1 - 1/4} = \frac{1}{3}.
\]

Problem 11 (Stat 48). We are interested in the effect of nozzle type on the rate of fluid flow. Three specific nozzle types are under study. Five runs through each of the three nozzle types led to the following results:

Nozzle Type   Rate of Flow                       $\bar y_i = \frac{1}{5}\sum_j y_{ij}$   $\sum_j (y_{ij} - \bar y_i)^2$
A             96.6  97.2  96.4  97.4  97.8       97.08                                   1.328
B             97.5  96.4  97.0  96.2  96.8       96.78                                   1.048
C             97.0  96.0  95.6  95.8  97.0       96.28                                   1.808

Construct the ANOVA table and test whether nozzle type has an effect on fluid flow. Use significance level $\alpha = 0.05$ and F critical value $F(0.05; 2, 12) = 3.89$.

Solution to Problem 11. The ANOVA table is

Source      SS      df   MS      F
Treatment   1.633    2   0.817   2.34
Error       4.184   12   0.349
Total       5.817   14

where $4.184 = 1.328 + 1.048 + 1.808$, $\bar y_{\cdot\cdot} = (97.08 + 96.78 + 96.28)/3 = 96.713$, and
\[
1.633 = 5\left((97.08 - 96.713)^2 + (96.78 - 96.713)^2 + (96.28 - 96.713)^2\right).
\]
Since $2.34 < F(0.05; 2, 12) = 3.89$, there is not sufficient evidence to conclude that nozzle type has an effect.

Problem 12 (Stat 48). Consider one explanatory variable $x$ and the response variable $Y$; the model is
\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n,
\]
where the iid errors $\varepsilon_i \sim N(0, \sigma^2)$.

(a) Derive the least squares estimates $\hat\beta_0$ and $\hat\beta_1$ such that the fitted line $\hat\beta_0 + \hat\beta_1 x$ is the best straight line fitting the observed data $(x_1, y_1), \dots, (x_n, y_n)$, i.e.,
\[
\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)\, y_i}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\]

given that $\sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2 = \sum_{i=1}^n (x_i - \bar x)^2$.

(b) Prove that the distribution of $\hat\beta_1$ is
\[
\hat\beta_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{s_{xx}}\right), \qquad\text{where } s_{xx} = \sum_{i=1}^n (x_i - \bar x)^2.
\]
(c) Based on the sampling distribution of $\left(\hat\beta_1 - \beta_1\right)\big/\sqrt{\hat\sigma^2 / s_{xx}}$ when $\sigma^2$ is unknown, construct a confidence interval for $\beta_1$. How can the existence of a linear relationship be detected using this confidence interval?

Solution to Problem 12. (a) The least squares criterion is
\[
Q(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i)^2.
\]
Taking derivatives with respect to $\beta_0$ and $\beta_1$ and setting them to zero gives the normal equations
\[
\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i)\, x_i = 0.
\]
Solving these equations yields
\[
\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)\, y_i}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.
\]
(b) Let $c_i = (x_i - \bar x)\big/\sum_{j=1}^n (x_j - \bar x)^2$; then $\hat\beta_1 = \sum_{i=1}^n c_i y_i$. Since the $y_i$ are independent with $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, and $\sum_i c_i = 0$, $\sum_i c_i x_i = 1$, $\sum_i c_i^2 = 1/s_{xx}$, we get
\[
\hat\beta_1 \sim N\!\left(\sum_{i=1}^n c_i(\beta_0 + \beta_1 x_i),\ \sum_{i=1}^n c_i^2 \sigma^2\right) = N\!\left(\beta_1, \frac{\sigma^2}{s_{xx}}\right).
\]
(c) Take $\hat\sigma^2 = MSE = SSE/(n-2)$. Then
\[
\frac{\hat\beta_1 - \beta_1}{\sqrt{\hat\sigma^2 / s_{xx}}} \sim t(n-2),
\]
so a confidence interval for $\beta_1$ is
\[
\hat\beta_1 \pm t_{\alpha/2}(n-2)\sqrt{\frac{MSE}{s_{xx}}}.
\]
If the confidence interval does not cover zero, then there is a significant linear relationship between the response and the covariate at the given significance level.
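The closed-form estimates in part (a) and the interval in part (c) are easy to check on simulated data. The sketch below (assuming `numpy`; the true parameter values, sample size, and seed are illustrative choices, not part of the problem) compares the hand-derived slope with numpy's least squares fit and forms the 95% interval for $\beta_1$:

```python
import numpy as np

# Simulated data from Y = beta0 + beta1*x + eps, with illustrative values
# beta0 = 1, beta1 = 2, sigma = 0.5.
rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

# Part (a): closed-form least squares estimates.
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / sxx
b0 = ybar - b1 * xbar

# Cross-check against numpy's degree-1 least squares fit.
slope, intercept = np.polyfit(x, y, 1)

# Part (c): 95% confidence interval for beta1, with MSE = SSE/(n-2)
# and the t critical value t_{0.025}(28) = 2.048 taken from a t table.
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
half = 2.048 * np.sqrt(mse / sxx)
print(b1, slope)              # the two slope estimates agree
print(b1 - half, b1 + half)   # interval for beta1
```

With $n = 30$ there are $n - 2 = 28$ error degrees of freedom, which is why the fixed critical value $t_{0.025}(28) = 2.048$ is used instead of a table lookup library.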