ORF 245. Rigollet. Date: 11/21/2008

Problem 1 (20)

[Figure: the pdfs of six distributions, Normal (with mean -1), Negative-exponential, Log-normal, Cauchy, Uniform, and a mixture of two normals, each plotted against the standard normal pdf (dashed).]

The above figure displays six probability density functions together with the pdf of a standard normal distribution, in dashed line, for comparison (note that the scale of the axes can change!). They correspond to the following well-known distributions: Normal (with mean -1), Log-normal, Uniform, Negative-exponential, Cauchy, and a mixture of two normals. The figure below displays normal quantile-quantile (Q-Q) plots of six samples, each simulated from one of these distributions. For each normal Q-Q plot (numbered from 1 to 6), write which distribution you think the sample was simulated from and briefly explain your choice (simply write indications such as negatively or positively skewed, light or heavy tails, ...).
[Figure: six normal Q-Q plots, numbered 1 to 6.]

1. Negative exponential (skewed to the left)
2. Mixture of normals (two sets of aligned points)
3. Normal with mean -1 (aligned points)
4. Log-normal (skewed to the right)
5. Cauchy (symmetric, heavy tails)
6. Uniform (symmetric, light tails)
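The skewness and tail cues used above can be checked numerically. Below is a minimal sketch (not part of the original exam) that simulates three of the six distributions and computes sample skewness and excess kurtosis with scipy; the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000

# Simulate from three of the distributions in the problem.
neg_exp = -rng.exponential(scale=1.0, size=n)          # negative exponential: left-skewed
log_norm = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # log-normal: right-skewed
uniform = rng.uniform(-1.0, 1.0, size=n)               # uniform: light tails

# Sample skewness diagnoses the direction of asymmetry seen in a Q-Q plot.
print(stats.skew(neg_exp))   # negative
print(stats.skew(log_norm))  # positive

# Excess kurtosis below 0 signals tails lighter than the normal's,
# matching the S-shape of the uniform Q-Q plot (plot 6).
print(stats.kurtosis(uniform))
```

A left-skewed sample bends below the Q-Q line on the left; a heavy-tailed one departs from it on both ends.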
Problem 2 (25)

A certain pen has been designed so that the true average writing lifetime µ under controlled conditions (involving the use of a writing machine) is at least 10 hours. A random sample of 14 pens is selected, and the writing lifetime (in hours) of each is denoted X_1, ..., X_14. The observations gave x̄ = 9.62 and s² = 0.16. The normal Q-Q plot of the X_i is given below.

[Figure: normal Q-Q plot of the 14 observed lifetimes.]

1. From the normal Q-Q plot, can you conclude that the observations are normally distributed? Why?

Yes, the points are almost aligned.

2. State the appropriate hypothesis testing problem for the true average writing lifetime µ.

H_0: µ = 10 (or H_0: µ ≥ 10) vs. H_1: µ < 10.

The a priori belief is that µ is (at least) 10, hence H_0. The interesting alternative is µ < 10 because, if we reject H_0, we want to be able to conclude that the pens have too short a lifetime. It is not interesting to conclude that the pens have too long a lifetime.

3. Find a testing procedure at level α.

Normal observations and unknown variance: we use a Student t test. The test statistic is

    T = (X̄ − 10) / (s/√n)
and has a t distribution with n − 1 degrees of freedom under the null hypothesis. Here n = 14 and, in view of the alternative, we reject if T < −t_{13,α}.

4. Perform the above test at level 5%.

From the table, t_{13,5%} = 1.771 and the observed value is

    T = (9.62 − 10) / (0.4/√14) = −3.555 < −1.771,

therefore we reject H_0 at level 5%.

5. Find a two-sided confidence interval for µ with confidence level 95%.

The interval is of the form

    [X̄ − t_{n−1,α/2} s/√n , X̄ + t_{n−1,α/2} s/√n]

with α = 5% and n = 14. Numerical application yields [9.389, 9.851].

6. What is the smallest number of pens to be selected for this confidence interval to be of width 6 minutes?

Since 6 min = 0.1 hr, we have to solve

    2 t_{n−1,α/2} s/√n = 0.1

with respect to n. Let us check what happens if we take n = 61, the largest value for which we can read t_{n−1,α/2} in the table before it becomes equal to t_{∞,α/2} = z_{α/2}. This value yields

    2 × 2.000 × 0.4/√61 = 0.205,

which is larger than 0.1, so we should take n even larger than 61, and for such values t_{n−1,α/2} ≈ z_{α/2} = 1.960. Therefore, we have to solve

    2 × 1.960 × 0.4/√n = 0.1,

which yields n = 246 (which is indeed > 61).
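The computations of questions 4 to 6 can be reproduced from the summary statistics alone. Below is a minimal sketch using scipy (the variable names are ours, not from the exam):

```python
import math
from scipy import stats

n, xbar, s2, mu0 = 14, 9.62, 0.16, 10.0
s = math.sqrt(s2)

# Test statistic T = (xbar - mu0) / (s / sqrt(n)); reject H0 if T < -t_{13,0.05}.
T = (xbar - mu0) / (s / math.sqrt(n))
t_crit = stats.t.ppf(0.95, df=n - 1)   # t_{13,5%} = 1.771
reject = T < -t_crit                   # True: reject H0 at level 5%
print(round(T, 3), round(t_crit, 3))   # -3.555 1.771

# Two-sided 95% confidence interval for mu.
t_half = stats.t.ppf(0.975, df=n - 1)  # t_{13,2.5%} = 2.160
lo = xbar - t_half * s / math.sqrt(n)
hi = xbar + t_half * s / math.sqrt(n)
print(round(lo, 3), round(hi, 3))      # 9.389 9.851

# Smallest n giving a CI of width 0.1 hr, using the normal quantile
# since the required n exceeds the range of the t table.
z = stats.norm.ppf(0.975)              # 1.960
n_min = math.ceil((2 * z * s / 0.1) ** 2)
print(n_min)                           # 246
```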
Problem 3 (25)

Let X_1, ..., X_n be i.i.d. random variables with uniform distribution on [0, θ]. In particular, each of them has pdf

    f(x; θ) = 1/θ if 0 ≤ x ≤ θ, and 0 otherwise.

1. What is the joint pdf f(x_1, ..., x_n; θ) of the sample (X_1, ..., X_n)?

Since the X_i are i.i.d., their joint pdf is given by

    f(x_1, ..., x_n; θ) = f(x_1; θ) ··· f(x_n; θ) = 1/θⁿ if 0 ≤ min_i x_i ≤ max_i x_i ≤ θ, and 0 otherwise.

2. Show that the maximum likelihood estimator of θ is given by θ̂_ML = max_i X_i.

The likelihood is f(X_1, ..., X_n; θ). As a function of θ, the joint pdf of the previous question is equal to 0 if θ < max_i X_i and equal to 1/θⁿ if θ ≥ max_i X_i. Since 1/θⁿ is decreasing in θ, the maximum is attained at the smallest value of θ for which the likelihood is nonzero, and this value is θ̂_ML = max_i X_i.

3. Find first the cdf and then the pdf of θ̂_ML.

The cdf is the function

    F(t) = P(max_i X_i ≤ t) = [P(X_1 ≤ t)]ⁿ = 0 if t ≤ 0, (t/θ)ⁿ if 0 ≤ t ≤ θ, and 1 if t ≥ θ.

The pdf is the derivative of the cdf and is given by

    f(t) = F′(t) = n t^{n−1}/θⁿ if 0 ≤ t ≤ θ, and 0 otherwise.

4. Using the previous question, compute E(θ̂_ML).

    E(θ̂_ML) = ∫₀^θ t f(t) dt = ∫₀^θ n tⁿ/θⁿ dt = (n/(n+1)) θ.
5. Is θ̂_ML an unbiased estimator of θ? Why? If not, give a simple modification of it that is unbiased.

θ̂_ML is not an unbiased estimator of θ because E(θ̂_ML) = (n/(n+1)) θ ≠ θ. However, the estimator ((n+1)/n) θ̂_ML is unbiased. Indeed,

    E[((n+1)/n) θ̂_ML] = ((n+1)/n) E(θ̂_ML) = θ.
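A quick Monte Carlo check of questions 4 and 5 (this sketch is ours; θ = 2 and n = 10 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 10, 200_000

# Draw `reps` samples of size n from Uniform[0, theta] and take the max of each.
samples = rng.uniform(0.0, theta, size=(reps, n))
mle = samples.max(axis=1)

# E(max X_i) should be close to n/(n+1) * theta = 1.818...,
# while the corrected estimator (n+1)/n * max X_i should average close to theta.
print(mle.mean())
print(((n + 1) / n * mle).mean())
```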
Problem 4 (30)

Paul has decided to travel across Europe for a year. He is very interested in food and will be trying restaurants in each country he visits. However, he is afraid of putting on some weight and has decided to monitor his weight regularly, every month, during the 12 months he will spend there. His weight (in lbs) at month i, i = 1, ..., 12, is modelled by a random variable Y_i of the form

    Y_i = µ_i + ε_i,

where µ_i is Paul's true weight in month i and ε_i, i = 1, ..., 12, are i.i.d. standard normal random variables which account for the errors of measurement from one month to another (indeed, he will use different scales, measure his weight at different times of the day, etc.). We assume that his true weight µ_i will change over the months as follows:

    µ_i = 150 + β i,

where 150 lbs corresponds to Paul's true weight before he leaves for Europe and β is an unknown parameter. We are mainly interested in the parameter β. You will need to use the following identities:

    ∑_{i=1}^n i = n(n+1)/2  and  ∑_{i=1}^n i² = n(n+1)(2n+1)/6.

We will also need the pdf of a random variable X ~ N(µ, σ²):

    f(x; µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

Estimation problem: The parameter β controls the rate at which Paul will gain weight over time.

1. Denote by Ȳ = (1/12) ∑_{i=1}^{12} Y_i the average observed weight over the whole year Paul will spend in Europe. Find E(Ȳ), V(Ȳ) and the distribution of Ȳ.

    E(Ȳ) = (1/12) ∑_{i=1}^{12} E(Y_i) = 150 + (β/12) ∑_{i=1}^{12} i = 150 + β (12 × 13)/(2 × 12) = 150 + 6.5 β.

For the variance, remark that each Y_i has variance 1 and that the Y_i are independent. Therefore

    V(Ȳ) = (1/12²) ∑_{i=1}^{12} V(Y_i) = 12/144 = 1/12.

The Y_i being independent random variables with normal distribution, Ȳ also has normal distribution with the above parameters:

    Ȳ ~ N(150 + 6.5 β, 1/12).
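The mean and variance of Ȳ derived above can be checked by simulation. This sketch is ours, and β = 0.5 is a hypothetical slope chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, reps = 0.5, 100_000  # hypothetical true slope, arbitrary replication count

i = np.arange(1, 13)
mu = 150 + beta * i                       # true monthly weights mu_i = 150 + beta*i
Y = mu + rng.standard_normal((reps, 12))  # Y_i = mu_i + eps_i, eps_i ~ N(0, 1)
Ybar = Y.mean(axis=1)

# Should be close to 150 + 6.5*beta = 153.25 and to 1/12 = 0.0833 respectively.
print(Ybar.mean())
print(Ybar.var())
```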
2. Using the previous question, find an estimator β̂_MOM for β using the method of moments.

Since E(Ȳ) = 150 + 6.5 β, equating Ȳ with its expectation gives

    β̂_MOM = (Ȳ − 150)/6.5.

3. Is β̂_MOM an unbiased estimator of β? Why?

    E(β̂_MOM) = (E(Ȳ) − 150)/6.5 = β,

therefore β̂_MOM is an unbiased estimator of β.

4. What is the joint pdf f(y_1, ..., y_12; β) of the random variables Y_1, ..., Y_12?

The Y_i being independent, we find the joint pdf by taking the product of the marginal pdfs. The latter are given by

    f(y_i) = f(y_i; µ_i, 1) = (1/√(2π)) exp(−(y_i − µ_i)²/2).

Taking the product gives the joint pdf:

    f(y_1, ..., y_12; β) = (2π)^{−6} exp(−(1/2) ∑_{i=1}^{12} (y_i − µ_i)²) = (2π)^{−6} exp(−(1/2) ∑_{i=1}^{12} (y_i − 150 − β i)²).

5. Using the previous question, find explicitly the maximum likelihood estimator of β.

Consider the log-likelihood, given by

    ln f(Y_1, ..., Y_12; β) = −6 ln(2π) − (1/2) ∑_{i=1}^{12} (Y_i − 150 − β i)².

Therefore, the maximum likelihood estimator is the β that minimizes the term

    ∑_{i=1}^{12} (Y_i − 150 − β i)².

Setting the derivative with respect to β equal to zero at β̂_ML yields

    −2 ∑_{i=1}^{12} (Y_i − 150 − β̂_ML i) i = 0,
which gives

    β̂_ML = ∑_{i=1}^{12} i (Y_i − 150) / ∑_{i=1}^{12} i² = (1/650) ∑_{i=1}^{12} i (Y_i − 150),

where we used the fact that ∑_{i=1}^{12} i² = (12 × 13 × 25)/6 = 650.

6. Is β̂_ML an unbiased estimator of β? Why?

    E(β̂_ML) = (1/650) ∑_{i=1}^{12} i (E(Y_i) − 150) = (β/650) ∑_{i=1}^{12} i² = β.

Therefore, β̂_ML is an unbiased estimator of β.

Testing problem: Paul is confident that, by monitoring his weight regularly, he will stay at the same weight (150 lbs) on average. He wants to test this hypothesis.

7. Given that Paul will either stay at the same weight or gain some weight (losing weight in Europe is not an option!), state the appropriate hypothesis testing problem for β.

    H_0: β = 0 vs. H_1: β > 0.

8. Based on 12 observations x_1, ..., x_12, we used R to perform this test and the software outputs a p-value equal to 0.035 together with a lower confidence bound at level 95% for β. What is the sign (> 0 or < 0) of this lower confidence bound? Why? [Hint: a plot can be helpful]

The p-value is smaller than 5%, which means that the null hypothesis is rejected at level 5%. On the other hand, we can use the lower confidence bound (LCB) to construct a test at level 5% by rejecting when the LCB is positive. Since we reject, the LCB must be positive (> 0).
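The closed-form estimator β̂_ML and its unbiasedness can also be checked by simulation. This sketch is ours; β = 0.5 is again a hypothetical value chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
beta, reps = 0.5, 100_000  # hypothetical true slope, arbitrary replication count

i = np.arange(1, 13)
Y = 150 + beta * i + rng.standard_normal((reps, 12))  # Y_i = 150 + beta*i + eps_i

# Closed-form MLE: beta_hat = sum_i i*(Y_i - 150) / sum_i i^2, with sum i^2 = 650.
assert (i ** 2).sum() == 650
beta_ml = (i * (Y - 150)).sum(axis=1) / 650

# The average over many replications should be close to beta, confirming unbiasedness.
print(beta_ml.mean())
```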
Student's t critical values (entries are the quantiles t_{ν,p}; the row labelled ∞ gives the normal quantiles z_p)

 ν    0.60   0.667  0.75   0.80   0.875  0.90   0.95   0.975   0.99    0.995   0.999
 1    0.325  0.577  1.000  1.376  2.414  3.078  6.314  12.706  31.821  63.657  318.31
 2    0.289  0.500  0.816  1.061  1.604  1.886  2.920   4.303   6.965   9.925  22.327
 3    0.277  0.476  0.765  0.978  1.423  1.638  2.353   3.182   4.541   5.841  10.215
 4    0.271  0.464  0.741  0.941  1.344  1.533  2.132   2.776   3.747   4.604   7.173
 5    0.267  0.457  0.727  0.920  1.301  1.476  2.015   2.571   3.365   4.032   5.893
 6    0.265  0.453  0.718  0.906  1.273  1.440  1.943   2.447   3.143   3.707   5.208
 7    0.263  0.449  0.711  0.896  1.254  1.415  1.895   2.365   2.998   3.499   4.785
 8    0.262  0.447  0.706  0.889  1.240  1.397  1.860   2.306   2.896   3.355   4.501
 9    0.261  0.445  0.703  0.883  1.230  1.383  1.833   2.262   2.821   3.250   4.297
10    0.260  0.444  0.700  0.879  1.221  1.372  1.812   2.228   2.764   3.169   4.144
11    0.260  0.443  0.697  0.876  1.214  1.363  1.796   2.201   2.718   3.106   4.025
12    0.259  0.442  0.695  0.873  1.209  1.356  1.782   2.179   2.681   3.055   3.930
13    0.259  0.441  0.694  0.870  1.204  1.350  1.771   2.160   2.650   3.012   3.852
14    0.258  0.440  0.692  0.868  1.200  1.345  1.761   2.145   2.624   2.977   3.787
15    0.258  0.439  0.691  0.866  1.197  1.341  1.753   2.131   2.602   2.947   3.733
16    0.258  0.439  0.690  0.865  1.194  1.337  1.746   2.120   2.583   2.921   3.686
17    0.257  0.438  0.689  0.863  1.191  1.333  1.740   2.110   2.567   2.898   3.646
18    0.257  0.438  0.688  0.862  1.189  1.330  1.734   2.101   2.552   2.878   3.610
19    0.257  0.438  0.688  0.861  1.187  1.328  1.729   2.093   2.539   2.861   3.579
20    0.257  0.437  0.687  0.860  1.185  1.325  1.725   2.086   2.528   2.845   3.552
21    0.257  0.437  0.686  0.859  1.183  1.323  1.721   2.080   2.518   2.831   3.527
22    0.256  0.437  0.686  0.858  1.182  1.321  1.717   2.074   2.508   2.819   3.505
23    0.256  0.436  0.685  0.858  1.180  1.319  1.714   2.069   2.500   2.807   3.485
24    0.256  0.436  0.685  0.857  1.179  1.318  1.711   2.064   2.492   2.797   3.467
25    0.256  0.436  0.684  0.856  1.178  1.316  1.708   2.060   2.485   2.787   3.450
26    0.256  0.436  0.684  0.856  1.177  1.315  1.706   2.056   2.479   2.779   3.435
27    0.256  0.435  0.684  0.855  1.176  1.314  1.703   2.052   2.473   2.771   3.421
28    0.256  0.435  0.683  0.855  1.175  1.313  1.701   2.048   2.467   2.763   3.408
29    0.256  0.435  0.683  0.854  1.174  1.311  1.699   2.045   2.462   2.756   3.396
30    0.256  0.435  0.683  0.854  1.173  1.310  1.697   2.042   2.457   2.750   3.385
35    0.255  0.434  0.682  0.852  1.170  1.306  1.690   2.030   2.438   2.724   3.340
40    0.255  0.434  0.681  0.851  1.167  1.303  1.684   2.021   2.423   2.704   3.307
45    0.255  0.434  0.680  0.850  1.165  1.301  1.679   2.014   2.412   2.690   3.281
50    0.255  0.433  0.679  0.849  1.164  1.299  1.676   2.009   2.403   2.678   3.261
55    0.255  0.433  0.679  0.848  1.163  1.297  1.673   2.004   2.396   2.668   3.245
60    0.254  0.433  0.679  0.848  1.162  1.296  1.671   2.000   2.390   2.660   3.232
 ∞    0.253  0.431  0.674  0.842  1.150  1.282  1.645   1.960   2.326   2.576   3.090