Fall 2017 STAT 532 Homework, Peter Hoff.

1. Let $P$ be a probability measure on a collection of sets $\mathcal{A}$.
    (a) For each $n \in \mathbb{N}$, let $H_n$ be a set in $\mathcal{A}$ such that $H_n \subseteq H_{n+1}$. Show that $P(H_n)$ converges monotonically to $P(\cup_{k=1}^\infty H_k)$ as $n \to \infty$.
    (b) For each $n \in \mathbb{N}$, let $G_n$ be a set in $\mathcal{A}$ such that $G_{n+1} \subseteq G_n$. Show that $P(G_n)$ converges monotonically to $P(\cap_{k=1}^\infty G_k)$ as $n \to \infty$.
    If you can do this problem, you will be OK in a measure theory class.

2. Show formally that for any CDF defined by $F(y) = \Pr(Y \leq y)$, we have $\lim_{y \to -\infty} F(y) = 0$, $\lim_{y \to \infty} F(y) = 1$, and that $F$ is right continuous.

3. Show how you can use the CDF $F$ of a random variable $Y$ to compute
    (a) $\Pr(Y \in (a, b])$;
    (b) $\Pr(Y \in (a, b))$;
    (c) $\Pr(Y \in [a, b])$.

4. Let $Z \sim N(0, 1)$. Derive the density of $X$ where
    (a) $X = e^Z$;
    (b) $X = Z^2$.

5. Let $Y$ be a random variable with a continuous, strictly increasing CDF $F$, so in particular $F^{-1}$ exists and $F^{-1}(F(y)) = y$.
    (a) Find the CDF of $U$, where $U = F(Y)$;
    (b) Find the CDF of $X$, where $X = F^{-1}(U)$;
    (c) Explain how these results can be used to simulate a normal distribution in R using only the runif command and the qnorm command. Write out your computer code.
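For Problem 5(c), here is a minimal R sketch of the inverse-CDF idea (targeting the standard normal is an assumption; qnorm could be given any mean and sd):

    set.seed(1)
    n <- 10000
    u <- runif(n)      # U ~ uniform(0,1)
    z <- qnorm(u)      # F^{-1}(U), so z behaves like an i.i.d. N(0,1) sample
    c(mean(z), sd(z))  # rough check: should be near 0 and 1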

6. Let $F$ be a discrete CDF with jumps at $y \in \{0, 1, 2, 3, 4\}$, with $(F(0), F(1), F(2), F(3), F(4)) = (.1, .15, .3, .6, 1.0)$. Describe how to simulate from this distribution using a random variable $U \sim$ uniform$(0, 1)$.
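One possible R sketch of the recipe asked for in Problem 6 (the function name rdisc and the Monte Carlo check at the end are only illustrative):

    Fvals <- c(.10, .15, .30, .60, 1.00)   # F(0),...,F(4)
    yvals <- 0:4
    rdisc <- function(n) {
      u <- runif(n)
      # return the smallest y with F(y) >= u
      sapply(u, function(ui) yvals[min(which(Fvals >= ui))])
    }
    table(rdisc(1e5)) / 1e5   # should be close to (.10, .05, .15, .30, .40)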

7. Let $Y \mid \theta \sim$ binomial$(n, \theta)$ and let $\theta \sim$ beta$(a, b)$. Derive the marginal density of $Y$ and the conditional density of $\theta$ given $Y$.

8. Let $Y \mid \theta \sim N(\theta, \sigma^2)$ and let $\theta \sim N(\mu, \tau^2)$. Derive the marginal density of $Y$ and the conditional density of $\theta$ given $Y$.

9. Normal distribution properties via change of variables:
    (a) Let $X \sim N(\theta, \tau^2)$. Using the univariate change of variables formula, obtain the distribution of $Y = \mu + \sigma X$.
    (b) Let $W \sim N(\theta_1, \tau_1^2)$ and $X \sim N(\theta_2, \tau_2^2)$ be independent. Find the distribution of $W + X$ using the multivariate change of variables method.
    (c) Let $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$. Use the above two results to obtain the distribution of $\bar Y$.
    (d) Let $Y_1, \ldots, Y_n$ be independent with $Y_i \sim N(\mu_i, \sigma_i^2)$. Use the first two results to obtain the distribution of $\bar Y$.

10. Let $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$. Using the multivariate change of variables formula, show that $\bar Y = \sum Y_i/n$ is independent of $S^2 = \sum (Y_i - \bar Y)^2/(n-1)$.

11. Show the following:
    (a) $E[a + bY] = a + bE[Y]$.
    (b) $V[a + bY] = b^2 V[Y]$.

12. Let $Y$ be a positive random variable. Use Jensen's inequality to relate
    (a) $E[Y^p]^{1/p}$ to $E[Y^q]^{1/q}$ for $p > q \geq 1$;
    (b) $E[1/Y]$ to $1/E[Y]$;
    (c) $\log E[Y]$ to $E[\log Y]$.

13. Let $\{Y_t : t \in \mathbb{N}\}$ be i.i.d. random variables on $\mathcal{Y} = \{-1, 0, +1\}$, with $\Pr(Y_t = -1) = p_-$ and $\Pr(Y_t = +1) = p_+$. These random variables represent jumps of a particle along a one-dimensional grid. Let $S_T = \sum_{t=1}^T Y_t$ be the position of the particle at time $T$. Compute the mean and variance of $S_T$ as a function of $p_-$ and $p_+$. Describe qualitatively the behavior of the particle as $T$ increases, as a function of $p_-$ and $p_+$.

14. Let $(w_1, w_2, w_3) \sim$ Dirichlet$(\alpha_1, \alpha_2, \alpha_3)$.
    (a) Compute the expected value and variance of $w_j$ for $j \in \{1, \ldots, 3\}$.
    (b) Compute the variance of $w_1 + w_2 + w_3$.
    (c) Compute the covariance of $w_1$ and $w_2$, and explain intuitively the sign of the result.
    (d) Obtain the distribution of $\theta = w_1 + w_2$.

15. Let $X$ and $Y$ be random variables. Show that $E[f(X)g(X, Y) \mid X] = f(X)E[g(X, Y) \mid X]$.

16. Highly skewed data $(y_1, \ldots, y_n)$ are often analyzed on a log scale, i.e. we analyze $(x_1, \ldots, x_n) = (\ln y_1, \ldots, \ln y_n)$.
    (a) Show that $e^{\bar x} \leq \bar y$.
    (*) Bonus question: Compare $f^{-1}(\sum f(y_i)/n)$ for $f(x) = 1/x$, $f(x) = \ln x$ and $f(x) = x$.

17. Variance of correlated sums:
    (a) Derive the variance of $aX + bY$ for possibly correlated random variables $X$ and $Y$.
    (b) Let $Y = (Y_1, \ldots, Y_n)^T$ be a vector of real-valued random variables. Compute the variance of $\sum Y_i/n$ when $\text{Var}[Y_i] = \sigma^2$ for all $i$ and
        i. $\text{Cor}[Y_i, Y_j] = 0$ for all $i \neq j$;
        ii. $\text{Cor}[Y_i, Y_j] = \rho$ for all $i \neq j$;
        iii. $\text{Cor}[Y_i, Y_j] = \rho$ if $|i - j| = 1$ and is zero if $|i - j| > 1$.
    Describe in words how correlation affects the variance of the sample mean.

18. Suppose $E[Y] = \mu$ and $\text{Var}[Y] = \sigma^2$. Consider the estimator $\hat\mu = (1 - w)\mu_0 + wY$.
    (a) Find the expectation and variance of $\hat\mu$.
    (b) Find the bias and MSE of $\hat\mu$ (as functions of $\mu$).
    (c) For what values of $\mu$ does $\hat\mu$ have lower MSE than $Y$?

19. Let $C_k(Y_1, \ldots, Y_n) = (\bar Y - a_k \sigma/\sqrt{n},\ \bar Y + a_k \sigma/\sqrt{n})$ for $k \in \{1, 2\}$, where $a_1 = z_{.975}$ and $a_2 = 1/\sqrt{.05}$. Via simulation, find the coverage rates of $C_1$ and $C_2$ for $n \in \{1, 10\}$ when
    (a) $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$;
    (b) $Y_1, \ldots, Y_n \sim$ i.i.d. double exponential with variance 2;
    (c) $Y_1, \ldots, Y_n \sim$ i.i.d. beta$(.1, .5)$.
    Include your code as an appendix to your homework. Discuss your results, and your thoughts on the robustness of the z-interval when the data are not normal (importantly, in this exercise we are using the true variance of the population instead of an estimate).
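A possible skeleton for the simulation in Problem 19, shown only for the normal case with $\mu = 0$, $\sigma = 1$ (the double exponential and beta cases would replace the rnorm call):

    set.seed(1)
    S <- 10000; n <- 10; mu <- 0; sigma <- 1
    a <- c(qnorm(.975), 1/sqrt(.05))   # a_1 and a_2 from the problem statement
    cover <- matrix(NA, S, 2)
    for (s in 1:S) {
      ybar <- mean(rnorm(n, mu, sigma))
      cover[s, ] <- abs(ybar - mu) < a * sigma / sqrt(n)
    }
    colMeans(cover)                    # estimated coverage of C_1 and C_2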

20. Interval for a proportion: Let $Y \sim$ binomial$(n, \theta)$.
    (a) Find the approximate (large $n$) distribution of $\hat\theta = Y/n$. Find a function of $\hat\theta$ that is approximately standard normal.
    (b) Based on the normal approximation, obtain the form of an approximate $1 - \alpha$ CI for $\theta$. Roughly how wide do you expect this to be for a given value of $\theta$?
    (c) Obtain a CI for $\theta$ using Hoeffding's inequality. Compare the width of this CI to the approximate normal CI.

21. Convergence of correlated sums: Let $\{Y_i : i \in \mathbb{N}\}$ be a collection of real-valued random variables with $E[Y_i] = \mu$ and $\text{Var}[Y_i] = \sigma^2$. Try to obtain a WLLN for $\bar Y = \sum Y_i/n$ in the cases
    (a) $\text{Cor}[Y_i, Y_j] = \rho$ for all $i \neq j$;
    (b) $\text{Cor}[Y_i, Y_j] = \rho$ if $|i - j| = 1$ and is zero if $|i - j| > 1$.
    Discuss how correlation affects the asymptotic concentration of $\bar Y$ around $\mu$.

22. Weighted estimates: Sometimes our measurements of a quantity of interest have differing levels of precision. Let $\{Y_i : i \in \mathbb{N}\}$ be a collection of independent real-valued random variables with $E[Y_i] = \mu$ and $\text{Var}[Y_i] = \sigma_i^2$.
    (a) Find the mean and variance of $\bar Y_w = \sum_{i=1}^n w_i Y_i$, where the $w_i$'s are constants that sum to one.
    (b) Find the values of the $w_i$'s that minimize the variance of $\bar Y_w$.
    (c) Obtain a WLLN for $\bar Y_w$.

23. Moment generating functions:
    (a) Obtain the MGFs for the Poisson, exponential, and gamma distributions.
    (b) Find the distributions of $\sum_{i=1}^n Y_i$, where $Y_1, \ldots, Y_n$ are i.i.d. Poisson, exponential, or gamma random variables.

24. Normal tail behavior: Show that if $Z \sim N(0, 1)$, then $\Pr(Z > t) \leq \phi(t)/t$ for $t > 0$. (Hint: Recall from the proof of Markov's inequality that $\int_t^\infty z\, p(z)\, dz \geq \int_t^\infty t\, p(z)\, dz$.)

25. Sketch a proof of the multivariate delta method.

26. Let $Y_1, \ldots, Y_n$ be a sample from a bivariate population $P$ where $E[Y_i] = (\mu_A, \mu_B)$, $\text{Var}[Y_{i,A}] = \text{Var}[Y_{i,B}] = 1$ and $\text{Cor}[Y_{i,A}, Y_{i,B}] = \rho$. The purpose of this problem is to derive an estimator and standard error for $\rho$.
    (a) For this problem, what moments of $P$ does $\rho$ depend on?
    (b) Find CAN estimators of the moments from (a).
    (c) Find a CAN estimator $\hat\rho$ of $\rho$ and give its limiting distribution.
    (d) Find a large-$n$ estimate of the standard deviation of $\hat\rho$, that is, its standard error.

27. Suppose you obtain a random sample from a population and the numerical values of the sample are $(y_1, \ldots, y_n)$, where each $y_i$ is a real-valued number. Let $\hat F$ be the empirical CDF based on these numbers, that is, $\hat F(y)$ equals the fraction of the $y_i$'s at or below the value $y$.
    (a) $\hat F$ is discrete: where are the jumps?
    (b) How big are the jumps if there are no ties in the sample? What if there are ties?
    (c) $\hat F$ is a valid CDF, and so it corresponds to a probability distribution, say $\hat P$, called the empirical distribution. Describe this distribution, that is, describe $\hat P((a, b])$.

28. Confidence intervals and bands for CDFs.
    (a) Write down the formula for a 95% pointwise confidence interval for $F(y)$, using a plug-in estimate of $F(y)$ (hint: this is just the usual normal interval for a binomial proportion, with $F(y)$ replacing $p$, the population proportion). Also write down the formula for the 95% interval based on Hoeffding/DKW.
    (b) Simulate at least $S = 10{,}000$ datasets consisting of samples of size $n = 10$ from the uniform distribution on $[0, 1]$. For each simulated dataset, check to see if the two confidence intervals cover the true value of $F(y)$ for $y \in \{.1, .2, \ldots, .8, .9\}$. For example, for each simulation $s = 1, \ldots, S$ you might make a vector $c^N_s$, where $c^N_s[k]$ indicates whether or not the normal interval covers the true value of $F(y)$ at the value $k/10$.
    (c) For both types of intervals, use the results of the simulation to evaluate the pointwise coverage rates at $y \in \{.1, .2, \ldots, .8, .9\}$, and the global coverage rate, i.e., for what fraction of datasets the intervals cover the true values of $F$ at all $y \in \{.1, .2, \ldots, .8, .9\}$.
    (d) Comment on the relative widths of the intervals, and summarize your findings about coverage rates.
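A rough sketch of the simulation in Problem 28(b)-(c). The DKW half-width $\sqrt{\log(2/.05)/(2n)}$ used below is one standard way to form the 95% Hoeffding/DKW band; other constants could be used:

    set.seed(1)
    S <- 10000; n <- 10
    ys <- seq(.1, .9, by = .1)               # under uniform(0,1), F(y) = y
    z <- qnorm(.975)
    eps <- sqrt(log(2/.05) / (2*n))          # DKW 95% half-width
    covN <- covD <- matrix(NA, S, length(ys))
    for (s in 1:S) {
      Fhat <- ecdf(runif(n))(ys)
      se <- sqrt(Fhat * (1 - Fhat) / n)
      covN[s, ] <- abs(Fhat - ys) <= z * se  # normal (plug-in) interval covers?
      covD[s, ] <- abs(Fhat - ys) <= eps     # DKW interval covers?
    }
    colMeans(covN); colMeans(covD)                        # pointwise coverage
    mean(apply(covN, 1, all)); mean(apply(covD, 1, all))  # global coverage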

29. Let $Y_1, \ldots, Y_n \sim$ i.i.d. from a distribution $P$ with CDF $F$. Let $\hat F$ be the empirical CDF of $Y_1, \ldots, Y_n$. For two points $x, y$ with $x < y$, calculate $E[\hat F(y)]$, $E[\hat F(x)\hat F(y)]$ and $\text{Cov}[\hat F(x), \hat F(y)]$.

30. Suppose we wish to simulate a bootstrap dataset $Y^* = (Y_1^*, \ldots, Y_n^*)$ from the empirical distribution of the observed sample values $y = (y_1, \ldots, y_n)$. Explain mathematically why this can be done with the R command Ystar <- sample(y, replace=TRUE).

31. Suppose you observe an outcome $y$ and a predictor $x$ for a random sample of $n = 10$ objects, with $y$-values (2.38, 2.72, -0.13, 2.66, 3.72, 0.48, 2.86, 4.27, 3.86, 2.04) and $x$-values (-0.63, 0.18, -0.84, 1.60, 0.33, -0.82, 0.49, 0.74, 0.58, -0.31). In other words, you sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ i.i.d. from some bivariate distribution, and these are the numerical results you get. Consider the normal linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $\epsilon_1, \ldots, \epsilon_n \sim$ i.i.d. $N(0, \sigma^2)$.
    (a) Obtain the usual normal-theory standard error, 95% CI and p-value for the OLS estimate of $\beta_1$ (use the lm command in R).
    (b) Obtain a bootstrap estimate of the standard deviation of $\hat\beta_1$ (this is the bootstrap standard error), obtain a normal-theory CI for $\beta_1$ using the bootstrap standard error, and compare to the results in (a).
    (c) Obtain the bootstrap distribution of the p-value and display it graphically, including the observed p-value as a reference. How might you describe the evidence that $\beta_1 \neq 0$? How stable is this evidence?
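A sketch for Problem 31(a)-(b). The pairs bootstrap used below is one reasonable choice, and B = 5000 resamples is arbitrary:

    y <- c(2.38, 2.72, -0.13, 2.66, 3.72, 0.48, 2.86, 4.27, 3.86, 2.04)
    x <- c(-0.63, 0.18, -0.84, 1.60, 0.33, -0.82, 0.49, 0.74, 0.58, -0.31)
    fit <- lm(y ~ x)
    summary(fit); confint(fit)       # normal-theory SE, CI and p-value for beta_1
    set.seed(1)
    B <- 5000
    b1 <- replicate(B, {
      idx <- sample(seq_along(y), replace = TRUE)   # resample (x, y) pairs
      coef(lm(y[idx] ~ x[idx]))[2]
    })
    sd(b1)                                          # bootstrap standard error
    coef(fit)[2] + c(-1, 1) * qnorm(.975) * sd(b1)  # CI using the bootstrap SE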

32. Let $Y_1, \ldots, Y_n \sim$ i.i.d. from a continuous distribution $P$. Given $Y_1, \ldots, Y_n$, let $Y_1^*, \ldots, Y_n^* \sim$ i.i.d. from $\hat P$, the empirical distribution of $Y_1, \ldots, Y_n$. Let $\bar Y^* = \sum Y_i^*/n$.
    (a) Compute $E[\bar Y^* \mid \hat P]$ and $\text{Var}[\bar Y^* \mid \hat P]$.
    (b) Now compute $E[\bar Y^*]$ and $\text{Var}[\bar Y^*]$, the unconditional expectation and variance of $\bar Y^*$, marginal over i.i.d. samples $Y_1, \ldots, Y_n$ from $P$.

33. Suppose $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$.
    (a) What is the distribution of $\sqrt{n}(\bar Y - \mu)/\sigma$, and why?
    (b) What is the distribution of $(n-1)s^2/\sigma^2$, and why?
    (c) What is the distribution of $\sqrt{n}(\bar Y - \mu)/s$, and why?
    (d) For $w \in [0, 1]$, find the coverage rate of the set $C_w(Y) = (\bar Y + \frac{s}{\sqrt{n}} t_{\alpha(1-w)},\ \bar Y + \frac{s}{\sqrt{n}} t_{1-\alpha w})$.

34. Let $Y \sim N(\mu, 1)$ and consider testing the hypothesis $H : \mu = 0$. Consider an acceptance region of the form $A_0 = \{y : z_{\alpha(1-w)} < y < z_{1-\alpha w}\}$.
    (a) Show that the type I error rate of such a test is $\alpha$ for $w \in [0, 1]$.
    (b) Obtain the power of the test, that is, $\Pr(Y \notin A_0 \mid \mu)$ as a function of $\mu$. Make a plot of this power function for $w = 1/2$ and $w = 1/4$. When would you use $w = 1/2$? When would you use $w = 1/4$?
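A short R sketch for the power plot in Problem 34(b). The problem does not fix $\alpha$, so $\alpha = .05$ below is an assumption:

    alpha <- .05
    power <- function(mu, w) {
      lo <- qnorm(alpha * (1 - w))   # z_{alpha(1-w)}
      hi <- qnorm(1 - alpha * w)     # z_{1-alpha*w}
      pnorm(lo, mean = mu) + 1 - pnorm(hi, mean = mu)   # Pr(Y outside A_0 | mu)
    }
    mu <- seq(-4, 4, length.out = 200)
    plot(mu, power(mu, 1/2), type = "l", xlab = "mu", ylab = "power")
    lines(mu, power(mu, 1/4), lty = 2)
    legend("bottomright", c("w = 1/2", "w = 1/4"), lty = 1:2)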

35. Suppose treatment A is assigned to a random selection of 5 experimental units, and the 5 remaining experimental units are assigned treatment B. The observed treatment assignments and measured responses are $(X_1, \ldots, X_{10}) =$ (B, A, A, B, A, B, A, B, B, A) and $(Y_1, \ldots, Y_{10}) =$ (7.5, 1.2, 5.5, 2.2, 9.1, 8.7, 3.2, 5.1, 6.2, 1.7).
    (a) Assuming the A and B outcomes are random samples from $N(\mu_A, \sigma^2)$ and $N(\mu_B, \sigma^2)$ populations, compute the appropriate t-statistic for testing $H : \mu_A = \mu_B$, state the distribution of the statistic under $H$, and compute the p-value.
    (b) Using the same test statistic, do a permutation test of $H$: no treatment effect. Specifically, obtain the permutation null distribution, and compute the corresponding p-value.
    (c) Graphically compare the two null distributions, and compare the p-values. Describe the differences in assumptions that the two testing procedures make.
    (d) Obtain the permutation null distributions and p-values for the test statistics $\bar Y_A - \bar Y_B$ and $\bar Y_A/\bar Y_B$.
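One way to set up the permutation test in Problem 35(b) in R; using 10,000 random permutations rather than enumerating all label assignments is a simplifying choice:

    x <- c("B","A","A","B","A","B","A","B","B","A")
    y <- c(7.5, 1.2, 5.5, 2.2, 9.1, 8.7, 3.2, 5.1, 6.2, 1.7)
    tstat <- function(lab) t.test(y[lab == "A"], y[lab == "B"], var.equal = TRUE)$statistic
    tobs <- tstat(x)
    set.seed(1)
    tperm <- replicate(10000, tstat(sample(x)))   # permute the treatment labels
    mean(abs(tperm) >= abs(tobs))                 # two-sided permutation p-value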

36. Let $Y$ be a random variable and $t(Y)$ be a test statistic. The p-value is $p(y) = \Pr(t(Y) > t(y))$, where the distribution of $Y$ is the null distribution $P_0$, with CDF $F_0$. Show that the distribution of $p(Y)$ under $Y \sim P_0$ is uniform on $[0, 1]$. (Hint: Find the CDF of the p-value in terms of $F_0$.)

37. Let $Y_1, \ldots, Y_n \sim$ i.i.d. $P_{\theta_0} \in \mathcal{P}$. Find the log-likelihood function and the form of the MLE in the cases where $\mathcal{P}$ is the set of
    (a) Poisson distributions with mean $\theta \in \mathbb{R}^+$;
    (b) multinomial distributions with probabilities $(\theta_1, \ldots, \theta_p)$;
    (c) uniform distributions on $(\theta_1 - \theta_2/2,\ \theta_1 + \theta_2/2)$, with $\theta_1 \in \mathbb{R}$ and $\theta_2 \in \mathbb{R}^+$.

38. Let $Y_i = e^{X_i}$ where $X_1, \ldots, X_n \sim$ i.i.d. $N(\mu, \sigma^2)$. Let $\phi = E[Y_i]$.
    (a) Find the expectation and variance of $Y_i$, and the expectation and variance of $\bar Y$, in terms of $\mu$ and $\sigma^2$.
    (b) Find the MLE $\hat\phi$ of $\phi$ based on $Y_1, \ldots, Y_n$, and find an approximation to the variance of $\hat\phi$. Discuss the magnitude of $\text{Var}[\hat\phi]$ relative to $\text{Var}[\bar Y]$.
    (c) Perform a simulation study where you compare $\hat\phi$ and $\bar Y$ in terms of bias, variance and MSE.
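A possible skeleton for the simulation study in Problem 38(c); the true values $\mu = 0$, $\sigma = 1$ and the sample size $n = 30$ are arbitrary choices:

    set.seed(1)
    S <- 5000; n <- 30; mu <- 0; sigma <- 1
    phi <- exp(mu + sigma^2/2)                    # true value of E[Y_i]
    est <- t(replicate(S, {
      xsamp <- rnorm(n, mu, sigma)
      s2 <- mean((xsamp - mean(xsamp))^2)         # MLE of sigma^2
      c(phi_mle = exp(mean(xsamp) + s2/2),        # MLE of phi
        ybar = mean(exp(xsamp)))                  # sample mean of the Y_i
    }))
    colMeans(est) - phi          # bias
    apply(est, 2, var)           # variance
    colMeans((est - phi)^2)      # MSE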

39. Let $f$ and $g$ be discrete pdfs on $\{0, 1, 2, \ldots\}$ and define $D(f, g) = \sum_y \log(f(y)/g(y))\, f(y)$.
    (a) Show that $D(f, g) > 0$ if $f \neq g$ and $D(f, f) = 0$.
    (b) Let $g_\theta$ be the Poisson pdf with mean $\theta$. Find the value of $\theta$ that minimizes $D(f, g_\theta)$, in terms of moments of $f$.

40. Consider a one-parameter exponential family model, $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, with densities $f(y \mid \theta) = c(y) \exp(\theta t(y) - A(\theta))$ for $\theta \in \Theta \subseteq \mathbb{R}$. Here, $t(y)$ is a scalar-valued function of the data point $y$.
    (a) For a sample of size $n$, write out the log-likelihood function and simplify as much as possible.
    (b) Find the likelihood equation, with the data on one side of the equation and the terms involving the parameter on the other.
    (c) Now take the derivative of $p(y \mid \theta)$ with respect to $\theta$ and integrate to obtain a formula for the expectation of $t(Y)$. Compare this equation to the one obtained in (b), and comment.

41. Let $(X_i, Y_i)$ be i.i.d. with $Y_i \mid X_i \sim$ binary$\big(e^{\beta_0 + \beta_1 X_i}/(1 + e^{\beta_0 + \beta_1 X_i})\big)$ and $X_i \sim P_X$. Our goal is to infer $\theta = (\beta_0, \beta_1)$.
    (a) Find a formula for the log-likelihood and the score function, and obtain equations that determine the MLE (i.e., the "likelihood equations").
    (b) Write down the observed information for $\theta$, and compute the Fisher information.
    (c) Find the asymptotic distribution of $\hat\theta_{MLE}$ as a function of the Fisher information. Is this usable for inference if $P_X$ is unknown?
    (d) Find another asymptotic approximation to the distribution of $\hat\theta_{MLE}$ that can be used if $P_X$ is unknown. Describe how this approximation can be used to provide a hypothesis test of $H : \beta_1 = 0$.

42. Let $Y_1, \ldots, Y_n \sim$ i.i.d. gamma$(a, b)$, parameterized so that $E[Y_i] = a/b$.
    (a) Write down the log-likelihood and obtain the likelihood equations.
    (b) Compute the Fisher information and use this to obtain a joint asymptotic distribution for $(\hat a_{MLE}, \hat b_{MLE})$.
    (c) Let $\mu = a/b$. Obtain the asymptotic distribution of $\hat\mu_{MLE} = \hat a_{MLE}/\hat b_{MLE}$.

43. Suppose $Y_1, \ldots, Y_n \sim$ i.i.d. gamma$(a, b)$ as in the previous problem, but the statistician thinks that $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$ for some unknown values of $\mu$ and $\sigma^2$.
    (a) What values of $(\mu, \sigma^2)$ will maximize the expected log-likelihood, $E[\log p(Y \mid \mu, \sigma^2)]$? Here, the expectation is with respect to the true gamma distribution for $Y$, and your answer should depend on $(a, b)$.
    (b) Make an argument that $(\hat\mu_{MLE}, \hat\sigma^2_{MLE})$ converges in probability to something, say what that something is, and explain your reasoning.
    (c) What is the standard error of $\hat\mu_{MLE}$ for the statistician who assumes normality? How does this compare to the standard error of a statistician who correctly assumes the gamma model (as in the previous problem)?
    (d) Discuss the consequences of model misspecification in this case.

44. Information inequalities:
    (a) Adapt the derivation of the Cramér-Rao information inequality to obtain a lower bound on the variance of a biased estimator.
    (b) For the model $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu, \sigma^2)$, the posterior mean estimator $\hat\mu$ of $\mu$ under the prior $\mu \sim N(0, \tau^2)$ is $\hat\mu = \bar y \cdot (n/\sigma^2)/(n/\sigma^2 + 1/\tau^2)$. Use (a) to obtain a lower bound on the variance of $\hat\mu$ and compare it to the actual variance, $\text{Var}[\hat\mu \mid \mu]$.

45. Let $X_1, \ldots, X_n \sim$ i.i.d. gamma$(a_x, b_x)$ and $Y_1, \ldots, Y_n \sim$ i.i.d. gamma$(a_y, b_y)$.
    (a) Compute the ($-2\log$) likelihood ratio statistic for testing $H : a_x = a_y,\ b_x = b_y$, and state the asymptotic null distribution.
    (b) Simulate the actual null distribution of the statistic in the case that $a_x = a_y = b_x = b_y = 1$ for the sample sizes $n = 5, 10, 20, 40$, and compare to the asymptotic null distribution.
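A sketch of the null simulation in Problem 45(b), using MASS::fitdistr to compute the gamma MLEs (this is just one convenient route; writing out the likelihood and calling optim would work as well). Only $n = 10$ is shown:

    library(MASS)
    set.seed(1)
    lrt <- function(x, y) {
      fx  <- fitdistr(x, "gamma")          # MLEs of (a_x, b_x)
      fy  <- fitdistr(y, "gamma")          # MLEs of (a_y, b_y)
      fxy <- fitdistr(c(x, y), "gamma")    # MLEs under the null (common a, b)
      2 * as.numeric(logLik(fx) + logLik(fy) - logLik(fxy))
    }
    n <- 10
    stat <- replicate(2000, lrt(rgamma(n, 1, 1), rgamma(n, 1, 1)))
    qqplot(qchisq(ppoints(2000), df = 2), stat,
           xlab = "chi-squared(2) quantiles", ylab = "simulated LRT statistic")
    abline(0, 1)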

46. Let $X_1, \ldots, X_m \sim$ i.i.d. $N(\mu_x, \sigma^2)$ and $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\mu_y, \sigma^2)$.
    (a) For the case that $\sigma^2$ is known, compute and compare the AIC and BIC for the two models corresponding to $\mu_x = \mu_y$ and $\mu_x \neq \mu_y$. For each model selection criterion, give the decision rule for choosing $\mu_x \neq \mu_y$ over $\mu_x = \mu_y$. Also compare these decision rules to deciding based on a level-$\alpha$ z-test.
    (b) Repeat for the case that $\sigma^2$ is unknown, but now compare AIC and BIC to deciding based on a level-$\alpha$ t-test.
    (c) Now compute and compare the AIC and BIC decision rules for the case that the variances of the two populations are not necessarily equal.

47. Let $Y_1, \ldots, Y_n \sim$ i.i.d. $N(\theta, 1)$. Using level-$\alpha_n$ z-tests where $\alpha_n$ depends on $n$, develop a consistent model selection procedure for choosing between $\theta = 0$ and $\theta \neq 0$.

48. Let $p_1, \ldots, p_m \sim$ i.i.d. $(1 - \gamma)P_0 + \gamma P_1$, where $P_0$ is the uniform distribution on $[0, 1]$ and $P_1$ is some other distribution, with CDF $F_1$.
    (a) Write out the probability that $p_1 < \alpha/m$ in terms of $\alpha$, $m$, $F_1$, $\gamma$.
    (b) Write out the probability that the Bonferroni procedure rejects the global null hypothesis $H_0 : \gamma = 0$ at level $\alpha$, that is, the probability that the smallest p-value is less than $\alpha/m$.
    (c) Approximate the above probability using the approximation $\log(1 - x) \approx -x$ for small $x$.
    (d) Based on this approximation, evaluate whether the probability of rejection is increasing or decreasing in $\alpha$ and in $\gamma$. Explain why your answers make sense.
    (e) What are conditions on $F_1$ that suggest (based on the approximation) that the Bonferroni procedure will have good power as $m \to \infty$?

49. Let $Y_i \mid \theta_i \sim N(\theta_i, 1)$ independently for $i = 1, \ldots, m$ with $m = 100$. Using a Monte Carlo approximation, compute the probability of rejecting the global null $H_0 : \theta_1 = \cdots = \theta_m = 0$ at level $\alpha = .05$ using the Bonferroni procedure, Fisher's procedure, and a test based on the statistic $\sum Y_i^2$, under the following scenarios:
    (a) $\theta_1, \ldots, \theta_m \sim$ i.i.d. $N(0, K/100)$ for $K \in \{1, 2, 4, 8, 16, 32\}$;
    (b) $\theta_1 = K$ and $\theta_2 = \cdots = \theta_m = 0$, where $K \in \{1, 2, 3, 4, 5, 6\}$.
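A possible Monte Carlo skeleton for Problem 49, shown for scenario (a) only; the number of replicates is arbitrary, and Fisher's procedure is implemented through the usual $-2\sum \log p_i$ chi-squared combination:

    set.seed(1)
    m <- 100; alpha <- .05; S <- 2000
    reject <- function(theta) {
      y <- rnorm(m, theta, 1)
      p <- 2 * pnorm(-abs(y))                              # two-sided p-values
      c(bonf   = min(p) < alpha/m,
        fisher = -2*sum(log(p)) > qchisq(1 - alpha, 2*m),
        sumsq  = sum(y^2) > qchisq(1 - alpha, m))
    }
    for (K in c(1, 2, 4, 8, 16, 32)) {
      pow <- rowMeans(replicate(S, reject(rnorm(m, 0, sqrt(K/100)))))
      print(round(c(K = K, pow), 3))
    }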

50. Consider a model for $m$ p-values, $p_1, \ldots, p_m \sim$ i.i.d. from a mixture distribution $P = (1 - \gamma)P_0 + \gamma P_1$, where $P_0$ is uniform on $[0, 1]$ and $P_1$ is a beta$(1, b)$ distribution.
    (a) Propose a modified Benjamini-Hochberg procedure to control the FDR at level $\alpha$, in the case that $\gamma$ and $b$ are known.
    (b) Compute the mean and variance of $p_1$ in terms of $\gamma$ and $b$. Using these calculations, propose moment-based estimators of $\gamma$ and $b$ using the observed values of $p_1, \ldots, p_m$. Based on this, propose a modified BH procedure that can be used if $\gamma$ and $b$ are not known.
    (*) Compare the FDR and the number of discoveries made by the BH and modified BH procedures in a simulation study, for the case that $b \in \{1, 2, 4, 8\}$ and some interesting values of $\alpha$ and $\gamma$.
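For reference, a sketch of the unmodified Benjamini-Hochberg step-up procedure in R, applied to data simulated from the mixture in Problem 50 (the values $\gamma = .2$, $b = 8$, $\alpha = .1$ and $m = 1000$ are only illustrative; the modified procedures in (a) and (b) would adjust this baseline):

    set.seed(1)
    m <- 1000; gam <- .2; b <- 8; alpha <- .1
    is_null <- runif(m) > gam                       # TRUE: p-value drawn from P_0
    p <- ifelse(is_null, runif(m), rbeta(m, 1, b))  # mixture (1-gamma) P_0 + gamma P_1
    disc <- which(p.adjust(p, method = "BH") <= alpha)  # standard BH discoveries
    length(disc)                                    # number of discoveries
    mean(is_null[disc])                             # realized false discovery proportion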