STAT 461/561- Assignments, Year 2015

This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. PDF submissions are welcome; if you submit one, use a large font and double spacing, and break lines wherever it helps. Hand in any number of completed questions every Friday. I am happy to give you feedback, or to ask the TA to give you feedback.

1. Example 9.2 (its number at this moment) shows, based on a theorem, that the t-test for the two-sided hypothesis is UMPU. I was a bit careless in checking two important conditions.
(1) $V$ is independent of $T$ at $\xi = 0$. This is given in the notes. Verify this condition.
(2) $V$ is linear in $U$ given $T$. This is not true. However, $V$ is a monotone function of a linear function of $U$ (given $T$). Find this linear function and show that the conclusion of the example still holds.
(3) For the two-sided test, the UMPU test contains two constants $k_1$ and $k_2$ to be determined. Demonstrate why $k_1 = k_2$ is the solution in this problem.

2. Suppose $X_n$ has a binomial distribution with probability of success $\theta$. Consider the problem of constructing a confidence interval for $\theta$ by the following methods.
(i) Direct use of Wald's method: use the asymptotic normality of the MLE $\hat\theta$ and replace the asymptotic variance by a sensible estimator.
(ii) Direct use of Wald's method: use the asymptotic normality of the MLE $\hat\theta$ and regard the asymptotic variance as a function of $\theta$.
(iii) Find a variance-stabilizing transformation $g(\theta)$ such that $g(\hat\theta)$ is asymptotically normal with constant limiting variance.
(iv) Work out the likelihood interval without invoking the asymptotic distribution.

(v) Work out the likelihood interval based on Wilks' theorem.

3. (i) Suppose $n = 200$ and $\theta_0 = 0.4$. The observed value of $X_n$ is 73. Obtain all the intervals in the last problem.
(ii) Suppose $n = 20$ and $\theta_0 = 0.1$. The observed value of $X_n$ is 1. Obtain all the intervals in the last problem.

4. Let $X_1, \ldots, X_n$ be a sample from $N(\xi, \sigma^2)$.
(a) Show that the power of Student's t-test is an increasing function of $\xi/\sigma$ for testing $H_0: \xi < 0$ versus $H_1: \xi > 0$ (one-sided test).
(b) Show that the power of Student's t-test is an increasing function of $|\xi|/\sigma$ for testing $H_0: \xi = 0$ versus $H_1: \xi \neq 0$ (two-sided test).

The next two questions are too hard for today's students. Try one of them. Waived for undergraduate students.

5. Suppose that $X_i = \beta_0 + \beta_1 t_i + \epsilon_i$, where the $t_i$ are fixed constants that are not all the same, the $\epsilon_i$ are iid from $N(0, \sigma^2)$, and $\beta_0$, $\beta_1$ and $\sigma^2$ are unknown parameters. Derive a UMPU test of size $\alpha$ for testing
(a) $H_0: \beta_0 \le \theta_0$ versus $H_1: \beta_0 > \theta_0$;
(b) $H_0: \beta_0 = \theta_0$ versus $H_1: \beta_0 \neq \theta_0$.

6. Suppose $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, are a sample from a bivariate normal distribution with joint density function

$$f(x, y; \xi, \eta, \sigma, \tau) = \left\{2\pi\sigma\tau\sqrt{1-\rho^2}\right\}^{-n} \exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{1}{\sigma^2}\sum (x_i-\xi)^2 - \frac{2\rho}{\sigma\tau}\sum (x_i-\xi)(y_i-\eta) + \frac{1}{\tau^2}\sum (y_i-\eta)^2\right)\right\}.$$

(a) Determine the form of the UMPU test for $H_0: \rho \le 0$ versus $H_1: \rho > 0$.
(b) Determine the rejection region of the test of size $\alpha$ in terms of a quantile of a well-known distribution (the t-distribution).
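Methods (i) to (iii) of Problem 2 have closed forms, so the numbers in Problem 3 can be cross-checked with a few lines of code. The following is a minimal Python sketch using only the standard library (the assignment itself expects R). The function names are made up for illustration, 1.959964 is used as an approximate 0.975 normal quantile, and the arcsine square-root transform is assumed as the variance-stabilizing choice for method (iii).

```python
import math

Z = 1.959964  # approximate 0.975 quantile of N(0, 1)

def wald_interval(x, n, z=Z):
    """Method (i): plug the MLE into the asymptotic variance theta(1-theta)/n."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def score_interval(x, n, z=Z):
    """Method (ii): keep the variance as a function of theta and solve
    (p - theta)^2 <= z^2 * theta * (1 - theta) / n for theta."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def arcsine_interval(x, n, z=Z):
    """Method (iii), assuming g(theta) = arcsin(sqrt(theta)),
    whose limiting variance is 1 / (4n) whatever theta is."""
    g = math.asin(math.sqrt(x / n))
    lo = max(g - z / (2 * math.sqrt(n)), 0.0)
    hi = min(g + z / (2 * math.sqrt(n)), math.pi / 2)
    return math.sin(lo) ** 2, math.sin(hi) ** 2

for x, n in [(73, 200), (1, 20)]:
    print(n, x, wald_interval(x, n), score_interval(x, n), arcsine_interval(x, n))
```

With $n = 20$ and one success, the plug-in interval of method (i) extends below zero while the other two stay inside $[0, 1]$, which is one reason the problem asks for several constructions side by side.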

7. Carry out two permutation tests on the Precambrian iron formation data. Consider the hypothesis that the first two types have the same mean ($H_0$) versus the hypothesis that the first two types have unequal means.
(a) Use permutation methods (via the mean, and via the Wilcoxon test) to get the p-values.
(b) Use the t-test and the CLT to obtain approximate p-values.
An article on the origin of Precambrian iron formation reported the following data on percentage iron for 4 types of iron formation (1 = carbonate, 2 = silicate, 3 = magnetite, 4 = hematite):

group   observations
1:      20.5 28.1 27.8 27.0 28.0 25.2 25.3 27.1 20.5 31.3
2:      26.3 24.0 26.2 20.2 23.7 34.0 17.1 26.8 23.7 24.9

Decide for yourself whether to use two-sided or one-sided tests. However, declare your choice before you perform the analysis.

8. (Graduate students only.) Let $F_n(x)$ be the empirical distribution function based on an iid sample from a continuous distribution $F$. Let $D_n(F)$ be the Kolmogorov-Smirnov test statistic.
(a) Show that $D_n(F) \to 0$ almost surely.
(b) Show that the distribution of $D_n(F)$ for any continuous $F$ is the same as that of $D_n(F_0)$, where $F_0$ is the uniform distribution on $[0, 1]$.

9. The following values are iid observations from a binomial distribution with $m = 10$ and probability of success $\theta$.

4 3 3 3 2 3 4 3 2 1 3 7 5 2 2 2 2 1 3 4

(1) Obtain the 95% confidence interval for $\theta$ based on the likelihood method.

(2) Let

$$T_n(\bar x, \theta) = \frac{\sqrt{20}\,(\bar x - 10\theta)}{\sqrt{10\theta(1-\theta)}}$$

be used as a test statistic for $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Note that the sample size is $n = 20$. By the CLT, $T_n$ is asymptotically $N(0, 1)$. Thus, we reject $H_0$ when $|T_n(\bar x, \theta_0)| > 1.96$ at the 5% level. Numerically find all values of $\theta$ that are not rejected by the above test. Your outcome is a confidence interval.

10. The following are 5 iid observations of a random vector: (1.96, 1.93), (0.42, .46), (1.12, 0.27), (0.20, 0.39), (1.16, 0.12). Use an R function to draw the asymptotic empirical likelihood 95% confidence region for the mean.

11. (Stat graduate students only.) Let $\mathcal{F}$ be the family of all one-dimensional distributions with finite first moment. Let $\theta = T(F)$ be the first moment of $F$. Define $R_n(\theta)$ as the empirical likelihood ratio function based on an iid sample of size $n$ from $F$, which was given as

$$R_n(\theta) = \sup\left\{\prod_{i=1}^n (n p_i) : p_i > 0;\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i x_i = \theta\right\},$$

where $x_1, \ldots, x_n$ is a set of i.i.d. observations from $F$. Show that if we change the definition slightly to

$$R_n(\theta) = \sup_F\left\{\prod_{i=1}^n \big(n F(\{x_i\})\big) : \int x\, dF(x) = \theta\right\},$$

where the supremum is taken over $F \in \mathcal{F}$ and $F(\{x_i\}) = F(x_i) - F(x_i-)$ is the probability mass that the distribution $F$ puts on $x_i$, then the region $\{\theta : R_n(\theta) \ge r_0\}$ contains all real values of $\theta$ for any choice of $1 > r_0 > 0$.
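The numerical inversion asked for in Problem 9(2) can be sketched as a plain grid search: evaluate the statistic at each candidate $\theta$ and keep the values that are not rejected. The Python sketch below uses the 20 observations quoted in Problem 9 and assumes the statistic is $\sqrt{n}(\bar x - m\theta)/\sqrt{m\theta(1-\theta)}$ with $m = 10$. The helper names and the grid step are arbitrary choices for illustration, and a root finder would serve equally well.

```python
import math

# Observations quoted in Problem 9 (binomial, m = 10 trials each).
data = [4, 3, 3, 3, 2, 3, 4, 3, 2, 1, 3, 7, 5, 2, 2, 2, 2, 1, 3, 4]
M = 10           # number of trials per observation
CRITICAL = 1.96  # two-sided 5% critical value used in the problem

def t_stat(xbar, theta, n, m=M):
    """Standardized distance between the sample mean and m * theta."""
    return math.sqrt(n) * (xbar - m * theta) / math.sqrt(m * theta * (1 - theta))

def accepted_thetas(sample, step=1e-4):
    """All grid values of theta in (0, 1) that the test does not reject."""
    n = len(sample)
    xbar = sum(sample) / n
    grid = [k * step for k in range(1, int(round(1 / step)))]
    return [th for th in grid if abs(t_stat(xbar, th, n)) <= CRITICAL]

kept = accepted_thetas(data)
interval = (kept[0], kept[-1])  # endpoints of the accepted set
print(interval)
```

Reporting only the smallest and largest accepted grid points relies on the accepted set being a single interval, which holds here because the acceptance condition is a quadratic inequality in $\theta$.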

12. Prove that the univariate empirical likelihood confidence region for $\theta = E(X_1)$ is an interval. Hint: show that some function is concave in $\theta$.

13. Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with density function $f(x; \theta) = \theta^{-1}\exp(-\theta^{-1}x)$. Consider the case $n = 201$ and $\theta = 1$.
(a) Theoretically determine the median of this distribution.
(b) Generate 1000 data sets with $n = 201$ to estimate the bias and variance of the sample median for estimating the population median.
(c) Bootstrap the first sample in (b) to obtain estimates of the bias and variance of the sample median for estimating the population median.
Remark: use set.seed(2015561) so that we get at least the same first sample.

14. Generate 1000 sets of two-sample data of size 30+30 from the normal distribution with mean 0 and variance 1. Randomly select 20 of these two-sample data sets. To each observation in the control group, add a common random value generated from Uniform(0, 3). This value is the same for all data in the same group, but differs between groups. Use the Benjamini and Hochberg procedure to identify a set of differentially expressed genes based on the two-sample t-test (two-sided). Choose $q = 0.05$ and $0.10$. Compute the positive identification rate (PIR) and the false discovery rate (FDR). Positive identification rate: the percentage of false null hypotheses (20 of them) that are rejected. Repeat the above procedure 2000 times to get averages and standard deviations of PIR and FDR.
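For Problem 13, parts (b) and (c) compare a Monte Carlo estimate, which needs the true distribution, with a bootstrap estimate, which needs only one sample. The Python sketch below illustrates that structure with the standard library only. It does not reproduce the random stream of R's set.seed(2015561), so the samples differ from those the remark refers to. The helper names and the 1000 bootstrap replicates are assumptions made for illustration, and the population median is taken as $\theta\log 2$, obtained by solving $1 - e^{-x/\theta} = 1/2$.

```python
import math
import random
import statistics

def simulate_sample(n, theta, rng):
    """Exponential sample with mean theta (expovariate takes the rate 1/theta)."""
    return [rng.expovariate(1.0 / theta) for _ in range(n)]

def bootstrap_median(sample, n_boot, rng):
    """Nonparametric bootstrap estimates of the bias and variance of the sample median."""
    observed = statistics.median(sample)
    n = len(sample)
    # Resample with replacement and recompute the median each time.
    reps = [statistics.median(rng.choices(sample, k=n)) for _ in range(n_boot)]
    bias = statistics.fmean(reps) - observed
    return bias, statistics.variance(reps)

theta, n = 1.0, 201
true_median = theta * math.log(2)

# Part (b): Monte Carlo over 1000 independent data sets.
rng = random.Random(2015561)  # Python seed; NOT the same stream as R's set.seed
medians = [statistics.median(simulate_sample(n, theta, rng)) for _ in range(1000)]
mc_bias = statistics.fmean(medians) - true_median
mc_var = statistics.variance(medians)

# Part (c): bootstrap a single sample (here, the first one drawn from that seed).
first = simulate_sample(n, theta, random.Random(2015561))
boot_bias, boot_var = bootstrap_median(first, 1000, random.Random(1))

print(mc_bias, mc_var, boot_bias, boot_var)
```

The bootstrap bias is measured against the median of the observed sample rather than the unknown population median, which is the usual plug-in convention.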

Remark: write flexible code so that you can simulate data from other distributions and with different effect sizes.

15. Repeat the above simulation experiment with data generated from the standard Gamma distribution with 2 degrees of freedom.

16. In the book by Wu and Hamada, there is a data set from a girder experiment that studies 10 methods. Analyze the full experiment, including the ANOVA table and multiple comparisons based on Bonferroni's and Tukey's methods.

17. Let us try out the LASSO. Generate a data set, $i = 1, 2, \ldots, n$, according to the model $y_i = x_i^\tau(s)\beta(s) + \epsilon_i$, where the $\epsilon_i$ are i.i.d. $N(0, 1)$. Create each $x_i$ as a vector of length $P = 1000$ whose first entry is 1 and whose remaining 999 entries are generated from $N(0, 1)$. Let $s$ be a random subset of $\{1, 2, \ldots, P\}$ of size 5. If $s = \{3, 6, 8, 20, 21\}$, then $x_i^\tau(s)$ is the sub-vector of $x_i$ made of its 3rd, 6th, 8th, 20th and 21st entries. Let $\beta(s) = (0.7, 0.9, 0.4, 0.3, 1.0)$. Now create a data set with $n = 200$ according to the linear model. Run the glmpath function in R to find the first 10 covariates selected by the LASSO. Compare them with $s$, the covariates that should ideally be selected. Repeat the computer experiment 5 times, and put the outcomes in a table with 5 rows, where each row has two segments: the first segment contains the 5 truly active covariates; the second segment contains the 10 covariates selected by the LASSO. Mark your table clearly.
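The selection step shared by Problems 14 and 15 is the Benjamini and Hochberg step-up rule: sort the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le kq/m$, and reject the hypotheses with the $k$ smallest p-values. The Python sketch below covers only that rule and the two rates for a single replicate; generating the data and computing the t-test p-values are left out. The function names and the ten p-values are hypothetical and serve only to show the mechanics.

```python
def benjamini_hochberg(pvalues, q):
    """Indices rejected by the BH step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the LARGEST rank whose p-value passes its own threshold.
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    return set(order[:cutoff])

def rates(rejected, false_nulls):
    """PIR: share of false nulls rejected. FDP: share of rejections that are true nulls."""
    true_pos = len(rejected & false_nulls)
    pir = true_pos / len(false_nulls)
    fdp = (len(rejected) - true_pos) / len(rejected) if rejected else 0.0
    return pir, fdp

# Hypothetical p-values, for illustration only (not simulation output).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.058, 0.074, 0.205, 0.212, 0.216]
false_nulls = {0, 1, 2}  # hypothetical indices of the truly false nulls

for q in (0.05, 0.10):
    rejected = benjamini_hochberg(pvals, q)
    print(q, sorted(rejected), rates(rejected, false_nulls))
```

In this toy input, $q = 0.10$ rejects the hypotheses with p-values 0.039 and 0.041 even though neither passes its own threshold, because a larger p-value further down the list does; that is the step-up feature of the rule. The quantity returned for one replicate is the false discovery proportion, and averaging it over the 2000 repetitions gives the FDR estimate the problem asks for.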

18. If BIC or EBIC (with $\gamma = 0.5$) is used to decide which variables are selected, which variables would be selected in the 5 runs of the last question? Stop selection beyond 10; that is, select at most 10 covariates.

19. No more questions.