Theory of Inference: Homework 4

1. Here is a slightly technical question about hypothesis tests. Suppose that $Y_1, \ldots, Y_n \stackrel{iid}{\sim} \text{Poisson}(\lambda)$, $\lambda > 0$. The two competing hypotheses are $H_0 : \lambda = \lambda_0$ versus $H_1 : \lambda = \lambda_1$, for specified values of $\lambda_0$ and $\lambda_1$.

(a) Derive the form of the likelihood ratio, $f_0(y)/f_1(y)$.

Answer. First, we need the PMF for $y$, which is
\[
f(y; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{y_i}}{y_i!} \propto e^{-n\lambda} \lambda^{\sum_i y_i} = e^{-n\lambda} \lambda^{n\bar{y}},
\]
where $\bar{y}$ is the sample mean; we can ignore multiplicative terms that do not depend on $\lambda$, since they will cancel in ratios. Then
\[
\frac{f_0(y)}{f_1(y)} = e^{-n(\lambda_0 - \lambda_1)} \left( \frac{\lambda_0}{\lambda_1} \right)^{n\bar{y}}.
\]
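As a quick sanity check (my addition, not part of the original answer), the closed form can be compared against a brute-force ratio of Poisson PMFs. A minimal R sketch, using the dataset from part (b) below:

lambda0 <- 0.50
lambda1 <- 1.25
y <- c(1, 0, 2, 2, 0, 1, 1, 1, 1, 1, 3)  # the dataset from part (b)
n <- length(y)
direct <- prod(dpois(y, lambda0)) / prod(dpois(y, lambda1))  # brute force
closed <- exp(-n * (lambda0 - lambda1)) * (lambda0 / lambda1)^(n * mean(y))
print(all.equal(direct, closed))  # TRUE: the y_i! terms cancel exactly in the ratio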

(b) Evaluate this ratio for the values $\lambda_0 = 0.50$, $\lambda_1 = 1.25$ and the dataset $y_{\text{obs}} = \{1, 0, 2, 2, 0, 1, 1, 1, 1, 1, 3\}$. Thinking like a Bayesian, what does the dataset say about the two hypotheses?

Answer. Here is the R code I used to generate the dataset and evaluate the likelihood ratio.

set.seed(101)
lambda0 <- 0.50
lambda1 <- 1.25
n <- 11
y <- rpois(n, lambda1)  # the dataset is generated under H1
print(paste(y, collapse = ", "))
ybar <- mean(y)
loglr <- -n * (lambda0 - lambda1) + (n * ybar) * log(lambda0 / lambda1)
print(exp(loglr))  # 0.02568676

This is a likelihood ratio of about 1/40. To a Bayesian who did not have strong prior beliefs about $\lambda$, this would indicate strong support for $H_1$ over $H_0$. This is reassuring, because you will see from the code that the true hypothesis is $H_1$.
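To make the Bayesian reading concrete, here is a minimal sketch (my addition; the equal prior odds are an illustrative assumption, not part of the question):

lr <- exp(loglr)                   # about 1/40, from the code above
prior_odds <- 1                    # assumed: no prior preference between H0 and H1
posterior_odds <- lr * prior_odds  # Bayes's theorem in odds form (see question 2(b))
print(posterior_odds / (1 + posterior_odds))  # Pr(H0 | y): about 0.025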

(c) (Harder; maybe just think about this one over Easter.) Evaluate the Type 1 and Type 2 errors of the rejection region
\[
R := \left\{ y : \frac{f_0(y)}{f_1(y)} \le k \right\}
\]
for $k = 0.5$. Hint: you will find the Normal approximation to the Poisson very helpful for a pen-and-paper calculation (leave your results expressed in terms of the Normal distribution function), or else you can do the exact calculation in R, using the ppois function.

Answer. Taking $\lambda_0 < \lambda_1$, we have
\[
y \in R \iff n\bar{y} \ge \frac{\log k + n(\lambda_0 - \lambda_1)}{\log(\lambda_0 / \lambda_1)} =: c
\]
(make sure you check this!). If the $Y$'s are IID Poisson with rate $\lambda$, then the sum of the $Y$'s is also Poisson (prove this using the Moment Generating Function), and its rate is its expectation, $n\lambda$. Using the Normal approximation to the Poisson, we then have $n\bar{Y} \sim N(\,\cdot\,; n\lambda, n\lambda)$, approximately, remembering that the variance of a Poisson is the same as the expectation. Therefore the Type 1 error is approximately
\[
\alpha := \Pr_0(Y \in R) = \Pr_0(n\bar{Y} \ge c) \approx 1 - \Phi(c; n\lambda_0, n\lambda_0),
\]
where $\Phi$ is the Normal distribution function, with specified expectation and variance. The Type 2 error is
\[
\beta := \Pr_1(Y \notin R) \approx \Phi(c; n\lambda_1, n\lambda_1).
\]
Here is some R code for the Type 1 and the Type 2 error:

k <- 0.5
c <- (log(k) + n * (lambda0 - lambda1)) / log(lambda0 / lambda1)
print(c)  # 9.760163
alpha <- pnorm(c, n * lambda0, sqrt(n * lambda0), lower.tail = FALSE)
print(alpha)  # 0.03464381
beta <- pnorm(c, n * lambda1, sqrt(n * lambda1))
print(beta)  # 0.1409683

Now let's also do the exact calculation in R:

alpha <- ppois(floor(c), n * lambda0, lower.tail = FALSE)
print(alpha)  # 0.05377747
beta <- ppois(c, n * lambda1)
print(beta)  # 0.1217738

showing that the Normal approximation is accurate to about 2 percentage points, in this case.
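Much of that gap closes under the usual half-unit continuity correction; a sketch (my addition, not in the original answer):

# The rejection event n * ybar >= c is {S >= 10} for the Poisson sum S, which the
# Normal approximation handles better as {S > 9.5}.
alpha.cc <- pnorm(floor(c) + 0.5, n * lambda0, sqrt(n * lambda0), lower.tail = FALSE)
print(alpha.cc)  # roughly 0.044, closer to the exact 0.0538
beta.cc <- pnorm(floor(c) + 0.5, n * lambda1, sqrt(n * lambda1))
print(beta.cc)   # roughly 0.126, closer to the exact 0.1218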

(d) (Another one to think about.) Produce a plot of the Type 1 error ($x$-axis) versus the Type 2 error ($y$-axis) for a range of values of $k$. This curve is known as the operating characteristics curve for this model and this pair of hypotheses. It is obvious that the operating characteristics curve goes through $(0, 1)$ and $(1, 0)$, and it is fairly easy to prove that it is always convex (this follows from the Neyman-Pearson Lemma).

Answer.

kvals <- seq(from = -2, to = 2, length = 101)
kvals <- 10^kvals  # a log-spaced grid of thresholds
opchar <- sapply(kvals, function(k) {
  c <- (log(k) + n * (lambda0 - lambda1)) / log(lambda0 / lambda1)
  alpha <- ppois(floor(c), n * lambda0, lower.tail = FALSE)
  beta <- ppois(c, n * lambda1)
  c(k = k, alpha = alpha, beta = beta)
})
opchar <- t(opchar)  # has columns: k, alpha, beta
opchar <- rbind(c(0, 0, 1), opchar, c(Inf, 1, 0))  # always have these endpoints
plot(opchar[, c("alpha", "beta")], type = "l", col = "black", asp = 1,
  xlim = c(0, 1), ylim = c(0, 1),
  xlab = expression(paste("values for ", alpha)),
  ylab = expression(paste("values for ", beta)),
  main = "Operating characteristics curve")
dev.print(pdf, "opchar.pdf")

[Figure: the operating characteristics curve, with values for α on the x-axis and values for β on the y-axis, both running from 0 to 1.]
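One small extension (my addition): marking the $(\alpha, \beta)$ pair from part (c) on the curve makes the link between the two parts explicit.

points(alpha, beta, pch = 19)  # alpha and beta still hold the exact values from part (c)
text(alpha, beta, labels = "k = 0.5", pos = 4)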

2. Here is an exam-style revision question about hypothesis tests. Part 2(d) is unseen and a bit harder. Each part carries five marks.

(a) Describe the general framework for Hypothesis Testing, and give illustrations of the types of hypotheses that might be tested.

Answer. I'll just sketch the answers, without reference to the notes. A hypothesis is a statement about the values of the parameters in a parametric model for some observables. In a hypothesis test we assess the evidential support for two competing hypotheses. So if the parametric model is $Y \sim p(\,\cdot\,; \theta)$ for $\theta \in \Omega$, then the two hypotheses can be written, in general, as
\[
H_0 : \theta \in \Omega_0 \quad \text{versus} \quad H_1 : \theta \in \Omega_1,
\]

where $\Omega_0, \Omega_1 \subseteq \Omega$ and $\Omega_0 \cap \Omega_1 = \emptyset$. If $\Omega_i$ is a single point, then $H_i$ is a simple hypothesis; otherwise it is a composite hypothesis.

One illustration is two competing models, $Y \sim p_0(\cdot)$ versus $Y \sim p_1(\cdot)$, which can be thought of as special cases of a single more general model with $\theta \in \{0, 1\}$. In this case both $H_0$ and $H_1$ are simple. Another common case is where some value $\theta_0$ divides the scalar parameter space into two parts. In this case we have $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$, and both $H_0$ and $H_1$ are composite. One special case of this is where $\Omega = [\theta_0, \infty)$, in which case $H_0$ is simple and $H_1$ is composite. A third illustration is where a hypothesis concerns only one of the parameters, say $H_0 : \mu = \mu_0$ versus $H_1 : \mu > \mu_0$, in the case where $Y \sim N(\,\cdot\,; \mu, \sigma^2)$. In this case both $H_0$ and $H_1$ are composite, because the value of $\sigma^2$ is not specified.

(b) Explain how a Bayesian might choose between two simple hypotheses.

Answer. Let the simple hypotheses be $H_0 : Y \sim p_0(\cdot)$ versus $H_1 : Y \sim p_1(\cdot)$. According to Bayes's theorem in odds form,
\[
\frac{\Pr(H_0 \mid Y = y)}{\Pr(H_1 \mid Y = y)} = \frac{p_0(y)}{p_1(y)} \cdot \frac{\Pr(H_0)}{\Pr(H_1)}.
\]
The Bayesian will typically favour the hypothesis with the larger posterior probability; i.e. $H_0$ if the lefthand side (the posterior odds for $H_0$ versus $H_1$) is greater than one, and $H_1$ if it is less than one. The first term on the righthand side is the likelihood ratio. If this is much larger or much smaller than one, then it will typically dominate the second term, the prior odds. So the Bayesian will typically favour $H_0$ if the likelihood ratio is large, say more than 20, and $H_1$ if the likelihood ratio is small, say less than 1/20. In the middle, he will need to include explicit prior probabilities: if he does not have these, then he remains undecided.
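A minimal R sketch of this decision rule (my addition; the cutoffs 20 and 1/20 are the illustrative values from the text, not universal constants):

bayes_choice <- function(lr, prior_odds = NULL) {
  # lr is the likelihood ratio p0(y) / p1(y)
  if (lr > 20) return("favour H0")
  if (lr < 1 / 20) return("favour H1")
  if (is.null(prior_odds)) return("undecided without explicit priors")
  if (lr * prior_odds > 1) "favour H0" else "favour H1"  # the posterior odds decide
}
bayes_choice(1 / 40)             # "favour H1", as in question 1(b)
bayes_choice(3)                  # "undecided without explicit priors"
bayes_choice(3, prior_odds = 2)  # posterior odds of 6, so "favour H0"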

(c) State and prove the Neyman-Pearson Lemma.

Answer. You can look this one up in the lecture notes! My proof was adapted from section 8.2 of M. DeGroot and M. Schervish, 2002, Probability and Statistics, 3rd edn, Addison-Wesley.

(d) Consider two simple hypotheses, $H_0$ and $H_1$, and the rejection region
\[
R := \left\{ y : \frac{f_0(y)}{f_1(y)} \le k \right\}.
\]
Let $\alpha$ be the Type 1 error level of $R$, and $\beta$ be the Type 2 error level (these are both functions of $k$). Show that $\Delta\alpha / \Delta\beta \approx -k$, where $\Delta$ indicates the change that arises from a small change in $k$.

Answer. Imagine ordering all the values of $y \in \mathcal{Y}$ in terms of the values of $f_0(y)/f_1(y)$; denote this ordering $y_{(1)}, y_{(2)}$, and so on. Fix a value of $k$ and let $y_{(j)}$ be the value in $\mathcal{Y}$ for which
\[
\frac{f_0(y_{(j-1)})}{f_1(y_{(j-1)})} < k < \frac{f_0(y_{(j)})}{f_1(y_{(j)})},
\]
or $f_0(y_{(j)})/f_1(y_{(j)}) \approx k$. Now slightly increase $k$, so that the rejection region now includes $y_{(j)}$. The increase in the Type 1 error is $f_0(y_{(j)})$, and the change in the Type 2 error is $-f_1(y_{(j)})$; i.e. the Type 2 error goes down, because the rejection region is bigger. So
\[
\frac{\Delta\alpha}{\Delta\beta} = \frac{f_0(y_{(j)})}{-f_1(y_{(j)})} \approx -k,
\]
as stated.

[Comment: this is an important and useful result, but we did not prove it in the lectures, and it is not in the notes. If you look back to the operating characteristics picture above, the gradient of the curve is $\Delta\beta / \Delta\alpha \approx -1/k$, so you can almost read the value of $k$ from the curve, provided that the aspect ratio of the plot is 1, which it is, because I specified asp = 1 as an argument to the plot function.]
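This can be checked exactly for the Poisson example of question 1, where the likelihood ratio is a function of the sum $s = n\bar{y}$; a sketch (my addition, reusing n, lambda0 and lambda1 from above):

# As the threshold c crosses an integer s, alpha jumps up by Pr_0(S = s) and beta
# jumps down by Pr_1(S = s), so the ratio of the jumps is minus the likelihood ratio at s.
s <- 5:15
jump.ratio <- dpois(s, n * lambda0) / -dpois(s, n * lambda1)
minus.k <- -exp(-n * (lambda0 - lambda1)) * (lambda0 / lambda1)^s
print(all.equal(jump.ratio, minus.k))  # TRUE: exact in this discrete example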

(e) Explain briefly how we might examine the hypotheses $H_0 : \theta = \theta_0$ versus $H_1 : \theta > \theta_0$ in the case of the model $Y \sim p(\,\cdot\,; \theta)$ with $\theta \ge \theta_0$.

Answer. [Note: I have corrected an ambiguity in the question, by writing $\theta \ge \theta_0$.] Briefly, the lower bound on the likelihood ratio, over all possible prior distributions for $\theta$ with support on $H_1$, is
\[
\frac{p_0(y)}{p_1(y)} \ge \frac{p(y; \theta_0)}{p(y; \hat{\theta}_1(y))},
\]
where $\hat{\theta}_1(y) := \operatorname{argmax}_{t > \theta_0} p(y; t)$. Now if this lower bound is quite large, say above 0.3, then this rules out the possibility of strong evidence against $H_0$, and so, if there is a reasonable preference for $H_0$ over $H_1$ in the prior (say, because $H_0$ is the current theory and $H_1$ is a speculative new theory), then $H_0$ will be chosen. On the other hand, if this lower bound is very small, say below 0.001, then the likelihood ratio could easily be small, say about 1/20, in which case we would want to choose $H_1$. But it could also be large, say 20, in which case we would want to choose $H_0$. It depends on the prior for $\theta$ under $H_1$, which we are reluctant to specify. So in this case we are undecided.

This homework is not due until the first Tuesday of next term (21 Apr). In the meantime I will circulate another homework on p-values.

Jonathan Rougier
Mar 2015