STA 2101/442 Assignment 2 1

Similar documents
STA 2201/442 Assignment 2

Introduction 1. STA442/2101 Fall See last slide for copyright information. 1 / 33

STA 2101/442 Assignment 3 1

STA 302f16 Assignment Five 1

The Bootstrap 1. STA442/2101 Fall Sampling distributions Bootstrap Distribution-free regression example

UNIVERSITY OF TORONTO Faculty of Arts and Science

Categorical Data Analysis 1

STA 431s17 Assignment Eight 1

The Multinomial Model

STA 2101/442 Assignment Four 1

Introduction to Bayesian Statistics 1

Generalized Linear Models 1

CONTINUOUS RANDOM VARIABLES

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

Interactions and Factorial ANOVA

Masters Comprehensive Examination Department of Statistics, University of Florida

The Multivariate Normal Distribution 1

Interactions and Factorial ANOVA

The Multivariate Normal Distribution 1

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

STA442/2101: Assignment 5

Review. December 4 th, Review

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Contingency Tables Part One 1

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Problem 1 (20) Log-normal. f(x) Cauchy

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

First Year Examination Department of Statistics, University of Florida

Math 50: Final. 1. [13 points] It was found that 35 out of 300 famous people have the star sign Sagittarius.

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30

INTERVAL ESTIMATION AND HYPOTHESES TESTING

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

YORK UNIVERSITY. Faculty of Science Department of Mathematics and Statistics MATH A Test #2 June 11, Solutions

Summary of Chapters 7-9

Chapter 9 Inferences from Two Samples

Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

Regression and Statistical Inference

This does not cover everything on the final. Look at the posted practice problems for other topics.

Asymptotic Statistics-III. Changliang Zou

Mathematics 375 Probability and Statistics I Final Examination Solutions December 14, 2009

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.

Chapter 26: Comparing Counts (Chi Square)

Continuous Random Variables 1

Practice Problems Section Problems

determine whether or not this relationship is.

Power and Sample Size for Log-linear Models 1

Business Statistics. Lecture 5: Confidence Intervals

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

Master s Written Examination

This paper is not to be removed from the Examination Halls

AMCS243/CS243/EE243 Probability and Statistics. Fall Final Exam: Sunday Dec. 8, 3:00pm- 5:50pm VERSION A

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS

Midterm Examination. STA 215: Statistical Inference. Due Wednesday, 2006 Mar 8, 1:15 pm

Math Review Sheet, Fall 2008

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Introduction to Statistics

Smoking Habits. Moderate Smokers Heavy Smokers Total. Hypertension No Hypertension Total

Eco517 Fall 2004 C. Sims MIDTERM EXAM

If we want to analyze experimental or simulated data we might encounter the following tasks:

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

18.05 Practice Final Exam

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

z-test, t-test Kenneth A. Ribet Math 10A November 28, 2017

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences FINAL EXAMINATION, APRIL 2013

Statistical Inference

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

Lecture 10: Generalized likelihood ratio test

, 0 x < 2. a. Find the probability that the text is checked out for more than half an hour but less than an hour. = (1/2)2

Lectures on Statistics. William G. Faris

Gov 2000: 6. Hypothesis Testing

Lecture 21. Hypothesis Testing II

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Probability Theory and Statistics. Peter Jochumzen

Multiple Regression Analysis

10/31/2012. One-Way ANOVA F-test

Factorial ANOVA. STA305 Spring More than one categorical explanatory variable

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

Elementary Statistics Triola, Elementary Statistics 11/e Unit 17 The Basics of Hypotheses Testing

Probability and Statistics Notes

STA Module 10 Comparing Two Proportions

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

Inference for Proportions, Variance and Standard Deviation

The t-statistic. Student s t Test

Polytechnic Institute of NYU MA 2212 MIDTERM Feb 12, 2009

SALES AND MARKETING Department MATHEMATICS. 2nd Semester. Bivariate statistics. Tutorials and exercises

Y i = η + ɛ i, i = 1,...,n.

Confidence intervals CE 311S

Regression Part II. One- factor ANOVA Another dummy variable coding scheme Contrasts Mul?ple comparisons Interac?ons

Was there under-representation of African-Americans in the jury pool in the case of Berghuis v. Smith?

Business Statistics. Lecture 10: Course Review

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

STAT 461/561- Assignments, Year 2015

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

Transcription:

STA 2101/442 Assignment 2 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. A polling firm plans to ask a random sample of registered voters in Quebec whether Quebec should separate from Canada and become an independent nation: Yes or No. They would like to be able to say that their results are expected to be accurate within three percentage points, nineteen times out of twenty. (a) Suppose the population percent favouring independence is 25%. What sample size is required to achieve the desired margin of error? (b) Suppose the population percent favouring independence is 40%. What sample size is required to achieve the desired margin of error? (c) What sample size would be required if you were unwilling to make any assumptions about the true percentage favouring independence? 2. For years, brand awareness for Big Red chewing gum has been stuck at about 6%, meaning that about 6% of consumers who chew gum say they remember hearing about Big Red gum. The gum company is planning an advertising campaign to increase brand awareness, in the hope that increased brand awareness will lead to increased sales. The advertising agency has a problem. With the budget they have been given to purchase media (air time, junk email, pop-up ads and so on), they are confident they can move brand awareness a little perhaps to 8%. In the old days, they could tell the client they had increased awareness by 33% and start to celebrate, but now the client has fallen under the influence of a U of T graduate who insists that a null hypothesis be rejected at the α = 0.05 level with a non-directional test before they admit that anything actually worked. A market research analyst from the adversing agency took a market research analyst from the gum company out to lunch, and they agreed on the test statistic n(y θ0 ) Z = θ0 (1 θ 0 ). Now, the advertising agency has to decide how many people they need to survey when they measure brand awareness, in order to have a good chance of rejecting the null hypothesis. It s important, because if the client thinks the advertising didn t work, they might get a new advertising agency. On the other hand, they also don t want to survey more people than necessary, because that s expensive. Suppose they want to be 90% sure of rejecting H 0 if they manage to increase brand awareness to 8%. What sample size do they need? I will start you out. You want the 1 This assignment was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. The L A TEX source code is available from the course website: http://www.utstat.toronto.edu/ brunner/oldclass/appliedf18 1

smallest (integer) sample size so that P r{ Z > 1.96} 0.90. Here are some points to consider. The null hypothesis, of course, is θ = θ 0. What is θ 0? The answer is a specific number in this problems. Power is being calculated under the assumption a true parameter value of θ = 0.08. When I calculate the probability indicated above (power), I get an expression in n, and my answer emerges in terms of Φ, the cumulative distribution function of a standard normal. That is, Φ(0) = 1, and so on. Φ is exactly R s pnorm function. 2 Again, what sample size is required? I suggest you use R to calculate power for different values of n, until you find the smallest n that makes the power at least 0.90. Please do the paper and pencil calculations, and then obtain the answer using R. My answer is 1653. 3. Suppose that several studies have tested the same hypothesis or similar hypotheses. For example, six studies could all be testing the effect of a treatment for arthritis. Two studies rejected the null hypothesis, one was almost significant, and the other three were clearly non-significant. What should be concluded? Ideally, one would pool the data from the six studies, but in practice the raw data are not available. All we have are the published test statistics. How do we combine the information we have and come to an overall conclusion? That is the main task of meta-analysis. In this question you will develop some simple, standard tools for meta-analysis. (a) Let the test statistic T be continuous, with pdf f(t) and cdf F (t) under the null hypothesis. The null hypothesis is rejected if T > c. Show that if H 0 is true, the distribution of the p-value is U(0, 1). Derive the density. Start with the cumulative distribution function of the p-value: P r{p x} =.... (b) Suppose H 0 is false. Would you expect the distribution of the p-value to still be uniform? Pick one of the alternatives below. You are not asked to derive anything for now. i. The distribution should still be uniform. ii. We would expect more small p-values. iii. We would expect more large p-values. (c) Let P i U(0, 1). Show that Y i = 2 ln(p i ) has a χ 2 distribution. What are the degrees of freedom? (d) Let P 1,... P n be a random sample of p-values with the null hypotheses all true, and let Y = n i=1 2 ln(p i). What is the distribution of Y? Only derive it (using moment-generating functions) if you don t know the answer. (e) Let P i U(0, 1), and denote the cumulative distribution function of the standard normal by Φ(x). 2

i. What is the distribution of Y i = Φ 1 (1 P i )? Show your work. ii. If H 0 is false and P i is not uniform, would you expect Y i to be bigger, or smaller? Why? (f) Let P 1,... P n be a random sample of p-values. i. Propose a test statistic based on your answer to Question 3(e)i. ii. What is the null hypothesis of your test? iii. What is the distribution of your test statistic under the null hypothesis? Only derive it (using moment-generating functions) if you don t know the answer. iv. Would you reject the null hypotesis when your test statistic has big values, or when it has small values? Which one? (g) Suppose we observe the following random sample of p-values: 0.016 0.188 0.638 0.148 0.917 0.124 0.695. i. For the test statistic of Question 3d, A. What is the critical value at α = 0.05? The answer is a number. B. What is the value of the test statistic? The answer is a number. C. Do you reject the null hypothesis? Yes or No. D. What if anything do you conclude? ii. For the test statistic of Question 3(e)i, A. What is the critical value at α = 0.05? The answer is a number. B. What is the value of the test statistic? The answer is a number. C. Do you reject the null hypothesis? Yes or No. D. What if anything do you conclude? 4. Suppose n(t n θ) d p T. Show T n θ. Please use Slutsky lemmas rather than 1 definitions. Hint: Think of the sequence of constants n as a sequence of degenerate random variables (variance zero) that converge almost surely and hence in probability to zero. Now you can use a Slutsky lemma. 5. Let X 1,..., X n be a random sample from a Binomial distribution with parameters 3 and θ. That is, ( ) 3 P (X i = x i ) = θ x i (1 θ) 3 x i, x i for x i = 0, 1, 2, 3. Find the maximum likelihood estimator of θ, and show that it is strongly consistent. 3

6. Let X 1,..., X n be a random sample from a continuous distribution with density f(x; τ) = τ 1/2 2π e τx2 2, where the parameter τ > 0. Let τ = n n. i=1 X2 i Is τ a consistent estimator of τ? Answer Yes or No and prove your answer. Hint: You can just write down E(X 2 ) by inspection. This is a very familiar distribution. 7. Let X 1,..., X n be a random sample from a distribution with mean µ. Show that T n = 1 n+400 n i=1 X i is a strongly consistent estimator of µ. 8. Let X 1,..., X n be a random sample from a distribution with mean µ and variance σ 2. Prove that the sample variance S 2 n i=1 = (X i X) 2 is a strongly consistent estimator of n 1 σ 2. 9. Independently for i = 1,..., n, let Y i = βx i + ɛ i, where E(X i ) = E(ɛ i ) = 0, V ar(x i ) = σx 2, V ar(ɛ i) = σɛ 2, and ɛ i is independent of X i. Let n i=1 β n = X iy i n. i=1 X2 i Is β n a consistent estimator of β? Answer Yes or No and prove your answer. 10. In this problem, you ll use (without proof) the variance rule, which says that if θ is a real constant and T 1, T 2,... is a sequence of random variables with then T n P θ. lim E(T n) = θ and lim V ar(t n ) = 0, n n In Problem 9, the independent variables are random. Here they are fixed constants, which is more standard (though a little strange if you think about it). Accordingly, let Y i = βx i + ɛ i for i = 1,..., n, where ɛ 1,..., ɛ n are a random sample from a distribution with expected value zero and variance σ 2, and β and σ 2 are unknown constants. (a) What is E(Y i )? 4

(b) What is V ar(y i )? (c) Use the same estimator as in Problem 9. Is β n unbiased? Answer Yes or No and show your work. (d) Suppose that the sequence of constants n i=1 x2 i as n. Does this guarantee β n will be consistent? Answer Yes or No. Show your work. (e) Let β 2,n = Y n x n. Is β 2,n unbiased? Consistent? Answer Yes or No to each question and show your work. Do you need a condition on the x i values? (f) Prove that β n is a more accurate estimator than β 2,n in the sense that it has smaller variance. Hint: The sample variance of the explanatory variable values cannot be negative. 11. Let X be a random variable with expected value µ and variance σ 2. Show X n p 0. 12. Let X 1,..., X n be a random sample from a Gamma distribution with α = β = θ > 0. That is, the density is f(x; θ) = 1 θ θ Γ(θ) e x/θ x θ 1, for x > 0. Let θ = X n. Is θ a consistent estimator of θ? Answer Yes or No and prove your answer. 13. Here is an integral you cannot do in closed form, and numerical integration is challenging. For example, R s integrate function fails. 1/2 0 e cos(1/x) dx Using R, approximate the integral with Monte Carlo integration, and give a 99% confidence interval for your answer. You need to produce 3 numbers: the estimate, a lower confidence limit and an upper confidence limit. 5