Statistical Inference: Classical and Bayesian Methods
Revision Class for the Midterm Exam
AMS-UCSC, Th Feb 9, 2012
Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23
Topics

We will talk about:
1. Estimation
2. Sampling Distributions of Estimators
3. Confidence Intervals
4. Hypothesis Testing
5. Problems
Estimation: Maximum Likelihood Estimation

Maximum likelihood estimation is the method that chooses as the estimate of $\theta$ the value that maximizes the likelihood function.

Likelihood function: when the joint p.d.f. or joint p.f. $f_n(\mathbf{x} \mid \theta)$ is regarded as a function of $\theta$ for given values $x_1, x_2, \ldots, x_n$, it is called the likelihood function.

Maximum likelihood estimator/estimate: for each possible value of $\mathbf{x}$, let $\delta(\mathbf{x}) \in \Omega$ be the value of $\theta$ that maximizes $f_n(\mathbf{x} \mid \theta)$, and let $\hat{\theta} = \delta(\mathbf{X})$ be the estimator defined in this way. Then $\hat{\theta}$ is a maximum likelihood estimator (MLE) of $\theta$. After $\mathbf{X} = \mathbf{x}$ is observed, the value $\delta(\mathbf{x})$ is the maximum likelihood estimate.
Estimation: Properties of Maximum Likelihood Estimators

Invariance theorem: if $\hat{\theta}$ is the maximum likelihood estimator of $\theta$ and $g$ is a one-to-one function, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.

Consistency: consider a random sample taken from a distribution with parameter $\theta$. Suppose that for every sample of size $n$ greater than some given minimum there exists a unique MLE of $\theta$. Then, under certain conditions, the sequence of MLEs is a consistent sequence of estimators of $\theta$: it converges in probability to the unknown value of $\theta$ as $n \to \infty$.
Estimation: MLE for Multiparameter Models

MLE for the mean and variance of a Normal distribution:
$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \left( \bar{X}_n,\ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \right)$$
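A quick numerical check of these closed-form MLEs, as a minimal sketch; the data values below are made up for illustration and are not from the slides:

```python
import statistics

# Illustrative data (hypothetical, not from the slides)
x = [4.2, 5.1, 3.8, 4.9, 5.3, 4.7]
n = len(x)

mu_hat = sum(x) / n                                   # MLE of mu: the sample mean
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n  # MLE of sigma^2: divides by n, not n - 1

# statistics.pvariance also divides by n, so it equals the MLE of sigma^2
assert abs(sigma2_hat - statistics.pvariance(x)) < 1e-12
```

Note the $n$ divisor: the MLE of $\sigma^2$ is biased downward relative to the usual $n - 1$ sample variance.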
Estimation: Sufficient Statistics

Sufficient statistic: let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution indexed by the parameter $\theta$, and let $T$ be a statistic. Suppose that for every $\theta$ and every possible value $t$ of $T$, the conditional distribution of $X_1, \ldots, X_n$ given $T = t$ and $\theta$ depends on $t$ but not on $\theta$; that is, the conditional distribution given $T = t$ is the same for all values of $\theta$. Then we say that $T$ is a sufficient statistic for the parameter $\theta$.

Theorem (factorization criterion): let $X_1, \ldots, X_n$ form a random sample from either a continuous or a discrete distribution with p.d.f. or p.f. $f(x \mid \theta)$, where $\theta$ is unknown and belongs to a parameter space $\Omega$. A statistic $T = r(X_1, \ldots, X_n)$ is a sufficient statistic if and only if the joint p.d.f. or p.f. $f_n(\mathbf{x} \mid \theta)$ can be factored as
$$f_n(\mathbf{x} \mid \theta) = u(\mathbf{x})\, v[r(\mathbf{x}), \theta]$$
for all values of $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and all $\theta \in \Omega$.
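A standard worked instance of the criterion (not on the slide, but a textbook example): for a Bernoulli($\theta$) sample,

```latex
f_n(\mathbf{x} \mid \theta)
  = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}
  = \theta^{\sum_{i=1}^{n} x_i} (1 - \theta)^{\,n - \sum_{i=1}^{n} x_i},
```

which matches the factorization with $u(\mathbf{x}) = 1$, $r(\mathbf{x}) = \sum_{i=1}^{n} x_i$, and $v[t, \theta] = \theta^t (1 - \theta)^{n - t}$, so $T = \sum_{i=1}^{n} X_i$ is sufficient for $\theta$.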
Sampling Distributions of Estimators: Definition

Sampling distribution: let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random sample from a distribution involving an unknown parameter $\theta$. Let $T$ be a function of $\mathbf{X}$ and possibly of $\theta$; that is, $T = r(X_1, \ldots, X_n, \theta)$. The distribution of $T$ given $\theta$ is the sampling distribution of $T$.
Sampling Distributions of Estimators: The Chi-Square Distribution

Theorem (relationship with the Normal distribution): if $X \sim N(0, 1)$, then $Y = X^2$ has a $\chi^2$ distribution with one degree of freedom.

Corollary: if the random variables $X_1, \ldots, X_m$ are i.i.d. with the standard normal distribution, the sum of squares $X_1^2 + \cdots + X_m^2$ has a $\chi^2$ distribution with $m$ degrees of freedom.

Suppose $X_1, \ldots, X_n$ is a sample from a Normal model with known mean and unknown variance. We found that the MLE of $\sigma^2$ is $\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$. The variables $Z_i = (X_i - \mu)/\sigma$ form a sequence of standard normal random variables, so by the corollary $\sum_{i=1}^{n} Z_i^2$ has a $\chi^2$ distribution with $n$ degrees of freedom. Since $\sum_{i=1}^{n} Z_i^2 = n \hat{\sigma}_0^2 / \sigma^2$, this quantity has a $\chi^2$ distribution with $n$ degrees of freedom.
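A small seeded simulation consistent with the corollary; this is my own sketch (the degrees of freedom, replication count, and seed are arbitrary choices), exploiting the fact that a $\chi^2_m$ variable has mean $m$:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
m, reps = 5, 20000

# Each replicate: the sum of m squared standard normals, i.e. one chi^2_m draw
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m)) for _ in range(reps)]

mean_y = sum(draws) / reps  # should be close to m, the mean of a chi^2_m variable
```

With 20,000 replicates the Monte Carlo error is about $\sqrt{2m/\text{reps}} \approx 0.02$, so `mean_y` lands very near 5.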
Sampling Distributions of Estimators: Sample Mean and Sample Variance Distributions

Theorem: suppose $X_1, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample mean $\bar{X}_n$ and the sample variance $\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$ are independent. The sample mean $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$, and the variable $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 / \sigma^2$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom.

The quantity $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 / \sigma^2$ is equivalent to $n \hat{\sigma}^2 / \sigma^2$.
Sampling Distributions of Estimators: t Distribution

Definition: consider two independent random variables $Y$ and $Z$, where $Y$ has a $\chi^2$ distribution with $m$ degrees of freedom and $Z$ has the standard normal distribution. The random variable
$$X = \frac{Z}{(Y/m)^{1/2}}$$
has the t distribution with $m$ degrees of freedom.

Let $Z = n^{1/2} (\bar{X}_n - \mu)/\sigma$ and $Y = n \hat{\sigma}^2 / \sigma^2$. These are independent random variables; $Z$ has a standard normal distribution and $Y$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom, so $Z / (Y/(n - 1))^{1/2}$ has a t distribution with $n - 1$ degrees of freedom.

If we set $\sigma' = \left( \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1} \right)^{1/2}$, the quantity $U = n^{1/2} (\bar{X}_n - \mu)/\sigma'$ has a t distribution with $n - 1$ degrees of freedom.
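Computing $U$ for a hypothetical sample, as a sketch; the data and the choice of $\mu$ are my own illustrative assumptions:

```python
import math
import statistics

x = [5.4, 4.8, 6.1, 5.0, 5.6, 4.9, 5.3, 5.8]  # hypothetical data
n = len(x)
xbar = sum(x) / n

# sigma': square root of the sum of squared deviations divided by n - 1,
# which is exactly the usual sample standard deviation
sigma_prime = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
assert abs(sigma_prime - statistics.stdev(x)) < 1e-12

mu = 5.0  # value of mu at which to evaluate U (an assumption of this example)
U = math.sqrt(n) * (xbar - mu) / sigma_prime  # t distributed with n - 1 = 7 df
```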
Confidence Intervals: Definition

Confidence interval: let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random sample from a distribution that depends on a parameter (or parameter vector) $\theta$, and let $g(\theta)$ be a real-valued function of $\theta$. Let $A$ and $B$ be two statistics such that $A \leq B$ and, for all values of $\theta$, $\Pr(A < g(\theta) < B) \geq \gamma$. Then the random interval $(A, B)$ is called a coefficient $\gamma$ confidence interval for $g(\theta)$, or a $100\gamma$ percent confidence interval for $g(\theta)$.
Confidence Intervals: Confidence Interval for the Mean of a Normal Distribution

Theorem: let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random sample from a Normal distribution with parameters $\mu$ and $\sigma^2$. For each $\gamma \in (0, 1)$, the interval with the following endpoints is an exact coefficient $\gamma$ confidence interval for $\mu$:
$$A = \bar{X}_n - T_{n-1}^{-1}\!\left( \frac{1 + \gamma}{2} \right) \frac{\sigma'}{n^{1/2}}, \qquad B = \bar{X}_n + T_{n-1}^{-1}\!\left( \frac{1 + \gamma}{2} \right) \frac{\sigma'}{n^{1/2}}$$

Note: before observing the data we can be $100\gamma\%$ confident that the random interval $(A, B)$ will contain $\mu$. After observing the data we do not know whether the realized interval $(a, b)$ contains $\mu$; the safest interpretation is that $(a, b)$ is simply an observed value of the random interval $(A, B)$.
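A sketch of the endpoints $A$ and $B$. Python's standard library has no t quantile, so the standard-normal quantile stands in for $T_{n-1}^{-1}$ here; treat that as an assumption of the sketch (a large-$n$ approximation, slightly narrower than the exact t interval for small $n$):

```python
from statistics import NormalDist

def normal_mean_ci(xbar, sigma_prime, n, gamma=0.95):
    # z quantile at (gamma + 1)/2 approximates the t quantile T^{-1}_{n-1}
    q = NormalDist().inv_cdf((gamma + 1) / 2)
    half = q * sigma_prime / n ** 0.5
    return xbar - half, xbar + half

# Illustrative summary statistics (hypothetical values)
a, b = normal_mean_ci(xbar=10.0, sigma_prime=2.0, n=100)
```

For exact small-sample intervals one would use a t quantile, e.g. `qt((gamma + 1)/2, n - 1)` in R.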
Confidence Intervals: Confidence Intervals from a Pivotal Quantity

Pivotal quantity: let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random sample from a distribution that depends on a parameter (or parameter vector) $\theta$. Let $V(\mathbf{X}, \theta)$ be a random variable whose distribution is the same for all values of $\theta$. Then $V$ is called a pivotal quantity.

Note: to use a pivotal quantity to find a confidence interval for $g(\theta)$, we need to invert the pivotal; that is, we need a function $r(v, \mathbf{x})$ such that $r(V(\mathbf{X}, \theta), \mathbf{X}) = g(\theta)$.
Hypothesis Testing: The Null and Alternative Hypotheses

Suppose we want to test the hypotheses
$$H_0: \theta \in \Omega_0 \qquad \text{vs.} \qquad H_1: \theta \in \Omega_1$$

Null and alternative hypotheses: $H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis. If after performing the test we decide that $\theta$ lies in $\Omega_1$, we are said to reject $H_0$; if we decide that $\theta$ lies in $\Omega_0$, we are said not to reject $H_0$.

Simple and composite hypotheses: if $\Omega_i$ contains just a single value of $\theta$, then $H_i$ is a simple hypothesis; if $\Omega_i$ contains more than one value of $\theta$, then $H_i$ is a composite hypothesis.
Hypothesis Testing: One-Sided and Two-Sided Hypotheses

Let $\theta$ be a one-dimensional parameter. One-sided null hypotheses have the form $H_0: \theta \leq \theta_0$ or $H_0: \theta \geq \theta_0$, with the corresponding one-sided alternatives $H_1: \theta > \theta_0$ or $H_1: \theta < \theta_0$. When the null hypothesis is simple ($H_0: \theta = \theta_0$), the alternative hypothesis is usually two-sided ($H_1: \theta \neq \theta_0$).
Hypothesis Testing: Critical Region and Test Statistic

In most hypothesis testing problems the critical region is specified in terms of a statistic $T = r(\mathbf{X})$.

Test statistic: let $\mathbf{X}$ be a random sample from a distribution that depends on a parameter $\theta$. Let $T = r(\mathbf{X})$ be a statistic and $R$ a subset of the real line, and consider a test procedure that rejects $H_0$ if $T \in R$. Then we call $T$ a test statistic and $R$ a rejection region.
Hypothesis Testing: Power Function and Types of Errors

Power function: let $\delta$ be a test procedure. The function $\pi(\theta \mid \delta)$ is the power function of the test $\delta$. If $\delta$ is described in terms of a critical region $S_1$, the power function is defined as $\pi(\theta \mid \delta) = \Pr(\mathbf{X} \in S_1 \mid \theta)$ for $\theta \in \Omega$. If $\delta$ is described in terms of a test statistic $T$ and a rejection region $R$, the power function is defined as $\pi(\theta \mid \delta) = \Pr(T \in R \mid \theta)$ for $\theta \in \Omega$.

Type I/II errors: rejecting a true null hypothesis is a type I error, or an error of the first kind; failing to reject a false null hypothesis is a type II error, or an error of the second kind.
Hypothesis Testing: Level of Significance and Size of a Test

Which test is best? We would like a test $\delta$ whose power function is low for $\theta \in \Omega_0$ and high for $\theta \in \Omega_1$. These two goals cannot be reached simultaneously. Normally one selects a number $0 < \alpha_0 < 1$ such that $\pi(\theta \mid \delta) \leq \alpha_0$ for all $\theta \in \Omega_0$. The value $\alpha_0$ is the level of the test, and we say that the test has level of significance $\alpha_0$.

Size of a test: the size of a test is defined as $\alpha(\delta) = \sup_{\theta \in \Omega_0} \pi(\theta \mid \delta)$. A test is a level $\alpha_0$ test if and only if its size is at most $\alpha_0$. If the null hypothesis is simple ($H_0: \theta = \theta_0$), the size of $\delta$ is $\alpha(\delta) = \pi(\theta_0 \mid \delta)$.
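A concrete power function and its size, as my own illustration (the hypotheses, known $\sigma = 1$, cutoff $c$, and sample size are all assumptions of the example, not from the slides): test $H_0: \theta \leq 0$ vs. $H_1: \theta > 0$ for a Normal mean, rejecting when $\bar{X}_n > c$.

```python
import math
from statistics import NormalDist

def power(theta, c=0.4, n=25, sigma=1.0):
    # Pr(X_bar > c | theta), since X_bar ~ N(theta, sigma^2 / n)
    return 1 - NormalDist().cdf((c - theta) * math.sqrt(n) / sigma)

# power(theta) is increasing in theta, so its supremum over
# Omega_0 = (-infinity, 0] is attained at theta = 0: that is the size.
size = power(0.0)
```

Here `size` equals $1 - \Phi(2.0)$, about 0.023, so this test has level $\alpha_0 = 0.05$ (its size is at most 0.05).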
Hypothesis Testing: The p-value

Instead of choosing a value of $\alpha_0$ in advance, we can report the value of the Z statistic together with the smallest level $\alpha_0$ at which we would reject the null hypothesis with the observed data. This value is the p-value.

p-value: the smallest level $\alpha_0$ such that we would reject the null hypothesis at level $\alpha_0$ with the observed data.
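A minimal sketch for a one-sided Z test, where the p-value is the upper tail area beyond the observed statistic (the observed value 2.1 is illustrative):

```python
from statistics import NormalDist

z_obs = 2.1  # hypothetical observed Z statistic
p_value = 1 - NormalDist().cdf(z_obs)  # smallest alpha_0 at which H0 is rejected

# Reporting p_value lets each reader apply their own level:
# reject H0 at level alpha_0 exactly when p_value <= alpha_0.
```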
Problems: Problem 1 (Confidence Interval from a Pivotal Quantity)

Problem 1: Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the normal distribution with unknown mean and unknown variance. Describe a method to construct a confidence interval for $\sigma^2$ with a specified confidence coefficient $\gamma$ ($0 < \gamma < 1$).

Solution: We use the fact that the variable $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 / \sigma^2$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. We can find a pair of values $c_1, c_2$ such that
$$\Pr\left[ c_1 < \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{\sigma^2} < c_2 \right] = \gamma.$$
This can be rewritten in the form
$$\Pr\left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{c_2} < \sigma^2 < \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{c_1} \right] = \gamma,$$
so the interval with endpoints $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 / c_2$ and $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 / c_1$ is a confidence interval for $\sigma^2$ with confidence coefficient $\gamma$.
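A numerical sketch of this construction. The data summary is hypothetical, and the chi-square quantiles are standard table values for 9 degrees of freedom at probabilities 0.025 and 0.975; treat both as assumptions of the example:

```python
n = 10
ss = 4.5                 # sum of (x_i - x_bar)^2 for a hypothetical sample
c1, c2 = 2.700, 19.023   # chi^2_{n-1} table quantiles: Pr(c1 < chi^2_9 < c2) = 0.95

lower = ss / c2          # the LARGER quantile gives the lower endpoint for sigma^2
upper = ss / c1
```

Note the inversion: dividing by the larger quantile $c_2$ yields the lower endpoint, because the pivotal quantity has $\sigma^2$ in its denominator.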
Problems: Problem 2 (Hypothesis Testing)

Problem 2: Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. The following hypotheses are to be tested:
$$H_0: \mu \leq 3 \qquad H_1: \mu > 3$$
Suppose the sample size is $n = 17$, and it is found from the observed values that $\bar{X}_n = 3.2$ and $\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 = 0.09$. Calculate the $U$ statistic and the p-value.
Problems: Problem 2 (Cont.)

Solution: The statistic
$$U = n^{1/2} (\bar{X}_n - \mu)/\sigma' = (n - 1)^{1/2} (\bar{X}_n - \mu)/\hat{\sigma}$$
has a t distribution with 16 degrees of freedom. Here
$$U = \frac{\bar{X}_n - 3}{(\hat{\sigma}^2/16)^{1/2}} = \frac{0.2}{(0.09/16)^{1/2}} = \frac{8}{3} \approx 2.666667.$$
The p-value is the corresponding tail area $\Pr(U > 8/3) = 0.008442422$, found using the pt function of R with the command 1 - pt(2.666667, 16).

Note: the statistic $U = n^{1/2} (\bar{X}_n - \mu)/\sigma'$ is in terms of $\sigma'$, while the problem gives $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 = 0.09$. The two relate as $\sigma'^2 = \frac{n \hat{\sigma}^2}{n - 1}$.
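Re-checking the arithmetic of Problem 2 (values taken from the problem statement; only the p-value step, which needs a t tail area, is left to R's `pt`):

```python
import math

n, xbar, sigma2_hat = 17, 3.2, 0.09
sigma_hat = math.sqrt(sigma2_hat)  # 0.3

# U = (n - 1)^{1/2} (X_bar - 3) / sigma_hat = 4 * 0.2 / 0.3 = 8/3
U = math.sqrt(n - 1) * (xbar - 3.0) / sigma_hat
```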
Thanks for your attention...