Quantitative Introduction to Risk and Uncertainty in Business. Module 5: Hypothesis Testing. M. Vidyasagar, Cecil & Ida Green Chair, The University of Texas at Dallas. Email: M.Vidyasagar@utdallas.edu. October 13, 2012
Outline
1. Hypothesis Testing
2. Hoeffding's Inequalities
3. K-S (Kolmogorov-Smirnov) Tests: Objectives and Statements
4. Student t Test
5. Chi-Squared Test
Hypothesis Testing: Basic Idea

Null hypothesis: What we believe in the absence of further evidence, e.g. that a two-sided coin is fair, with heads and tails equally likely. Think: Null hypothesis = default assumption.

Two kinds of testing:
- There is only the null hypothesis, and we accept or reject it.
- There is a null as well as an alternate hypothesis, and we choose one or the other.

The second kind of testing is easier: We choose whichever hypothesis is more likely given the data. The first kind of testing is harder.
Choosing Between Alternatives: Example

We are given a coin. The null hypothesis is that the coin is fair, with equal probabilities of heads and tails. Call it H_0. The alternative hypothesis is that the coin is biased, with the probability of heads equal to 0.7. Call it H_1. Suppose we toss the coin 20 times and 12 heads result. Which hypothesis should we accept?
Choosing Between Alternatives: Example (Cont'd)

Let n = 20 (number of coin tosses), k = 12 (number of heads), p_0 = 0.5 (probability of heads under hypothesis H_0) and p_1 = 0.7 (probability of heads under hypothesis H_1). The likelihood of the observed outcome under each hypothesis is

L_0 = C(20, 12) p_0^12 (1 − p_0)^8 = 0.1201, L_1 = C(20, 12) p_1^12 (1 − p_1)^8 = 0.1144.

So we accept hypothesis H_0, that the coin is fair, but only because the alternative hypothesis is even less likely!
Connection to MLE

We choose the hypothesis that the coin is fair only because the alternative hypothesis is even less likely! So what is the value of p that maximizes

L(p) = C(20, 12) p^12 (1 − p)^8?

Answer: p_MLE = 12/20 = 0.6, the fraction of heads observed. With MLE (maximum likelihood estimation), we need not choose between two competing hypotheses: MLE gives the most likely values for the parameters!
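The likelihood comparison and the MLE above can be checked numerically. This is a minimal sketch, not part of the original slides; the helper name `likelihood` is my own.

```python
from math import comb

n, k = 20, 12          # 20 tosses, 12 heads
p0, p1 = 0.5, 0.7      # heads probability under H0 and H1

def likelihood(p, n, k):
    # binomial likelihood of observing k successes in n trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

L0 = likelihood(p0, n, k)   # likelihood under H0, about 0.1201
L1 = likelihood(p1, n, k)   # likelihood under H1, about 0.1144
p_mle = k / n               # maximum likelihood estimate, 0.6
```

Since L0 > L1, the comparison picks H0, matching the slide.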
Estimating Probabilities of Binary Outcomes

Suppose an event has only two outcomes, e.g. a coin toss. Let p equal the true but unknown probability of success, e.g. that the coin comes up heads. After n trials, suppose k successes result. Then p̂ := k/n is called the empirical probability of success. As we have seen, it is also the maximum likelihood estimate of p. Question: How close is the empirical probability p̂ to the true but unknown probability p? Hoeffding's inequalities answer this question.
Hoeffding's Inequalities: Statements

Let ε > 0 be any specified accuracy. Then

Pr{p̂ − p ≥ ε} ≤ exp(−2nε²),
Pr{p − p̂ ≥ ε} ≤ exp(−2nε²),
Pr{|p̂ − p| ≤ ε} ≥ 1 − 2 exp(−2nε²).
Hoeffding's Inequalities: Interpretation

Interpretations of Hoeffding's inequalities:
- With confidence 1 − 2 exp(−2nε²), we can say that the true but unknown probability p lies in the interval (p̂ − ε, p̂ + ε).
- As we increase ε, the term δ := 2 exp(−2nε²) decreases, and we can be more sure of our interval.
- The widely used 95% confidence interval corresponds to δ = 0.05.
- The one-sided inequalities have similar interpretations.
An Example of Applying Hoeffding's Inequality

Suppose we toss a coin 1000 times and it comes up heads 552 times. How sure can we be that the coin is biased? Here n = 1000, k = 552, p̂ = 0.552. If p > 0.5 then we can say that the coin is biased. So let ε = p̂ − 0.5 = 0.052, and compute δ = exp(−2nε²) = 0.0045. So with confidence 1 − δ = 0.9955, we can say that p > 0.5. In other words, we can be 99.55% sure that the coin is biased. Using the two-sided Hoeffding inequality, we can be 99.1% sure that p ∈ (0.5, 0.604).
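The arithmetic in this example takes only a few lines to reproduce; a sketch, with variable names of my own choosing:

```python
from math import exp

n, k = 1000, 552
p_hat = k / n                    # empirical probability, 0.552
eps = p_hat - 0.5                # accuracy needed to conclude p > 0.5
delta = exp(-2 * n * eps**2)     # one-sided Hoeffding bound, about 0.0045
one_sided = 1 - delta            # confidence that p > 0.5, about 0.9955
two_sided = 1 - 2 * delta        # confidence that p is in (0.5, 0.604)
```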
Another Example

An opinion poll of 750 voters (ignoring don't knows) shows that 387 will vote for candidate A and 363 will vote for candidate B. How sure can we be that candidate A will win? Let p denote the true but unknown fraction of voters who will vote for A, and p̂ = 387/750 = 0.5160 denote the empirical estimate of p. If p < 0.5 then A will lose. So the accuracy is ε = 0.0160, and the number of samples is n = 750. The one-sided bound gives δ = exp(−2nε²) = 0.6811. So we can be only 1 − δ ≈ 32% sure that A will win. In other words, the election cannot be called with any confidence based on such a small margin of preference.
Relating Confidence, Accuracy and Number of Samples

For the two-sided Hoeffding inequality, the quantity δ associated with n samples and accuracy ε is given by δ = 2 exp(−2nε²). We can turn this around and ask: Given an empirical estimate p̂ based on n samples, what is the accuracy corresponding to a given level δ? Solving the above equation for ε in terms of δ and n gives

ε(n, δ) = ((1/(2n)) log(2/δ))^(1/2).

So with confidence 1 − δ we can say that the true but unknown probability p is in the interval [p̂ − ε(n, δ), p̂ + ε(n, δ)].
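The accuracy formula above can be wrapped in a small helper; a sketch (the function name `eps_for` is my own):

```python
from math import log, sqrt

def eps_for(n, delta):
    # two-sided accuracy achievable with n samples at confidence 1 - delta
    return sqrt(log(2 / delta) / (2 * n))

# e.g. 1000 samples at 95% confidence (delta = 0.05) give accuracy ~0.043
acc = eps_for(1000, 0.05)
```

As expected, more samples tighten the interval: eps_for(4000, 0.05) is half of eps_for(1000, 0.05), since the accuracy scales as 1/sqrt(n).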
Hoeffding's Inequality for More Than Two Outcomes

Suppose a random experiment has more than two possible outcomes (e.g. rolling a six-sided die). Say there are k outcomes, and in n trials the i-th outcome appears n_i times (and of course Σ_{i=1}^k n_i = n). We can define

p̂_i = n_i / n, i = 1, ..., k,

and as we have seen, these are the maximum likelihood estimates of the probabilities. Question: How good are these estimates?
More Than Two Outcomes (Cont'd)

Fact: For any sample size n and any accuracy ε, it is the case that

Pr{max_i |p̂_i − p_i| > ε} ≤ 2k exp(−2nε²).

So with confidence 1 − 2k exp(−2nε²), we can assert that every empirical probability p̂_i is within ε of the correct value.
More Than Two Outcomes: Example

Suppose we roll a six-sided die 1,000 times and obtain the following empirical frequencies for the outcomes 1 through 6:

p̂_1 = 0.169, p̂_2 = 0.165, p̂_3 = 0.166, p̂_4 = 0.165, p̂_5 = 0.167, p̂_6 = 0.168.

With what confidence can we say that the die is not fair, that is, that p_i ≠ 1/6 for some i?
More Than Two Outcomes: Example (Cont'd)

Suppose that indeed the true probability is p_i = 1/6 for all i. Then

max_i |p̂_i − p_i| = |p̂_1 − 1/6| ≈ 0.00233.

Take ε = 0.00233, n = 1000 and compute δ = 2 · 6 · exp(−2nε²) ≈ 11.87! How can a probability be greater than one? Note: This δ is just an upper bound for Pr{max_i |p̂_i − p_i| > ε}, so it can be larger than one. So we cannot rule out the possibility that the die is fair (which is quite different from saying that it is fair).
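The vacuous bound in the die example is easy to reproduce; a sketch, not from the slides:

```python
from math import exp

n, k = 1000, 6                          # 1000 rolls, 6 outcomes
p_hat = [0.169, 0.165, 0.166, 0.165, 0.167, 0.168]
eps = max(abs(p - 1/6) for p in p_hat)  # about 0.00233
bound = 2 * k * exp(-2 * n * eps**2)    # about 11.87 -- exceeds 1
```

Since the bound exceeds 1, it tells us nothing, exactly as the slide concludes.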
K-S Tests: Problem Formulations

There are two widely used tests. Properly, they should be called the Kolmogorov test and the Smirnov test, respectively; unfortunately the erroneous names one-sample K-S test and two-sample K-S test have become popular.

Kolmogorov Test, or One-Sample K-S Test: We have a set of samples and a candidate probability distribution. Question: How well does the distribution fit the set of samples?

Smirnov Test, or Two-Sample K-S Test: We have two sets of samples, say x_1, ..., x_n and y_1, ..., y_m. Question: How sure are we that both sets of samples came from the same (but unknown) distribution?
Empirical Distributions

Suppose X is a random variable for which we have generated n i.i.d. samples, call them x_1, ..., x_n. Then we define the empirical distribution of X, based on these observations, as

Φ̂(a) = (1/n) Σ_{i=1}^n I{x_i ≤ a},

where I denotes the indicator function: I = 1 if the condition in braces is satisfied and I = 0 otherwise. So Φ̂(a) is just the fraction of the n samples that are ≤ a. The diagram on the next slide illustrates this.
Empirical Distribution Depicted

Note: The diagram shows the samples occurring in increasing order, but they can be in any order.

Source: http://www.aiaccess.net/english/glossaries/glosmod/e gm distribution function.htm
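The empirical distribution defined above can be coded directly from its definition; a toy illustration, not from the slides:

```python
def empirical_cdf(samples):
    # returns a function a -> fraction of samples <= a
    n = len(samples)
    def phi_hat(a):
        return sum(1 for x in samples if x <= a) / n
    return phi_hat

phi = empirical_cdf([2.1, 0.4, 3.3, 1.7])
# phi is a step function: phi(0.0) = 0.0, phi(1.7) = 0.5, phi(5.0) = 1.0
```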
Glivenko-Cantelli Lemma

Theorem: As n → ∞, the empirical distribution Φ̂(·) approaches the true distribution Φ(·). Specifically, if we define the Kolmogorov-Smirnov distance

d_n = max_u |Φ̂(u) − Φ(u)|,

then d_n → 0 as n → ∞. At what rate does the convergence take place?
One-Sample Kolmogorov-Smirnov Statistic

Fix a level δ > 0 (usually δ is taken as 0.05 or 0.02). Define the threshold

θ(n, δ) = ((1/(2n)) log(2/δ))^(1/2).

Then with probability 1 − δ, we can say that

d_n := max_u |Φ̂(u) − Φ(u)| ≤ θ(n, δ).
One-Sample Kolmogorov-Smirnov Test

Given samples x_1, ..., x_n, fit them with some distribution F(·) (e.g. Gaussian). Compute the K-S statistic

d_n = max_u |Φ̂(u) − F(u)|.

Compare d_n with the threshold θ(n, δ). If d_n > θ(n, δ), we reject the null hypothesis at level δ. In other words, if d_n > θ(n, δ), then we are 1 − δ sure that the data was not generated by the distribution F(·).
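A minimal sketch of the one-sample test against a candidate CDF. The helper names are mine, and Uniform(0, 1) is used as the candidate F only to keep the example self-contained:

```python
from math import log, sqrt

def ks_statistic(samples, F):
    # sup-norm distance between the empirical CDF and the candidate CDF F;
    # the sup is attained just before or at one of the sorted sample points
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, abs(i / n - F(x)), abs((i - 1) / n - F(x)))
    return d

def threshold(n, delta):
    return sqrt(log(2 / delta) / (2 * n))

F = lambda u: min(max(u, 0.0), 1.0)              # CDF of Uniform(0, 1)
d = ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], F)   # 0.1
reject = d > threshold(5, 0.05)                  # False: fit not rejected
```

With only five points the threshold is large (about 0.61), so the fit is not rejected.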
Student t Test: Motivation

The Student t test is used to test the null hypothesis that two sets of samples have the same mean, assuming that they have the same variance. The test has broad applicability even if the assumption of equal variances is not satisfied.

Problem: We are given two samples x_1, ..., x_{m_1} and x_{m_1+1}, ..., x_{m_1+m_2}. Determine whether the two sets of samples arise from distributions with the same mean.

Application: Most commonly used in quality control.
Student t Test: Theory

Let x̄_1, x̄_2 denote the means of the two sample classes, that is,

x̄_1 = (1/m_1) Σ_{i=1}^{m_1} x_i, x̄_2 = (1/m_2) Σ_{i=1}^{m_2} x_{m_1+i}.

Let S_1, S_2 denote the unbiased estimates of the standard deviations of the two samples, that is,

S_1² = (1/(m_1 − 1)) Σ_{i=1}^{m_1} (x_i − x̄_1)²,
S_2² = (1/(m_2 − 1)) Σ_{i=1}^{m_2} (x_{m_1+i} − x̄_2)².
Student t Test: Theory (Cont'd)

Now define the pooled standard deviation S_12 by

S_12² = ((m_1 − 1)S_1² + (m_2 − 1)S_2²) / (m_1 + m_2 − 2).

Then the quantity

t = (x̄_1 − x̄_2) / (S_12 √((1/m_1) + (1/m_2)))

satisfies the t distribution with m_1 + m_2 − 2 degrees of freedom. As the number of d.o.f. becomes large, the t distribution approaches the normal distribution. The next slide shows the density of the t distribution for various d.o.f.
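The formulas above translate directly into code; a sketch, not from the slides (Python's statistics.stdev already implements the unbiased estimate):

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample1, sample2):
    # two-sample t statistic with pooled standard deviation
    m1, m2 = len(sample1), len(sample2)
    x1, x2 = mean(sample1), mean(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)   # unbiased estimates
    s12 = sqrt(((m1 - 1) * s1**2 + (m2 - 1) * s2**2) / (m1 + m2 - 2))
    return (x1 - x2) / (s12 * sqrt(1 / m1 + 1 / m2))

# identical samples give t = 0; shifting one sample moves t away from 0
t0 = t_statistic([1, 2, 3], [1, 2, 3])   # 0.0
t1 = t_statistic([2, 3, 4], [1, 2, 3])   # about 1.22
```

The resulting value would then be compared against quantiles of the t distribution with m_1 + m_2 − 2 degrees of freedom.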
Density of the t Distribution
Chi-Squared Test: Motivation

The t test is used to determine whether two samples have the same mean; the chi-squared test is used to determine whether two samples have the same variance. The application is again to quality control.
Chi-Squared Test: Theory

Given two sets of samples, say x_1, ..., x_{m_1} and x_{m_1+1}, ..., x_{m_1+m_2} (where usually m_2 < m_1), compute the unbiased variance estimate V_1 of the larger (first) sample,

V_1 = (1/(m_1 − 1)) Σ_{i=1}^{m_1} (x_i − x̄_1)²,

and the sum of squares of the smaller (second) sample,

S² = Σ_{i=1}^{m_2} (x_{m_1+i} − x̄_2)² = (m_2 − 1)V_2.

Then the ratio S²/V_1 satisfies the chi-squared (χ²) distribution with m_2 − 1 degrees of freedom.
Distribution Function of the Chi-Squared Variable
Density Function of the Chi-Squared Variable
Application of the Chi-Squared Test

Note that the χ² r.v. is always nonnegative. So, given some level δ (usually δ = 0.05), we determine a confidence interval

x_l = Φ⁻¹_{χ², m_2−1}(δ), x_u = Φ⁻¹_{χ², m_2−1}(1 − δ).

If the test statistic S²/V_1 lies in the interval [x_l, x_u], then we accept the null hypothesis that both samples have the same variance.
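The test statistic itself needs only the two formulas above; a sketch, not from the slides (the quantiles x_l, x_u would come from chi-squared tables or a stats library, which I omit here):

```python
from statistics import mean, variance

def chi2_statistic(sample1, sample2):
    # sample1: larger sample, gives the unbiased variance estimate V1
    # sample2: smaller sample, gives the sum of squares S^2
    v1 = variance(sample1)                      # unbiased: divides by m1 - 1
    x2 = mean(sample2)
    s_sq = sum((x - x2) ** 2 for x in sample2)  # equals (m2 - 1) * V2
    return s_sq / v1    # chi-squared with m2 - 1 d.o.f. under the null

stat = chi2_statistic([1, 2, 3, 4, 5], [1, 3, 5])   # 8 / 2.5 = 3.2
```

The statistic would then be checked against [x_l, x_u] for the chi-squared distribution with m_2 − 1 = 2 degrees of freedom.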