Statistical Hypothesis Testing

Dr. Phillip YAM, 2012/2013 Spring Semester. Reference: Chapter 7, "Tests of Statistical Hypotheses", in Hogg and Tanis.

Section 7.1 Tests about Proportions

A statistical hypothesis test is a formal method of making decisions about the probabilistic structure of a random mathematical model by analyzing the available sample.

Example. The (simple) null hypothesis $H_0: p = 0.06$ completely specifies the distribution. It is tested against the (composite) alternative hypothesis $H_1: p < 0.06$, which does not completely specify the distribution; it is composed of many simple hypotheses.

Possible errors. Type I error: rejecting $H_0$ and accepting $H_1$ when $H_0$ is true. Type II error: failing to reject $H_0$ when $H_1$ is true (i.e., when $H_0$ is false).

Consider the test of $H_0: p = p_0$ against $H_1: p > p_0$, where $p_0$ is a hypothesized probability of success. We base the test on the number of successes $Y$ in $n$ independent Bernoulli trials. By the CLT, $Y/n$ has an approximate normal distribution $N[p_0, p_0(1 - p_0)/n]$, provided that $H_0: p = p_0$ is true and $n$ is large. We reject $H_0$ and accept $H_1$ if and only if

$$Z = \frac{Y/n - p_0}{\sqrt{p_0(1 - p_0)/n}} \ge z_\alpha.$$

That is to say, if $Y/n$ exceeds $p_0$ by at least $z_\alpha$ standard deviations of $Y/n$, we reject $H_0$ and accept the hypothesis $H_1: p > p_0$. The approximate probability of this occurring when $H_0: p = p_0$ is true is $\alpha$, so the significance level of this test is approximately $\alpha$.

Example 7.1-1: Many commercially manufactured dice are not fair because the spots are really indentations, so that, for example, the 6-side is lighter than the 1-side. Let $p$ be the probability of rolling a 6. We test $H_0: p = 1/6$ against the alternative hypothesis $H_1: p > 1/6$. Suppose that we have a total of $n = 8000$ observations, and let $Y$ equal the number of times that a 6 resulted in the 8000 trials. The experiment yielded $y = 1389$, so the calculated value of the test statistic is

$$z = \frac{1389/8000 - 1/6}{\sqrt{(1/6)(5/6)/8000}} = 1.67 > 1.645 = z_{0.05}.$$

Hence the null hypothesis is rejected, and the experimental results indicate that these dice favor a 6 more than a fair die would.
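The arithmetic in Example 7.1-1 can be reproduced with a short sketch. The function name `one_proportion_z` is an illustrative choice, not something from the slides, and the p-value uses the same normal approximation as the test itself.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(y, n, p0):
    """Return (z, p_value) for H0: p = p0 against H1: p > p0."""
    p_hat = y / n
    se = sqrt(p0 * (1 - p0) / n)       # standard deviation of Y/n under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail normal approximation
    return z, p_value

# Data from Example 7.1-1: 1389 sixes in 8000 rolls.
z, p_value = one_proportion_z(1389, 8000, 1 / 6)
print(f"z = {z:.2f}, approximate p-value = {p_value:.4f}")
```

Because the standard deviation under $H_0$ is exactly $1/240$ here, the statistic works out to $167/100 = 1.67$.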

Formal statistical hypothesis testing can be regarded as a statistical version of mathematical proof by contradiction. An example of the latter is Euclid's proof that there are infinitely many primes. The analogy runs as follows, with the counterpart in Euclid's proof given in parentheses.

(1) $H_1$ vs. $H_0$ ($C_1$: infinitely many primes vs. $C_0$: finitely many primes).
(2) A random sample $X_1, \ldots, X_n$ under $H_0$ (a finite deterministic sequence of primes $p_1, \ldots, p_n$ under $C_0$).
(3) A functional inequality $Z = \dfrac{Y/n - p_0}{\sqrt{p_0(1 - p_0)/n}} \ge z_\alpha$ (a new positive integer $p = p_1 \cdots p_n + 1$).
(4) A conclusion subject to chance (a definite conclusion).

A reasonably good test for a parameter normally relies on the maximum likelihood estimator (more precisely, the sufficient statistic) for the parameter.

One-sided tests: $H_0: p = p_0$ against $H_1: p < p_0$, and $H_0: p = p_0$ against $H_1: p > p_0$. Two-sided test: $H_0: p = p_0$ against $H_1: p \ne p_0$.

In the setting of Example 7.1-1, a two-sided test with approximate significance level $\alpha$ rejects $H_0: p = p_0$ in favor of $H_1: p \ne p_0$ if

$$|Z| = \frac{|Y/n - p_0|}{\sqrt{p_0(1 - p_0)/n}} \ge z_{\alpha/2},$$

since, under $H_0$, $P(|Z| \ge z_{\alpha/2}) \approx \alpha$.

The rejection region for $H_0$ is often called the critical region. The p-value associated with a test is the probability, under the null hypothesis $H_0$, that the test statistic (a random variable) is equal to or exceeds the observed value (a constant) of the test statistic in the direction of the alternative hypothesis.

Test about the difference of two proportions: let $Y_1$ and $Y_2$ represent, respectively, the numbers of observed successes in $n_1$ and $n_2$ independent trials with probabilities of success $p_1$ and $p_2$. To test $H_0: p_1 - p_2 = 0$ or, equivalently, $H_0: p_1 = p_2$, let $p = p_1 = p_2$ be the common value under $H_0$.

$\hat{p}_1 = Y_1/n_1$ is approximately $N[p_1, p_1(1 - p_1)/n_1]$, $\hat{p}_2 = Y_2/n_2$ is approximately $N[p_2, p_2(1 - p_2)/n_2]$, and $\hat{p}_1 - \hat{p}_2 = Y_1/n_1 - Y_2/n_2$ is approximately $N[p_1 - p_2, \; p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2]$.

Estimate $p$ with $\hat{p} = (Y_1 + Y_2)/(n_1 + n_2)$ and base the test on the statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}},$$

which has an approximate $N(0, 1)$ distribution when the null hypothesis is true.

Remark: In testing both $H_0: p = p_0$ and $H_0: p_1 = p_2$, statisticians sometimes use different denominators for $z$. For tests of single proportions, $\sqrt{p_0(1 - p_0)/n}$ can be replaced by $\sqrt{(y/n)(1 - y/n)/n}$, and for tests of the equality of two proportions, the following denominator can be used:

$$\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$

In general, it is difficult to say that one is better than the other; fortunately, the numerical answers are about the same.
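To see how close the two denominators for the two-proportion test typically are, here is a minimal sketch. The function name and the counts (60 of 100 vs. 50 of 100) are hypothetical and chosen only for illustration.

```python
from math import sqrt

def two_proportion_z(y1, n1, y2, n2):
    """Return (z_pooled, z_unpooled) for H0: p1 = p2."""
    p1, p2 = y1 / n1, y2 / n2
    p_pool = (y1 + y2) / (n1 + n2)  # estimate of the common p under H0
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se_pooled, (p1 - p2) / se_unpooled

# Hypothetical counts, for illustration only.
z_pooled, z_unpooled = two_proportion_z(60, 100, 50, 100)
print(f"pooled z = {z_pooled:.3f}, unpooled z = {z_unpooled:.3f}")
```

With these counts the two statistics differ only in the third significant digit, consistent with the remark that the numerical answers are about the same.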

Section 7.2 Tests about One Mean

To test which of the two hypotheses, $H_0$ or $H_1$, is true, it is necessary to partition the sample space into two parts, $C$ and its complement $C'$, such that if $(x_1, x_2, \ldots, x_n) \in C$, $H_0$ is rejected, and if $(x_1, x_2, \ldots, x_n) \in C'$, $H_0$ is accepted (not rejected). The rejection region $C$ for $H_0$ is called the critical region for the test. The partitioning of the sample space is specified in terms of the values of a test statistic.

Type I error: $(x_1, x_2, \ldots, x_n) \in C$ when $H_0$ is true. The probability of a Type I error is called the significance level of the test and is denoted by $\alpha$, i.e., $\alpha = P[(X_1, X_2, \ldots, X_n) \in C; H_0]$.

Type II error: $(x_1, x_2, \ldots, x_n) \in C'$ when $H_1$ is true. The probability of a Type II error is denoted by $\beta$, i.e., $\beta = P[(X_1, X_2, \ldots, X_n) \in C'; H_1]$.

For a fixed sample size, a decrease in the size of $\alpha$ leads to an increase in the size of $\beta$. Both $\alpha$ and $\beta$ can be decreased if the sample size $n$ is increased.

When sampling from a normal distribution, the null hypothesis is generally of the form $H_0: \mu = \mu_0$. There are three possibilities for the alternative hypothesis: (i) $\mu$ has increased, or $H_1: \mu > \mu_0$; (ii) $\mu$ has decreased, or $H_1: \mu < \mu_0$; (iii) $\mu$ has changed, but it is not known whether it has increased or decreased, which gives the two-sided alternative hypothesis $H_1: \mu \ne \mu_0$.

A random sample is taken from the distribution. An observed sample mean $\bar{x}$ that is close to $\mu_0$ (measured in terms of standard deviations of $\bar{X}$, $\sigma/\sqrt{n}$) supports $H_0$.

(I) When the variance is known, consider the test statistic

$$Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

The critical regions, at a significance level $\alpha$, for the three respective alternative hypotheses are (i) $z \ge z_\alpha$, (ii) $z \le -z_\alpha$, and (iii) $|z| \ge z_{\alpha/2}$.
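The three critical regions for the known-variance case can be sketched as a single function. The name `one_mean_z_test`, the `alternative` labels, and the numbers in the usage lines are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_mean_z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf
    if alternative == "greater":      # H1: mu > mu0, reject when z >= z_alpha
        p_value = 1 - phi(z)
    elif alternative == "less":       # H1: mu < mu0, reject when z <= -z_alpha
        p_value = phi(z)
    else:                             # H1: mu != mu0, reject when |z| >= z_{alpha/2}
        p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# Hypothetical summary statistics, for illustration only.
z, p_two_sided = one_mean_z_test(52.0, 50.0, 10.0, 25)
_, p_greater = one_mean_z_test(52.0, 50.0, 10.0, 25, alternative="greater")
print(z, p_two_sided, p_greater)
```

Rejecting when the p-value is at most $\alpha$ is equivalent to checking whether $z$ falls in the corresponding critical region.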

(II) When the variance is not known, we consider the test statistic

$$T = \frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$

The rule rejects $H_0: \mu = \mu_0$ and accepts $H_1: \mu \ne \mu_0$ if and only if

$$|t| = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \ge t_{\alpha/2}(n - 1).$$

General comment: many statisticians believe that the observed p-value provides an understandable measure of the truth of $H_0$: the smaller the p-value, the less they believe in $H_0$.

We do not reject $H_0$ if the confidence interval for $\mu$ covers $\mu_0$; otherwise, we reject $H_0$. Many statisticians believe that estimation is much more important than tests of hypotheses and accordingly approach statistical tests through confidence intervals.

Section 7.3 Tests of the Equality of Two Means

Consider a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$. If $X$ and $Y$ are dependent (for example, a patient's records before and after a treatment), let $W = X - Y$; the hypothesis $H_0: \mu_X = \mu_Y$ is then replaced with the hypothesis $H_0: \mu_W = 0$.

(I) $X$ and $Y$ are independent and normally distributed, with sample sizes $n$ and $m$, and the variances of $X$ and $Y$ are assumed to be equal. Then

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\{[(n - 1)S_X^2 + (m - 1)S_Y^2]/(n + m - 2)\}(1/n + 1/m)}} = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n + 1/m}}, \qquad S_p = \sqrt{\frac{(n - 1)S_X^2 + (m - 1)S_Y^2}{n + m - 2}}.$$

$T$ has a $t$ distribution with $r = n + m - 2$ degrees of freedom when $H_0$ is true and the variances are (approximately) equal.

If the common-variance assumption is violated, but not too badly, the test is satisfactory, but the significance levels are only approximate.

(II) If the variances of $X$ and $Y$ are unequal but known, then the appropriate test statistic for testing $H_0: \mu_X = \mu_Y$ is

$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}}},$$

which has a standard normal distribution when the null hypothesis is true.

(III) If the variances are unknown and unequal, and the sample sizes are large, replace $\sigma_X^2$ with $S_X^2$ and $\sigma_Y^2$ with $S_Y^2$ in the above equation. The resulting statistic has an approximate $N(0, 1)$ distribution.

As long as the underlying distributions are not highly skewed, the normal assumptions are not too critical. As distributions become non-normal and highly skewed, the sample mean and sample variance become more dependent, and some of the nonparametric methods have to be used. When the distributions are close to normal, but the variances seem to differ by a great deal, the $t$ statistic should again be avoided, particularly if the sample sizes are also different.

(IV) When the variances differ and the sample sizes are small, use Welch's $t$-statistic.
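The pooled statistic of case (I) and Welch's statistic of case (IV) can be compared side by side. The sketch below uses the usual Welch-Satterthwaite approximation for the degrees of freedom, which the slides do not spell out; the function names and the two small samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Pooled two-sample t statistic and its degrees of freedom n + m - 2."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    n, m = len(x), len(y)
    vx, vy = variance(x) / n, variance(y) / m  # estimated variances of the means
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df

# Hypothetical samples, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
print(pooled_t(x, y), welch_t(x, y))
```

When the sample sizes and sample variances are equal, as in this toy data, the two statistics and their degrees of freedom coincide; they separate as the variances or sample sizes diverge.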

Example on hypothesis testing: classroom activities (source: Beau Lotto). Exercises: (1) a single-mean test for each class; (2) a test of the equality of means from two classes.

Section 7.4 Tests for Variances

(I) Test of hypothesis for a single variance, $H_0: \sigma_X^2 = \sigma_0^2$, with normal distributions: the critical region is given in terms of the chi-square test statistic

$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}.$$

(II) Test for the equality of two variances, $H_0: \sigma_X^2/\sigma_Y^2 = 1$, from normal populations, based on two random samples of $n$ observations of $X$ and $m$ observations of $Y$. When $H_0$ is true,

$$F = \frac{\dfrac{(n - 1)S_X^2}{\sigma_X^2 (n - 1)}}{\dfrac{(m - 1)S_Y^2}{\sigma_Y^2 (m - 1)}} = \frac{S_X^2}{S_Y^2}$$

has an $F$ distribution with $r_1 = n - 1$ and $r_2 = m - 1$ degrees of freedom. If $H_0$ is true, the observed value of $F$ is expected to be close to 1.

Section 7.5 One-Factor Analysis of Variance (ANOVA)

Experimenters often want to compare more than two treatments, e.g., yields of several different corn hybrids, results due to three or more teaching techniques, miles per gallon obtained from many different types of compact cars, or consumption across different classes (upper, middle, or lower).

Consider $m$ normal distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_m$ and an unknown, but common, variance $\sigma^2$. We test the equality of the $m$ means, namely, $H_0: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$, with $\mu$ unspecified, against all possible alternative hypotheses $H_1$.

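Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ represent a random sample of size $n_i$ from the normal distribution $N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, m$. With $n = n_1 + n_2 + \cdots + n_m$, we denote the sample means by

$$\bar{X}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij} \quad\text{and}\quad \bar{X}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \quad i = 1, 2, \ldots, m.$$

Then

$$\begin{aligned}
SS(TO) &= \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot\cdot})^2 \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot} + \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}).
\end{aligned}$$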

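Using the facts

$$2\sum_{i=1}^{m} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot}) = 2\sum_{i=1}^{m} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})(n_i \bar{X}_{i\cdot} - n_i \bar{X}_{i\cdot}) = 0$$

and

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 = \sum_{i=1}^{m} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2,$$

we deduce that

$$SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2 + \sum_{i=1}^{m} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2.$$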

$SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot\cdot})^2$, the total sum of squares;

$SS(E) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2$, the sum of squares within treatments, groups, or classes, often called the error sum of squares;

$SS(T) = \sum_{i=1}^{m} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$, the sum of squares among the different treatments, groups, or classes, often called the between-treatment sum of squares.

$SS(TO) = SS(E) + SS(T)$.

When $H_0$ is true, $SS(TO)/\sigma^2$ is $\chi^2(n - 1)$, so $E[SS(TO)/(n - 1)] = \sigma^2$.

Let

$$W_i = \frac{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2}{n_i - 1}, \quad i = 1, 2, \ldots, m,$$

so that $(n_i - 1)W_i/\sigma^2$ is $\chi^2(n_i - 1)$. Therefore, whether $H_0$ is true or not,

$$\sum_{i=1}^{m} \frac{(n_i - 1)W_i}{\sigma^2} = \frac{SS(E)}{\sigma^2}$$

is also chi-square with $(n_1 - 1) + (n_2 - 1) + \cdots + (n_m - 1) = n - m$ degrees of freedom. Hence

$$\frac{SS(TO)}{\sigma^2} = \frac{SS(E)}{\sigma^2} + \frac{SS(T)}{\sigma^2},$$

where, under $H_0$, $SS(TO)/\sigma^2$ is $\chi^2(n - 1)$ and $SS(E)/\sigma^2$ is $\chi^2(n - m)$.

(Theorem 7.5-1) Let $Q = Q_1 + Q_2 + \cdots + Q_k$, where $Q, Q_1, \ldots, Q_k$ are $k + 1$ real quadratic forms in $n$ mutually independent (mean zero) random variables normally distributed with the same variance $\sigma^2$. Let $Q/\sigma^2, Q_1/\sigma^2, \ldots, Q_{k-1}/\sigma^2$ have chi-square distributions with $r, r_1, \ldots, r_{k-1}$ degrees of freedom, respectively. If $Q_k$ is nonnegative, then (a) $Q_1, \ldots, Q_k$ are mutually independent, and hence, (b) $Q_k/\sigma^2$ has a chi-square distribution with $r - (r_1 + \cdots + r_{k-1}) = r_k$ degrees of freedom.

Applications:
(1) Re-deriving (i) the independence of $\bar{X}$ and $S^2$, and (ii) the distribution of $(n - 1)S^2/\sigma^2$.
(2) Because $SS(T) \ge 0$, applying the theorem under $H_0$, we deduce that $SS(E)$ and $SS(T)$ are independent and that the distribution of $SS(T)/\sigma^2$ is $\chi^2(m - 1)$.

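Back to testing $H_0: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$. Note that $SS(E)/(n - m)$ is always an unbiased estimator of $\sigma^2$, no matter whether $H_0$ is true or false. If $\mu_1, \mu_2, \ldots, \mu_m$ are not all equal, the expected value of the estimator that is based on $SS(T)$ will be greater than $\sigma^2$:

$$\begin{aligned}
E[SS(T)] &= E\left[\sum_{i=1}^{m} n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2\right] = E\left[\sum_{i=1}^{m} n_i \bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2\right] \\
&= \sum_{i=1}^{m} n_i \{\mathrm{Var}(\bar{X}_{i\cdot}) + [E(\bar{X}_{i\cdot})]^2\} - n\{\mathrm{Var}(\bar{X}_{\cdot\cdot}) + [E(\bar{X}_{\cdot\cdot})]^2\} \\
&= \sum_{i=1}^{m} n_i \left\{\frac{\sigma^2}{n_i} + \mu_i^2\right\} - n\left\{\frac{\sigma^2}{n} + \bar{\mu}^2\right\} \\
&= (m - 1)\sigma^2 + \sum_{i=1}^{m} n_i (\mu_i - \bar{\mu})^2,
\end{aligned}$$

where $\bar{\mu} = (1/n)\sum_{i=1}^{m} n_i \mu_i$.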

If $\mu_1 = \mu_2 = \cdots = \mu_m = \mu$, then

$$E\left(\frac{SS(T)}{m - 1}\right) = \sigma^2.$$

If the means are not all equal, then

$$E\left(\frac{SS(T)}{m - 1}\right) = \sigma^2 + \frac{\sum_{i=1}^{m} n_i (\mu_i - \bar{\mu})^2}{m - 1} > \sigma^2.$$

We base our test of $H_0$ on the ratio of $SS(T)/(m - 1)$ to $SS(E)/(n - m)$. Under $H_0$, both are unbiased estimators of $\sigma^2$, so the ratio would assume values near 1. As the means $\mu_1, \mu_2, \ldots, \mu_m$ begin to differ, this ratio tends to become large, since $E[SS(T)/(m - 1)]$ gets larger.

Under $H_0$,

$$\frac{SS(T)/(m - 1)}{SS(E)/(n - m)} = \frac{[SS(T)/\sigma^2]/(m - 1)}{[SS(E)/\sigma^2]/(n - m)} = F$$

has an $F$ distribution with $m - 1$ and $n - m$ degrees of freedom, because $SS(T)/\sigma^2$ and $SS(E)/\sigma^2$ are independent chi-square variables. We reject $H_0$ if the observed value of $F$ is too large; the critical region is of the form $F \ge F_\alpha(m - 1, n - m)$.
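The one-factor decomposition and the resulting $F$ ratio can be computed directly from the definitions of $SS(T)$, $SS(E)$, and $SS(TO)$. The sketch below uses a hypothetical function name and toy data; the critical value $F_\alpha(m - 1, n - m)$ would still have to come from a table or a statistics library.

```python
def one_way_anova(groups):
    """Return (SS(T), SS(E), SS(TO), F) for a list of samples, one per treatment."""
    m = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-treatment sum of squares: sum of n_i * (group mean - grand mean)^2
    ss_t = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # Error (within-treatment) sum of squares
    ss_e = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Total sum of squares, computed independently as a check on the identity
    ss_to = sum((x - grand_mean) ** 2 for g in groups for x in g)

    f = (ss_t / (m - 1)) / (ss_e / (n - m))
    return ss_t, ss_e, ss_to, f

# Hypothetical data: three treatments with three observations each.
ss_t, ss_e, ss_to, f = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(ss_t, ss_e, ss_to, f)
```

For this toy data $SS(T) = 6$, $SS(E) = 6$, and $SS(TO) = 12$, so the identity $SS(TO) = SS(E) + SS(T)$ holds and $F = (6/2)/(6/6) = 3$ with 2 and 6 degrees of freedom.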

Alternative formulas:

$$SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}^2 - \frac{1}{n}\left(\sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}\right)^2,$$

$$SS(T) = \sum_{i=1}^{m} \frac{1}{n_i}\left(\sum_{j=1}^{n_i} X_{ij}\right)^2 - \frac{1}{n}\left(\sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}\right)^2,$$

$$SS(E) = SS(TO) - SS(T).$$

The $F$ test works quite well even if the underlying distributions are non-normal, unless they are highly skewed or the variances are quite different.

For only 2 populations, compare with the $t$-test for a symmetric two-sided alternative. Under the common-variance assumption,

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{(n - 1)S_X^2 + (m - 1)S_Y^2}{n + m - 2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}.$$

The square of the $t$-statistic, $T^2$, is an $F$-statistic with degrees of freedom 1 and $n + m - 2$. To see this, note that

$$(n - 1)S_X^2 + (m - 1)S_Y^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{m} (y_i - \bar{y})^2 = SS(E)$$

and

$$\frac{(\bar{x} - \bar{y})^2}{\dfrac{1}{n} + \dfrac{1}{m}} = n\left(\bar{x} - \frac{n\bar{x} + m\bar{y}}{n + m}\right)^2 + m\left(\bar{y} - \frac{n\bar{x} + m\bar{y}}{n + m}\right)^2 = SS(T).$$

Therefore

$$T^2 = \frac{SS(T)/1}{SS(E)/(n + m - 2)} = F(2 - 1, n + m - 2).$$
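The identity $T^2 = F$ for two groups is easy to confirm numerically. The sketch below computes both sides independently on hypothetical data; the helper names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Two-sample t statistic under the common-variance assumption."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))

def two_group_f(x, y):
    """One-factor ANOVA F statistic for exactly two groups."""
    n, m = len(x), len(y)
    grand = (sum(x) + sum(y)) / (n + m)
    ss_t = n * (mean(x) - grand) ** 2 + m * (mean(y) - grand) ** 2
    ss_e = sum((v - mean(x)) ** 2 for v in x) + sum((v - mean(y)) ** 2 for v in y)
    return (ss_t / 1) / (ss_e / (n + m - 2))

# Hypothetical samples, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
t = pooled_t_statistic(x, y)
f = two_group_f(x, y)
print(t ** 2, f)
```

Here $t = -2$ and $F = 4$, so the two-sided $t$-test and the ANOVA $F$ test reach the same decision at any level $\alpha$.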

Section 7.6 Two-Factor Analysis of Variance (2 way ANOVA)

Example: the dependence of real estate prices on districts and on the ages of the buildings.

Assume that there are two factors (attributes), one of which has $a$ levels and the other $b$ levels. $X_{ij}$ is $N(\mu_{ij}, \sigma^2)$, $i = 1, 2, \ldots, a$, and $j = 1, 2, \ldots, b$, and the $n = ab$ random variables are independent. Assume that the means $\mu_{ij}$ are composed of a row effect, a column effect, and an overall effect in an additive way, namely, $\mu_{ij} = \mu + \alpha_i + \beta_j$, where $\sum_{i=1}^{a} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$. The parameter $\alpha_i$ represents the $i$th row effect, and the parameter $\beta_j$ represents the $j$th column effect.

(a) To test the hypothesis that there is no row effect, we test $H_A: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$, since $\sum_{i=1}^{a} \alpha_i = 0$.
(b) To test the hypothesis that there is no column effect, we test $H_B: \beta_1 = \beta_2 = \cdots = \beta_b = 0$, since $\sum_{j=1}^{b} \beta_j = 0$.

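Consider the sum of squares:

$$\begin{aligned}
SS(TO) &= \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{\cdot\cdot})^2 \\
&= \sum_{i=1}^{a}\sum_{j=1}^{b} [(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) + (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}) + (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})]^2 \\
&= b\sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + a\sum_{j=1}^{b} (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})^2 \\
&= SS(A) + SS(B) + SS(E).
\end{aligned}$$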

The distribution of the error sum of squares $SS(E)$ does not depend on the means $\mu_{ij}$, provided that the additive model is correct. Hence, its distribution is the same whether or not $H_A$ or $H_B$ is true. Note that

$$X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot} = X_{ij} - (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) - (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}) - \bar{X}_{\cdot\cdot},$$

which is similar to $X_{ij} - \mu_{ij} = X_{ij} - \alpha_i - \beta_j - \mu$.

When both $H_A$ and $H_B$ are true, $SS(TO)/\sigma^2$ is $\chi^2(ab - 1)$, and $SS(A)/\sigma^2$ and $SS(B)/\sigma^2$ are chi-square variables, namely, $\chi^2(a - 1)$ and $\chi^2(b - 1)$. Since $SS(E) \ge 0$, Theorem 7.5-1 shows that $SS(A)$, $SS(B)$, and $SS(E)$ are mutually independent and that $SS(E)/\sigma^2$ is a chi-square variable with $ab - 1 - (a - 1) - (b - 1) = (a - 1)(b - 1)$ degrees of freedom.

(I) To test $H_A: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$, consider the $F$-statistic

$$F_A = \frac{SS(A)/[\sigma^2(a - 1)]}{SS(E)/[\sigma^2(a - 1)(b - 1)]} = \frac{SS(A)/(a - 1)}{SS(E)/[(a - 1)(b - 1)]},$$

which has an $F$ distribution with $a - 1$ and $(a - 1)(b - 1)$ degrees of freedom when $H_A$ is true. $H_A$ is rejected if the observed value of $F_A \ge F_\alpha[a - 1, (a - 1)(b - 1)]$.

(II) To test $H_B: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ against all alternatives, consider

$$F_B = \frac{SS(B)/[\sigma^2(b - 1)]}{SS(E)/[\sigma^2(a - 1)(b - 1)]} = \frac{SS(B)/(b - 1)}{SS(E)/[(a - 1)(b - 1)]},$$

which has an $F$ distribution with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom, provided that $H_B$ is true.
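For the additive model with one observation per cell, the sums of squares and the two $F$ ratios follow directly from the row, column, and grand means. The function name and the 2 by 2 table below are hypothetical, chosen so the arithmetic can be checked by hand.

```python
def two_way_anova_additive(x):
    """Return (SS(A), SS(B), SS(E), F_A, F_B) for an a-by-b table x[i][j]."""
    a, b = len(x), len(x[0])
    row = [sum(r) / b for r in x]                                    # row means
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]     # column means
    grand = sum(row) / a                                             # overall mean

    ss_a = b * sum((r - grand) ** 2 for r in row)
    ss_b = a * sum((c - grand) ** 2 for c in col)
    ss_e = sum((x[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(a) for j in range(b))

    mse = ss_e / ((a - 1) * (b - 1))   # error mean square
    f_a = (ss_a / (a - 1)) / mse
    f_b = (ss_b / (b - 1)) / mse
    return ss_a, ss_b, ss_e, f_a, f_b

# Hypothetical 2-by-2 table, for illustration only.
ss_a, ss_b, ss_e, f_a, f_b = two_way_anova_additive([[1, 3], [4, 4]])
print(ss_a, ss_b, ss_e, f_a, f_b)
```

In this table $SS(A) = 4$, $SS(B) = 1$, and $SS(E) = 1$, which sum to $SS(TO) = 6$, giving $F_A = 4$ and $F_B = 1$, each with 1 and 1 degrees of freedom.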

(III) Test for interactions between two factors: particular combinations of the two factors might interact differently from what is expected from the additive model.

Assume that $X_{ijk}$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; and $k = 1, 2, \ldots, c$, are $n = abc$ random variables that are mutually independent and have normal distributions with a common, but unknown, variance $\sigma^2$. The mean of each $X_{ijk}$, $k = 1, 2, \ldots, c$, is $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, where $\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$, $\sum_{i=1}^{a} \gamma_{ij} = 0$, and $\sum_{j=1}^{b} \gamma_{ij} = 0$. The parameter $\gamma_{ij}$ is called the interaction associated with cell $(i, j)$.

We wish to test the hypotheses that (a) the row effects are equal to zero, (b) the column effects are equal to zero, and (c) there is no interaction.

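We use the notation

$$\bar{X}_{ij\cdot} = \frac{1}{c}\sum_{k=1}^{c} X_{ijk}, \qquad \bar{X}_{i\cdot\cdot} = \frac{1}{bc}\sum_{j=1}^{b}\sum_{k=1}^{c} X_{ijk},$$

$$\bar{X}_{\cdot j\cdot} = \frac{1}{ac}\sum_{i=1}^{a}\sum_{k=1}^{c} X_{ijk}, \qquad \bar{X}_{\cdot\cdot\cdot} = \frac{1}{abc}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} X_{ijk}.$$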

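We again decompose the total sum of squares:

$$\begin{aligned}
SS(TO) &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{\cdot\cdot\cdot})^2 \\
&= bc\sum_{i=1}^{a} (\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 + ac\sum_{j=1}^{b} (\bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 \\
&\quad + c\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2 \\
&= SS(A) + SS(B) + SS(AB) + SS(E).
\end{aligned}$$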

Under the null hypotheses, all the means are equal to the same value $\mu$; hence $SS(TO)/\sigma^2$ is $\chi^2(abc - 1)$, and $SS(A)/\sigma^2$ and $SS(B)/\sigma^2$ are $\chi^2(a - 1)$ and $\chi^2(b - 1)$. Moreover, for each $(i, j)$,

$$\frac{\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2}{\sigma^2}$$

is $\chi^2(c - 1)$; therefore, $SS(E)/\sigma^2$ is the sum of $ab$ independent chi-square variables such as this and thus is $\chi^2[ab(c - 1)]$.

Since $SS(AB) \ge 0$, Theorem 7.5-1 shows that $SS(A)/\sigma^2$, $SS(B)/\sigma^2$, $SS(AB)/\sigma^2$, and $SS(E)/\sigma^2$ are mutually independent chi-square variables with $a - 1$, $b - 1$, $(a - 1)(b - 1)$, and $ab(c - 1)$ degrees of freedom.

(i) The statistic for testing the hypothesis $H_{AB}: \gamma_{ij} = 0$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, against all alternatives is

$$F_{AB} = \frac{c\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot})^2 / [\sigma^2(a - 1)(b - 1)]}{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2 / [\sigma^2 ab(c - 1)]} = \frac{SS(AB)/[(a - 1)(b - 1)]}{SS(E)/[ab(c - 1)]},$$

which has an $F$ distribution with $(a - 1)(b - 1)$ and $ab(c - 1)$ degrees of freedom when $H_{AB}$ is true. If $F_{AB} \ge F_\alpha[(a - 1)(b - 1), ab(c - 1)]$, we reject $H_{AB}$ and say that there is a difference among the means, since there seems to be interaction.

(ii) The statistic for testing the hypothesis $H_A: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ against all alternatives is

$$F_A = \frac{bc\sum_{i=1}^{a} (\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 / [\sigma^2(a - 1)]}{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2 / [\sigma^2 ab(c - 1)]} = \frac{SS(A)/(a - 1)}{SS(E)/[ab(c - 1)]},$$

which has an $F$ distribution with $a - 1$ and $ab(c - 1)$ degrees of freedom when $H_A$ is true.

(iii) The statistic for testing the hypothesis $H_B: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ against all alternatives is

$$F_B = \frac{ac\sum_{j=1}^{b} (\bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 / [\sigma^2(b - 1)]}{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2 / [\sigma^2 ab(c - 1)]} = \frac{SS(B)/(b - 1)}{SS(E)/[ab(c - 1)]},$$

which has an $F$ distribution with $b - 1$ and $ab(c - 1)$ degrees of freedom when $H_B$ is true.
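The three $F$ statistics for the model with interaction can be computed from the cell, row, column, and grand means. This is a minimal sketch for a balanced design with $c$ observations in every cell; the function name and the data are hypothetical.

```python
def two_way_anova_interaction(x):
    """x[i][j] is the list of c observations in cell (i, j); balanced design assumed."""
    a, b, c = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / c for j in range(b)] for i in range(a)]   # cell means
    row = [sum(cell[i]) / b for i in range(a)]                        # row means
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # column means
    grand = sum(row) / a                                              # overall mean

    ss_a = b * c * sum((r - grand) ** 2 for r in row)
    ss_b = a * c * sum((q - grand) ** 2 for q in col)
    ss_ab = c * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((v - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for v in x[i][j])

    mse = ss_e / (a * b * (c - 1))   # error mean square
    return {
        "F_A": (ss_a / (a - 1)) / mse,
        "F_B": (ss_b / (b - 1)) / mse,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / mse,
        "SS": (ss_a, ss_b, ss_ab, ss_e),
    }

# Hypothetical 2-by-2 design with two observations per cell.
data = [[[1, 3], [2, 4]],
        [[4, 6], [7, 9]]]
result = two_way_anova_interaction(data)
print(result)
```

For this data $SS(A) = 32$, $SS(B) = 8$, $SS(AB) = 2$, and $SS(E) = 8$, which add up to $SS(TO) = 50$; the resulting ratios are $F_A = 16$, $F_B = 4$, and $F_{AB} = 1$.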

Section 7.7 Tests concerning Regression and Correlation

Let $X$ and $Y$ have a bivariate normal distribution. We use the sample correlation coefficient to test the hypothesis $H_0: \rho = 0$ and also to form a confidence interval for $\rho$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample from a bivariate normal distribution with parameters $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$, and $\rho$. The sample correlation coefficient is

$$R = \frac{\dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\dfrac{1}{n - 1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{S_{XY}}{S_X S_Y}.$$

Note that

$$R\,\frac{S_Y}{S_X} = \frac{S_{XY}}{S_X^2} = \frac{\dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

is exactly the solution that we obtained for $\hat{\beta}$ in Section 6.7. If $H_0: \rho = 0$ is true, $Y_1, Y_2, \ldots, Y_n$ are independent of $X_1, X_2, \ldots, X_n$, and thus $\beta = \rho\sigma_Y/\sigma_X = 0$. The conditional distribution of

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

given $X_1 = x_1, \ldots, X_n = x_n$, is $N[0, \sigma_Y^2/((n - 1)s_x^2)]$ when $s_x^2 > 0$.

Recall from Section 6.7 that the conditional distribution of

$$\frac{\sum_{i=1}^{n} [Y_i - \bar{Y} - (S_{xY}/s_x^2)(x_i - \bar{x})]^2}{\sigma_Y^2} = \frac{(n - 1)S_Y^2(1 - R^2)}{\sigma_Y^2},$$

given that $X_1 = x_1, \ldots, X_n = x_n$, is $\chi^2(n - 2)$ and is independent of $\hat{\beta}$. When $\rho = 0$, the conditional distribution of

$$T = \frac{(R S_Y/s_x)\big/\big(\sigma_Y/(\sqrt{n - 1}\,s_x)\big)}{\sqrt{[(n - 1)S_Y^2(1 - R^2)/\sigma_Y^2][1/(n - 2)]}} = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}$$

is $t$ with $n - 2$ degrees of freedom. Since the conditional distribution of $T$, given that $X_1 = x_1, \ldots, X_n = x_n$, does not depend on $x_1, x_2, \ldots, x_n$, the unconditional distribution of $T$ must be $t$ with $n - 2$ degrees of freedom, and $T$ and $(X_1, X_2, \ldots, X_n)$ are independent when $\rho = 0$.

(Remark) In the discussion about the distribution of $T$, nothing was said about the distribution of $X_1, X_2, \ldots, X_n$. If $X$ and $Y$ are independent and $Y$ has a normal distribution, then $T$ has a $t$ distribution whatever the distribution of $X$. The roles of $X$ and $Y$ can be reversed in all of this development.

$T$ can be used to test $H_0: \rho = 0$; if $H_1: \rho > 0$, we would use the critical region defined by the observed $T \ge t_\alpha(n - 2)$, since large $T$ implies large $R$.

The distribution function and p.d.f. of $R$ can be found when $\rho = 0$; the p.d.f. is

$$g(r) = \frac{\Gamma[(n - 1)/2]}{\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,(1 - r^2)^{(n - 4)/2}, \quad -1 < r < 1.$$

(See Appendix B, Table XI.)
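The statistic $T = R\sqrt{n - 2}/\sqrt{1 - R^2}$ is straightforward to compute from paired data. The following sketch uses a hypothetical function name and made-up data; comparing $T$ with $t_\alpha(n - 2)$ still requires a $t$ table or a statistics library.

```python
from math import sqrt

def correlation_t(x, y):
    """Return (r, t) where t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)   # the 1/(n - 1) factors cancel in the ratio
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return r, t

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r, t = correlation_t(x, y)
print(r, t)
```

Here $r = 0.8$ with $n = 5$, so $T = 0.8\sqrt{3}/0.6 \approx 2.31$ on 3 degrees of freedom.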

(Proof)

$$G(r) = P(R \le r) = P\left(T \le \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}\right) = \int_{-\infty}^{r\sqrt{n - 2}/\sqrt{1 - r^2}} h(t)\,dt,$$

where

$$h(t) = \frac{\Gamma[(n - 1)/2]}{\Gamma(1/2)\,\Gamma[(n - 2)/2]}\,\frac{1}{\sqrt{n - 2}}\left(1 + \frac{t^2}{n - 2}\right)^{-(n - 1)/2}.$$

The derivative of $G(r)$ with respect to $r$ is

$$g(r) = h\left(\frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}\right)\frac{d\big(r\sqrt{n - 2}/\sqrt{1 - r^2}\big)}{dr}.$$

To test the hypothesis $H_0: \rho = 0$ against the alternative hypothesis $H_1: \rho \ne 0$ at a significance level $\alpha$, select either a constant $r_{\alpha/2}(n - 2)$ or a constant $t_{\alpha/2}(n - 2)$ so that

$$\alpha = P(|R| \ge r_{\alpha/2}(n - 2); H_0) = P(|T| \ge t_{\alpha/2}(n - 2); H_0).$$

To test $H_0: \rho = \rho_0$, an approximate test of size $\alpha$ can be obtained by using the fact that

$$W = \frac{1}{2}\ln\frac{1 + R}{1 - R}$$

has an approximate normal distribution with mean $(1/2)\ln[(1 + \rho)/(1 - \rho)]$ and variance $1/(n - 3)$ (since $R$ has an asymptotic normal distribution with mean $\rho$ and variance $(1 - \rho^2)^2/n$). A test of $H_0: \rho = \rho_0$ can be based on the statistic

$$z = \frac{\dfrac{1}{2}\ln\dfrac{1 + R}{1 - R} - \dfrac{1}{2}\ln\dfrac{1 + \rho_0}{1 - \rho_0}}{\sqrt{\dfrac{1}{n - 3}}},$$

which has a distribution that is approximately $N(0, 1)$.

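An approximate $100(1 - \alpha)\%$ confidence interval for $\rho$ follows from

$$P\left(-c \le \frac{(1/2)\ln[(1 + R)/(1 - R)] - (1/2)\ln[(1 + \rho)/(1 - \rho)]}{\sqrt{1/(n - 3)}} \le c\right) \approx 1 - \alpha,$$

where $c = z_{\alpha/2}$. Solving for $\rho$ gives

$$P\left(\frac{1 + R - (1 - R)\exp(2c/\sqrt{n - 3})}{1 + R + (1 - R)\exp(2c/\sqrt{n - 3})} \le \rho \le \frac{1 + R - (1 - R)\exp(-2c/\sqrt{n - 3})}{1 + R + (1 - R)\exp(-2c/\sqrt{n - 3})}\right) \approx 1 - \alpha.$$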
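The approximate test of $H_0: \rho = \rho_0$ and the confidence interval for $\rho$ can be sketched as follows, using the fact that $(1/2)\ln[(1 + r)/(1 - r)]$ is the inverse hyperbolic tangent of $r$. The function names and the values $r = 0.5$, $n = 28$ are hypothetical.

```python
from math import atanh, exp, sqrt, tanh
from statistics import NormalDist

def fisher_z_statistic(r, n, rho0=0.0):
    """Approximate N(0, 1) statistic for H0: rho = rho0."""
    # atanh(r) equals (1/2) * ln((1 + r) / (1 - r))
    return (atanh(r) - atanh(rho0)) * sqrt(n - 3)

def fisher_ci(r, n, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for rho (closed form)."""
    c = NormalDist().inv_cdf(1 - alpha / 2)   # c = z_{alpha/2}
    k = 2 * c / sqrt(n - 3)
    lower = (1 + r - (1 - r) * exp(k)) / (1 + r + (1 - r) * exp(k))
    upper = (1 + r - (1 - r) * exp(-k)) / (1 + r + (1 - r) * exp(-k))
    return lower, upper

# Hypothetical sample correlation and sample size, for illustration only.
r, n = 0.5, 28
z = fisher_z_statistic(r, n)
lower, upper = fisher_ci(r, n)
# Equivalent form of the lower limit via tanh, as a cross-check.
lower_check = tanh(atanh(r) - NormalDist().inv_cdf(0.975) / sqrt(n - 3))
print(z, lower, upper, lower_check)
```

The closed-form limits are the same as $\tanh(W \mp c/\sqrt{n - 3})$, which is often the more convenient way to compute them.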

The end of Chapter 7

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Math 628 In-class Exam 2 04/03/2013

Math 628 In-class Exam 2 04/03/2013 Math 628 In-class Exam 2 04/03/2013 Name: KU ID: Note: Show ALL work clearly in the space provided. In order to receive full credit on a problem, solution methods must be complete, logical and understandable.

More information

Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, E(X i X) 2 = (µ i µ) 2 + n 1 n σ2

Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, E(X i X) 2 = (µ i µ) 2 + n 1 n σ2 identity Y ijk Ȳ = (Y ijk Ȳij ) + (Ȳi Ȳ ) + (Ȳ j Ȳ ) + (Ȳij Ȳi Ȳ j + Ȳ ) Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, (1) E(MSE) = E(SSE/[IJ(K 1)]) = (2) E(MSA) = E(SSA/(I

More information

Fractional Factorial Designs

Fractional Factorial Designs k-p Fractional Factorial Designs Fractional Factorial Designs If we have 7 factors, a 7 factorial design will require 8 experiments How much information can we obtain from fewer experiments, e.g. 7-4 =

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

1 One-way analysis of variance

1 One-way analysis of variance LIST OF FORMULAS (Version from 21. November 2014) STK2120 1 One-way analysis of variance Assume X ij = µ+α i +ɛ ij ; j = 1, 2,..., J i ; i = 1, 2,..., I ; where ɛ ij -s are independent and N(0, σ 2 ) distributed.

More information

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests
