STAT 135: ANOVA. Joan Bruna. April 10, 2015. Department of Statistics, UC Berkeley


1 Department of Statistics, UC Berkeley. April 10, 2015

2 Motivation Say you want to buy a pair of shoes. You go to a store and observe a large sample of shoes with varying prices. Where does the variance in the price come from? The color? The size? The brand? The material? Relevant groups should help explain the variance.

5 Motivation We want to compare the effect of several treatments or several groups, and also to analyze other factors simultaneously. Examples: movie recommendations; testing several drugs; including other factors such as age, sex, etc. We will introduce the statistical framework for such problems.

7 Different Settings One-way layout: independent measurements under several treatments. This generalizes the methods from the previous chapter. Two-way layout: analyze two factors simultaneously (e.g., several treatments + several age ranges). For each setting there are two different approaches. Parametric: methods based on the normal distribution. Non-parametric: mostly based on rank statistics.

9 One-Way Layout Suppose we have I groups G_1, ..., G_I, with measurements
G_1: Y_{1,1}, Y_{1,2}, ..., Y_{1,J}
G_2: Y_{2,1}, Y_{2,2}, ..., Y_{2,J}
...
G_I: Y_{I,1}, Y_{I,2}, ..., Y_{I,J}
We assume for now the same number of measurements J in each group. Question: are the means of each group the same?

10 One-Way Layout Model Model the observations as Y_{i,j} = \mu + \alpha_i + \epsilon_{i,j}, where
\mu: global mean across all groups.
\alpha_i: differential effect of the i-th group.
\epsilon_{i,j}: random error of each observation.
Main assumption: the errors \epsilon_{i,j} are iid N(0, \sigma^2). We define \mu so that \sum_{i=1}^I \alpha_i = 0.
Testing equality of means: the null hypothesis is H_0: \alpha_i = 0 for all i.

12 Testing equality of means Given data Y_{i,j}, i = 1, ..., I, j = 1, ..., J, define
\bar{Y} = \frac{1}{IJ} \sum_{i,j} Y_{i,j},  \bar{Y}_i = \frac{1}{J} \sum_j Y_{i,j}.
Fact. We have
\sum_{i=1}^I \sum_{j=1}^J (Y_{i,j} - \bar{Y})^2 = \sum_{i=1}^I \sum_{j=1}^J (Y_{i,j} - \bar{Y}_i)^2 + J \sum_{i=1}^I (\bar{Y}_i - \bar{Y})^2,
i.e., total variability = variability within groups + variability between groups.
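To make the decomposition concrete, here is a minimal numerical sketch in Python (NumPy assumed; the data matrix Y and the group means used to generate it are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 3, 50                                 # hypothetical number of groups and group size
Y = rng.normal(loc=[[3.0], [3.2], [2.8]], scale=1.0, size=(I, J))  # Y[i, j]

Y_bar = Y.mean()                             # grand mean
Y_i = Y.mean(axis=1, keepdims=True)          # group means, shape (I, 1)

SS_total = ((Y - Y_bar) ** 2).sum()          # total variability
SS_w = ((Y - Y_i) ** 2).sum()                # variability within groups
SS_b = J * ((Y_i.ravel() - Y_bar) ** 2).sum()  # variability between groups

assert np.isclose(SS_total, SS_w + SS_b)     # the decomposition holds exactly
```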

13 Testing equality of means We write
SS_w = \sum_{i=1}^I \sum_{j=1}^J (Y_{i,j} - \bar{Y}_i)^2  and  SS_b = J \sum_{i=1}^I (\bar{Y}_i - \bar{Y})^2.
Theorem. Under the previous assumptions, we have
E(SS_w) = I(J-1)\sigma^2  and  E(SS_b) = J \sum_{i=1}^I \alpha_i^2 + (I-1)\sigma^2.

14 A useful lemma Lemma. Let X_1, ..., X_n be independent random variables with E(X_i) = \mu_i and var(X_i) = \sigma^2. Write \bar{\mu} = \frac{1}{n} \sum_i \mu_i. Then
E\big((X_i - \bar{X})^2\big) = (\mu_i - \bar{\mu})^2 + \frac{n-1}{n} \sigma^2.
Proof (sketch):
1. E(Z^2) = E(Z)^2 + var(Z).
2. var(X - Y) = var(X) + var(Y) - 2 cov(X, Y).
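A quick Monte Carlo check of the lemma, as a sketch (the means mu, the variance, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, 2.0, 5.0])              # hypothetical means mu_i
n, sigma, reps = len(mu), 1.5, 200_000

X = rng.normal(mu, sigma, size=(reps, n))        # each row is one draw of X_1, ..., X_n
dev2 = (X - X.mean(axis=1, keepdims=True)) ** 2  # (X_i - X_bar)^2 in each replication

lhs = dev2.mean(axis=0)                                # Monte Carlo estimate of E[(X_i - X_bar)^2]
rhs = (mu - mu.mean()) ** 2 + (n - 1) / n * sigma**2   # the lemma's prediction
print(np.round(lhs, 3), np.round(rhs, 3))              # the two vectors should be close
```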

15 Proof of the Theorem
1. E(SS_w) = \sum_{i=1}^I \sum_{j=1}^J E\big((Y_{i,j} - \bar{Y}_i)^2\big).
2. Apply the lemma to Y_{i,1}, ..., Y_{i,J} within each group i, knowing that E(Y_{i,j}) = \mu + \alpha_i.
3. E(SS_b) = J \sum_{i=1}^I E\big((\bar{Y}_i - \bar{Y})^2\big).
4. Apply the lemma again to \bar{Y}_1, ..., \bar{Y}_I, knowing that E(\bar{Y}_i) = \mu + \alpha_i and var(\bar{Y}_i) = \sigma^2/J.

16 Designing a test statistic Theorem. Under the previous assumptions, SS_w/\sigma^2 follows a \chi^2 distribution with I(J-1) df. Under the null hypothesis, SS_b/\sigma^2 follows a \chi^2 distribution with I-1 df and is independent of SS_w. Proof: recall the definition of \chi^2. For the independence part, recall that X_i - \bar{X}_n and \bar{X}_n are independent if the X_i are independent Gaussians.

17 Designing a test statistic We consider the statistic
F = \frac{SS_b/(I-1)}{SS_w/(I(J-1))}.
If H_0 is true, F \approx 1. If H_0 is false, F \gg 1.
Theorem. The null distribution of F is the F-distribution with I-1 and I(J-1) degrees of freedom. Proof: recall the definition of the F-distribution.
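A minimal sketch of the one-way F-test in Python, assuming SciPy is available; scipy.stats.f_oneway is a standard one-way ANOVA routine we can compare the manual computation against (the data are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
I, J = 3, 50
Y = rng.normal(loc=[[3.0], [3.4], [2.9]], scale=1.0, size=(I, J))  # hypothetical data

Y_bar, Y_i = Y.mean(), Y.mean(axis=1)
SS_w = ((Y - Y_i[:, None]) ** 2).sum()
SS_b = J * ((Y_i - Y_bar) ** 2).sum()

F = (SS_b / (I - 1)) / (SS_w / (I * (J - 1)))
p = stats.f.sf(F, I - 1, I * (J - 1))          # P(F_{I-1, I(J-1)} >= F)

F_ref, p_ref = stats.f_oneway(*Y)              # should match the manual computation
print(F, p, F_ref, p_ref)
```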

18 Remarks It is easy to generalize to the case where each group has a different number of measurements. The F-test is approximately valid even if the \epsilon_{i,j} are not Gaussian, provided J is large enough. The test is also robust to the assumption of constant variance. Under the normality assumption, it is a generalized likelihood ratio test.

19 Example: Netflix dataset Are these movies all equally good?

20 Netflix dataset Average over groups of 20 users to gaussianize the measurements. With J = 50 measurements per group, we obtain
F = \frac{SS_b/(3-1)}{SS_w/(3 \cdot 49)} = 195,  and  P_{H_0}(F \geq 195) \approx 0.
The null is rejected!

21 Limitations of the Method The test is quite weak: it does not tell us where the differences are. If we have I groups, why not compare all possible pairs (= I(I-1)/2 of them)? That is OK, but we need to control the joint Type-I error, i.e., P(at least one pair rejects | H_0 true).

23 Tukey's Method Suppose the sample sizes are equal. Then \bar{Y}_i - \mu_i \sim N(0, \sigma^2/J). We estimate \sigma^2 by
s_p^2 = \frac{1}{I(J-1)} \sum_i \sum_j (Y_{i,j} - \bar{Y}_i)^2.
We consider the random variable
Z = \max_{i_1, i_2 \in \{1, ..., I\}} \frac{|(\bar{Y}_{i_1} - \mu_{i_1}) - (\bar{Y}_{i_2} - \mu_{i_2})|}{s_p/\sqrt{J}}
(its distribution is called the studentized range distribution).

24 Tukey's Method Denote by q(\alpha) the 1-\alpha quantile of Z. Then
P\Big(|(\bar{Y}_{i_1} - \mu_{i_1}) - (\bar{Y}_{i_2} - \mu_{i_2})| \leq q(\alpha) \frac{s_p}{\sqrt{J}} \ \forall i_1, i_2\Big) = P\Big(\max_{i_1, i_2} |(\bar{Y}_{i_1} - \mu_{i_1}) - (\bar{Y}_{i_2} - \mu_{i_2})| \leq q(\alpha) \frac{s_p}{\sqrt{J}}\Big) = 1 - \alpha.
We then have simultaneous confidence intervals for each difference \mu_{i_1} - \mu_{i_2}:
\bar{Y}_{i_1} - \bar{Y}_{i_2} \pm q(\alpha) \frac{s_p}{\sqrt{J}}.
Now, by the duality between confidence intervals and hypothesis tests, if an interval does not contain 0, i.e., if
|\bar{Y}_{i_1} - \bar{Y}_{i_2}| > q(\alpha) \frac{s_p}{\sqrt{J}},
we reject the hypothesis that \mu_{i_1} = \mu_{i_2} at joint level \alpha.
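A sketch of Tukey's simultaneous intervals; it assumes a recent SciPy (1.7+), where scipy.stats.studentized_range provides the q(alpha) quantile, and uses hypothetical data:

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(3)
I, J, alpha = 3, 50, 0.05
Y = rng.normal(loc=[[3.0], [3.4], [2.9]], scale=1.0, size=(I, J))  # hypothetical data

Y_i = Y.mean(axis=1)
s_p = np.sqrt(((Y - Y_i[:, None]) ** 2).sum() / (I * (J - 1)))   # pooled estimate of sigma
q = stats.studentized_range.ppf(1 - alpha, I, I * (J - 1))        # q(alpha)
halfwidth = q * s_p / np.sqrt(J)

for i1, i2 in combinations(range(I), 2):
    diff = Y_i[i1] - Y_i[i2]
    reject = abs(diff) > halfwidth       # reject mu_{i1} = mu_{i2} at joint level alpha
    print(i1, i2, diff - halfwidth, diff + halfwidth, reject)
```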

25 Bonferroni Method Basic idea:
P(at least one pair rejects | H_0 true) = P\Big(\bigcup_{i_1 \neq i_2} \{pair (i_1, i_2) rejects\} \,\Big|\, H_0 true\Big) \leq \sum_{i_1 \neq i_2} P(pair (i_1, i_2) rejects | H_0 true),
so if we design the individual tests at significance level \frac{2\alpha}{I(I-1)}, the joint test will have significance level at most \alpha. Recall that the pairwise test rejects at level \alpha if
|\bar{Y}_{i_1} - \bar{Y}_{i_2}| > t_{I(J-1)}(\alpha/2)\, s_p \sqrt{2/J}.
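For comparison, a sketch of the Bonferroni threshold under the same conventions (the pooled standard deviation s_p below is a placeholder value):

```python
import numpy as np
from scipy import stats

I, J, alpha = 3, 50, 0.05
s_p = 1.0                                    # placeholder for the pooled standard deviation
# per-pair two-sided level 2*alpha/(I(I-1)), i.e. the t quantile at 1 - alpha/(I(I-1))
t_crit = stats.t.ppf(1 - alpha / (I * (I - 1)), I * (J - 1))
threshold = t_crit * s_p * np.sqrt(2 / J)
print(threshold)   # reject mu_{i1} = mu_{i2} when |Ybar_{i1} - Ybar_{i2}| exceeds this
```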

26 Revisit Example I = 3, J = 50. Fix a joint level \alpha. Tukey method: we compute the threshold \delta = q(\alpha)\, s_p/\sqrt{J} and compare it with the measured differences |\bar{Y}_1 - \bar{Y}_2|, |\bar{Y}_1 - \bar{Y}_3|, |\bar{Y}_2 - \bar{Y}_3|. Bonferroni: the threshold becomes \delta = t_{147}(\alpha/6)\, s_p \sqrt{2/50}. Same conclusions!

27 Non-parametric Method Similarly to the two-sample case, we can design a non-parametric test using ranks. Let
R_{i,j} = rank of Y_{i,j} in the combined sample,  and  \bar{R}_i = \frac{1}{J} \sum_{j=1}^J R_{i,j}.
It results that \frac{1}{I} \sum_i \bar{R}_i = \frac{IJ+1}{2}. We define the statistic
SS_r = J \sum_{i=1}^I \Big(\bar{R}_i - \frac{IJ+1}{2}\Big)^2.
The null distribution of SS_r can be computed by enumeration. For large enough J,
K = \frac{12}{IJ(IJ+1)} SS_r
is approximately \chi^2 distributed with I-1 df (the Kruskal-Wallis test).
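A sketch of the rank statistic, compared against SciPy's Kruskal-Wallis implementation (hypothetical continuous data, so ties have probability zero, which is the case the formula above covers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
I, J = 3, 50
Y = rng.normal(loc=[[3.0], [3.4], [2.9]], scale=1.0, size=(I, J))  # hypothetical data

R = stats.rankdata(Y.ravel()).reshape(I, J)      # ranks in the combined sample
R_i = R.mean(axis=1)                             # average rank per group
SS_r = J * ((R_i - (I * J + 1) / 2) ** 2).sum()
K = 12.0 / (I * J * (I * J + 1)) * SS_r
p = stats.chi2.sf(K, I - 1)                      # approximate null distribution

K_ref, p_ref = stats.kruskal(*Y)                 # should agree when there are no ties
print(K, p, K_ref, p_ref)
```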

28 Revisit Example In our Netflix example, we compute K = \frac{12}{IJ(IJ+1)} SS_r. If Z \sim \chi^2_2, then P(Z > K) \approx 0. We reject the null hypothesis as well!

29 A last Method Theorem. If the number of groups is I = 14, J = 2015, and \sigma^2 is given, then a test statistic for the null hypothesis \mu_i = \mu_{i'}, \forall i \neq i', is given by
\Lambda = (SS_b - \pi)\, SS_w \log 4 \bar{Y} e^{\sinh 3(SS_w - SS_b/(IJ))}.
The null distribution of \Lambda is \Gamma(4, 2.105).

30 Two-Way Layout Experiment Now we are interested in experiments involving two factors, each with several classes. Examples: assess different drugs across several age ranges; analyze house prices in terms of surface area and zip code; etc.

31 An Example Suppose we record the number of daily tweets per capita in a 3 x 3 table, with rows indexed by city (nyc, sf, paris) and columns indexed by tag (cat, dog, coffee). Goal: describe the variability in terms of the different factors.

32 An Example Denoting by Y_{i,j} the entry in row i and column j, we start with the global average:
\bar{Y} = \frac{1}{IJ} \sum_{i,j} Y_{i,j} \approx 1.79.
We can also compute the average of each row, \bar{Y}_{r,i}, and of each column, \bar{Y}_{c,j}:
\bar{Y}_{r,1} = 1.9, \bar{Y}_{r,2} = 1.97, \bar{Y}_{r,3} = 1.5,  \bar{Y}_{c,1} = 2.13, \bar{Y}_{c,2} = 1.23, \bar{Y}_{c,3} = 2.
Which effect is more important?

33 A Useful Graphic We can plot the two factors simultaneously:

34 An Example Compute the differences between the row/column averages and the global average:
\hat{\alpha}_1 = \bar{Y}_{r,1} - \bar{Y} = 0.11, \hat{\alpha}_2 = 0.178, \hat{\alpha}_3 = -0.29,
\hat{\beta}_1 = \bar{Y}_{c,1} - \bar{Y} = 0.34, \hat{\beta}_2 = -0.56, \hat{\beta}_3 = 0.21.
The effect of the tag is more determinant than the location.

36 The Additive Model This is a simple example of an additive model:
\hat{Y}_{i,j} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j.
We have fewer parameters (I + J - 1) than input dimensions (IJ). It is a good model provided the factors do not interact. Question: how can we spot visually whether the factors interact?

38 Good additive fit vs poor additive fit?

39 The Additive Model Interactions can be incorporated to make the model exact:
\hat{\delta}_{i,j} = Y_{i,j} - \hat{Y}_{i,j} = Y_{i,j} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j = Y_{i,j} - \bar{Y}_{r,i} - \bar{Y}_{c,j} + \bar{Y}.
We have \sum_i \hat{\delta}_{i,j} = \sum_j \hat{\delta}_{i,j} = 0.

40 Normal Theory for the Additive Model We assume K > 1 observations per cell and a two-way layout (balanced design). Y_{i,j,k}, observation k in cell (i, j), is modeled as
Y_{i,j,k} \sim N(\mu + \alpha_i + \beta_j + \delta_{i,j}, \sigma^2),
independently of the other observations. The parameters of the model satisfy
\sum_i \alpha_i = 0,  \sum_j \beta_j = 0,  \sum_i \delta_{i,j} = \sum_j \delta_{i,j} = 0.

41 Normal Theory Given observations Y_{i,j,k}, we estimate the parameters of the model (\hat{\mu}, \hat{\alpha}_i, \hat{\beta}_j, \hat{\delta}_{i,j}). Maximum likelihood estimation under the linear constraints gives
\hat{\mu} = \bar{Y},  \hat{\alpha}_i = \bar{Y}_{r,i} - \bar{Y},  \hat{\beta}_j = \bar{Y}_{c,j} - \bar{Y},  \hat{\delta}_{i,j} = \bar{Y}_{i,j} - \bar{Y}_{r,i} - \bar{Y}_{c,j} + \bar{Y}.

43 Two-way ANOVA Diagram We decompose the data variability:

45 Sum of Squares decompositions With
SS_{tot} = \sum_i \sum_j \sum_k (Y_{i,j,k} - \bar{Y})^2,
SS_E = \sum_i \sum_j \sum_k (Y_{i,j,k} - \bar{Y}_{i,j})^2,
SS_A = JK \sum_i (\bar{Y}_{r,i} - \bar{Y})^2,
SS_B = IK \sum_j (\bar{Y}_{c,j} - \bar{Y})^2,
SS_{AB} = K \sum_i \sum_j (\bar{Y}_{i,j} - \bar{Y}_{r,i} - \bar{Y}_{c,j} + \bar{Y})^2,
we have
SS_{tot} = SS_E + SS_A + SS_B + SS_{AB}.
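A numerical sketch of the two-way decomposition for a hypothetical balanced data array Y of shape (I, J, K), with NumPy assumed:

```python
import numpy as np

rng = np.random.default_rng(5)
I, J, K = 3, 3, 8
Y = rng.normal(loc=2.0, scale=0.5, size=(I, J, K))   # hypothetical observations Y[i, j, k]

Y_bar = Y.mean()
Y_cell = Y.mean(axis=2)           # cell means, shape (I, J)
Y_row = Y.mean(axis=(1, 2))       # row means (levels of factor A)
Y_col = Y.mean(axis=(0, 2))       # column means (levels of factor B)

SS_tot = ((Y - Y_bar) ** 2).sum()
SS_E = ((Y - Y_cell[:, :, None]) ** 2).sum()
SS_A = J * K * ((Y_row - Y_bar) ** 2).sum()
SS_B = I * K * ((Y_col - Y_bar) ** 2).sum()
SS_AB = K * ((Y_cell - Y_row[:, None] - Y_col[None, :] + Y_bar) ** 2).sum()

assert np.isclose(SS_tot, SS_E + SS_A + SS_B + SS_AB)   # decomposition holds exactly
```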

46 Distributions under the Normal Model Theorem. Under the normal model with uniform variance \sigma^2, we have
E(SS_E) = IJ(K-1)\sigma^2,
E(SS_A) = (I-1)\sigma^2 + JK \sum_i \alpha_i^2,
E(SS_B) = (J-1)\sigma^2 + IK \sum_j \beta_j^2,
E(SS_{AB}) = (I-1)(J-1)\sigma^2 + K \sum_i \sum_j \delta_{i,j}^2.

47 Distributions under the Normal Model Theorem. Under the normal model with uniform variance \sigma^2:
1. SS_E/\sigma^2 follows a chi-squared distribution with IJ(K-1) df.
2. Under the null hypothesis H_A: \alpha_i = 0 \ \forall i, SS_A/\sigma^2 follows a chi-squared distribution with I-1 df.
3. Under the null hypothesis H_B: \beta_j = 0 \ \forall j, SS_B/\sigma^2 follows a chi-squared distribution with J-1 df.
4. Under the null hypothesis H_{AB}: \delta_{i,j} = 0 \ \forall i, j, SS_{AB}/\sigma^2 follows a chi-squared distribution with (I-1)(J-1) df.
5. The different SS variables are independent of each other.

48 F-tests for the two-way layout As in the one-way layout, we can test each null hypothesis with an F test. Example: test whether factor A has an effect. Under H_A,
SS_A/\sigma^2 \sim \chi^2_{I-1}  and  SS_E/\sigma^2 \sim \chi^2_{IJ(K-1)}.
Also, we know that E(SS_E)/(IJ(K-1)) = \sigma^2. The null distribution of
\frac{SS_A/(I-1)}{SS_E/(IJ(K-1))}
is the F-distribution with I-1 and IJ(K-1) degrees of freedom.
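The three F-tests can be computed directly from the sums of squares above or with a standard package; this sketch uses statsmodels (an assumption, as is the hypothetical long-format data frame):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(6)
I, J, K = 3, 3, 8
rows, cols = np.indices((I, J, K))[:2]          # factor labels for each observation
df = pd.DataFrame({
    "A": rows.ravel(),                          # level of factor A (e.g. city)
    "B": cols.ravel(),                          # level of factor B (e.g. topic)
    "y": rng.normal(2.0, 0.5, I * J * K),       # hypothetical response
})

model = ols("y ~ C(A) * C(B)", data=df).fit()   # full model with interaction
print(sm.stats.anova_lm(model, typ=2))          # ANOVA table: SS, df, F, p-values
```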

50 Example: Twitter data, tags vs. cities Recall we measured daily tweets as a function of topic and city (the table from slide 31). Each cell is the average of K = 8 daily measures Y_{i,j,k}. We want to test several hypotheses: Influence of topic? Influence of city? Do these factors interact?

54 The Famous ANOVA table
Source        df    SS    SS/df    F    p-value
Topic          2
City           2
Interaction    4
Error         63
Total         71
So: enough evidence to say that Topic matters; not enough evidence to say that City matters; but we are pretty sure that these factors interact.

56 Example Question: what do you expect to happen if we add more measurements (i.e., we increase K)? Results with K = 40:
Source        df     SS    SS/df    F    p-value
Topic          2
City           2
Interaction    4
Error        351
Total        359
As more data become available, our tests become more powerful.

59 One-Way layout vs Two-Way layout We can see a two-way layout as a special case of a one-way layout with I \times J cells. The hypothesis that there is no mean difference is equivalent to \alpha_i = 0, \beta_j = 0, \delta_{i,j} = 0 for all i, j. Two-way layouts allow us to test more diverse hypotheses (e.g., presence or absence of interactions). They are also more efficient, since they recycle observations to test different factors.

61 Randomized Block Designs Suppose we want to evaluate the appeal of I different logos worldwide. We pick J countries, and within each country we split randomly between I populations. Why? By comparing the effects within each country, we control for the variability across different countries. This design is widely used in industry and in agricultural experiments. It is the generalization of the matched-pairs design from the last chapter.

64 Randomized Block Designs We model the responses as Y_{i,j} = \mu + \alpha_i + \beta_j + \epsilon_{i,j}, with independent zero-mean errors \epsilon_{i,j} and no interactions. We want to perform inference on the \alpha_i. With no interactions, we have
E(SS_A/(I-1)) = \sigma^2 + \frac{J}{I-1} \sum_i \alpha_i^2,
E(SS_B/(J-1)) = \sigma^2 + \frac{I}{J-1} \sum_j \beta_j^2,
E(SS_{AB}/((I-1)(J-1))) = \sigma^2.

65 Randomized Block Designs To test H_A: \alpha_i = 0, i = 1, ..., I, we consider
F = \frac{SS_A/(I-1)}{SS_{AB}/((I-1)(J-1))}.
Under H_A, F \sim F_{I-1, (I-1)(J-1)}. If there are interactions, we have
E(SS_{AB}/((I-1)(J-1))) = \sigma^2 + \frac{1}{(I-1)(J-1)} \sum_{i,j} \delta_{i,j}^2,
so the test is conservative.
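A sketch of the randomized-block F-test for a hypothetical I x J table Y (treatments in rows, blocks in columns, one observation per cell):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
I, J = 3, 4                                                  # treatments x blocks
block_effect = rng.normal(0.0, 1.0, size=(1, J))             # shared within each block
Y = 5.0 + block_effect + rng.normal(0.0, 0.3, size=(I, J))   # hypothetical responses

Y_bar = Y.mean()
Y_treat = Y.mean(axis=1)          # treatment means
Y_block = Y.mean(axis=0)          # block means

SS_A = J * ((Y_treat - Y_bar) ** 2).sum()
SS_AB = ((Y - Y_treat[:, None] - Y_block[None, :] + Y_bar) ** 2).sum()

F = (SS_A / (I - 1)) / (SS_AB / ((I - 1) * (J - 1)))
p = stats.f.sf(F, I - 1, (I - 1) * (J - 1))
print(F, p)
```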

67 Example Suppose we want to choose the right color for our logo. We evaluate I = 3 different options (Pantone 201, Pantone 202, Pantone 203) across J = 4 different countries (USA, France, UK, China), recording one response per (color, country) cell.

68 Example We compute the statistic
F = \frac{SS_A/(I-1)}{SS_{AB}/((I-1)(J-1))} \approx 6.2,
which has a p-value of approximately 0.03. The test is statistically significant. Question: what would happen if we instead did a classic one-way ANOVA?

70 The advantage of randomized designs We compute the within variability across columns, SS_w = SS_{AB} + SS_B, and we consider the one-way F-statistic:
F = \frac{SS_A/(I-1)}{SS_w/(I(J-1))} = 0.051,
with a p-value of approximately 0.95. The test cannot reject! Moral: if the data come paired/grouped, we should always exploit this information.

72 Friedman's Test Can we think of a non-parametric test for the randomized block design? Hint: recall the case of paired samples. Idea: use ranks! Within each of the J blocks, we rank the observations:
R_{i,j} = rank of the sample of treatment i within block j,
and then consider the average rank of each treatment:
\bar{R}_i = \frac{1}{J} \sum_j R_{i,j}.

76 Friedman's Test To test the hypothesis that there is no treatment effect, we consider
SS_r = J \sum_i (\bar{R}_i - \bar{R})^2.
For large enough sample sizes, the null distribution of
Q = \frac{12\, SS_r}{I(I+1)}
is approximately \chi^2 with I-1 degrees of freedom.
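A sketch comparing the formula above with SciPy's implementation; scipy.stats.friedmanchisquare takes one argument per treatment, each of length J (hypothetical continuous data, so no ties):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
I, J = 3, 20                                     # treatments x blocks
Y = rng.normal(0.0, 1.0, size=(I, J)) + rng.normal(0.0, 1.0, size=(1, J))  # hypothetical

R = np.apply_along_axis(stats.rankdata, 0, Y)    # rank within each block (column)
R_i = R.mean(axis=1)                             # average rank of each treatment
SS_r = J * ((R_i - (I + 1) / 2) ** 2).sum()
Q = 12.0 * SS_r / (I * (I + 1))
p = stats.chi2.sf(Q, I - 1)

Q_ref, p_ref = stats.friedmanchisquare(*Y)       # should agree when there are no ties
print(Q, p, Q_ref, p_ref)
```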

77 Summary of Comparison Tests
Test                  Rejection rule                                                            Layout        Paired?    Param?
t-test                |\bar{X}_n - \bar{Y}_m| > t_{n+m-2}(1-\alpha/2)\, s_p \sqrt{1/n + 1/m}    2 samples     unpaired   parametric
Mann-Whitney          (rank statistic)                                                          2 samples     unpaired   non-parametric
t-test                |\bar{D}| > t_{n-1}(1-\alpha/2)\, s_p \sqrt{1/n}                          2 samples     paired     parametric
Wilcoxon Signed-Rank  (rank statistic)                                                          2 samples     paired     non-parametric
F-test                \frac{SS_b/(I-1)}{SS_w/(I(J-1))} > F_{I-1, I(J-1)}(1-\alpha)              One-Way       unpaired   parametric
Bonferroni            |\bar{Y}_i - \bar{Y}_{i'}| > t_{2n-2}(1-\alpha/(I(I-1)))\, s_p \sqrt{2/n} One-Way (MT)  unpaired   parametric
Kruskal-Wallis        K = \frac{12\, SS_R}{IJ(IJ+1)} > q_{I-1}(1-\alpha)                        One-Way       unpaired   non-parametric
F-test                \frac{SS_A/(I-1)}{SS_E/(IJ(K-1))} > F_{I-1, IJ(K-1)}(1-\alpha)            Two-Way       unpaired   parametric
F-test                \frac{SS_B/(J-1)}{SS_E/(IJ(K-1))} > F_{J-1, IJ(K-1)}(1-\alpha)            Two-Way       unpaired   parametric
F-test                \frac{SS_{AB}/((I-1)(J-1))}{SS_E/(IJ(K-1))} > F_{(I-1)(J-1), IJ(K-1)}(1-\alpha)  Two-Way  unpaired  parametric
F-test                \frac{SS_A/(I-1)}{SS_{AB}/((I-1)(J-1))} > F_{I-1, (I-1)(J-1)}(1-\alpha)   Two-Way       paired     parametric
Friedman test         Q = \frac{12\, SS_r}{I(I+1)} > \chi^2_{I-1}(1-\alpha)                     Two-Way       paired     non-parametric

78 Tests for Categorical Data In many contexts, data consists of counts of many sorts. Examples: Number of tweets Number of votes etc. We will discuss tests specific for that sort of data.

79 Fisher's Exact Test Let us consider the following example. In an experiment, supervisors are handed a personnel file and have to decide whether to promote the employee or to hold the file and interview other candidates. Using randomization, 24 supervisors are given applications labeled as coming from a male employee, and 24 supervisors receive applications labeled as coming from a female employee (the files are identical except for the label).

80 Fisher's Exact Test The results are:
           Male    Female
Promote     21       14
Hold         3       10
Question: is there a gender bias?

81 Fisher's Exact Test Idea: measure how likely the observed imbalance is as a result of the randomization alone. Null hypothesis H_0: there is no gender bias. Denote the observed counts as
           Male            Female           total
Promote    N_{1,1} = 21    N_{1,2} = 14     n_{r,1} = 35
Hold       N_{2,1} = 3     N_{2,2} = 10     n_{r,2} = 13
total      n_{c,1} = 24    n_{c,2} = 24     n = 48
The margins are fixed under the null hypothesis. In this example, there is only 1 degree of freedom for the N's.

83 Fisher's Exact Test Pick for example N_{1,1}. We have
P_{H_0}(N_{1,1} = k) = \frac{\binom{n_{r,1}}{k} \binom{n_{r,2}}{n_{c,1} - k}}{\binom{n}{n_{c,1}}}.
The p-value for N_{1,1} = 21 is below 0.05. Strong evidence of gender bias in this experiment.
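A sketch of the computation with SciPy, using the 2x2 table from the slide; both scipy.stats.fisher_exact and the hypergeometric tail give the same one-sided p-value:

```python
from scipy import stats

table = [[21, 14],   # Promote: Male, Female
         [3, 10]]    # Hold:    Male, Female

# Exact one-sided p-value for an imbalance at least this extreme toward males
odds_ratio, p_one_sided = stats.fisher_exact(table, alternative="greater")

# Equivalent tail probability from the hypergeometric distribution of N_{1,1}
n, n_r1, n_c1 = 48, 35, 24
p_tail = stats.hypergeom.sf(21 - 1, n, n_r1, n_c1)   # P(N_{1,1} >= 21)
print(p_one_sided, p_tail)
```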

84 Simpson's Paradox Another experiment about gender bias: UC Berkeley Graduate School admissions (1973).
              Male    Female
Admitted
Not admitted
% Admitted      44       35
Question: is there a gender bias?

85 Fisher Exact Test for UC Berkeley data
              Male              Female             Total
Admitted      N_{1,1}                              n_{r,1} = 5226
Not admitted                                       n_{r,2} = 7537
Total         n_{c,1} = 8442    n_{c,2} = 4321     n = 12763
We have that P_{H_0}(N_{1,1} \geq observed value) \approx 0. This suggests gender bias...

86 Simpson's Paradox ...but let's examine what happens if we look at each particular department:
Dept   Men (applicants, % admitted)   Women (applicants, % admitted)   p-value
A
B                                      25, 68%                          0.4
C                                                                       0.22
D                                                                       0.32
E                                                                       0.20
F      272, 6%                         341, 7%                          0.34
In most departments, women had higher acceptance rates than men! Differences favoring men are never statistically significant, and Dept A is biased towards women. Explanation: men tended to apply to easier programs than women.

89 χ² test for Homogeneity Suppose now we have independent observations from J multinomial distributions, each with I cells. Examples: weather in SF vs. weather in Berkeley (quantized as sunny/fog/overcast/rainy); literary style to detect plagiarism; etc. We want to assess whether the distributions are equal based on some collected data.

90 Homogeneity test set-up Denote the probability of cell i of the j-th multinomial as \pi_{i,j}. The null hypothesis is
H_0: \forall i, \ \pi_{i,1} = \pi_{i,2} = ... = \pi_{i,J}.
Testing this hypothesis amounts to a goodness-of-fit test, so we can consider Pearson's \chi^2 statistic. The data are independent samples from each multinomial distribution: n_{i,j} = count in cell i for the j-th multinomial.

92 Homogeneity test Theorem. Under H_0, the MLEs of the parameters \pi_1, ..., \pi_I are
\hat{\pi}_i = \frac{\sum_j n_{i,j}}{\sum_{i,j} n_{i,j}}.
Goodness-of-fit test:
X^2 = \sum_{i=1}^I \sum_{j=1}^J \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}},
where
E_{i,j} = \Big(\sum_{i'=1}^I n_{i',j}\Big) \hat{\pi}_i = \frac{(\sum_j n_{i,j})(\sum_i n_{i,j})}{\sum_{i,j} n_{i,j}}.
For large sample sizes, X^2 is approximately \chi^2 with J(I-1) - (I-1) = (J-1)(I-1) df.
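A sketch of the Pearson chi-square computation; scipy.stats.chi2_contingency implements exactly this statistic and also covers the independence test from the later slides (the count table below is hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = categories (I of them), columns = multinomials (J of them)
counts = np.array([[30, 25],
                   [40, 38],
                   [55, 70],
                   [20, 18]])

chi2, p, dof, expected = stats.chi2_contingency(counts, correction=False)
print(chi2, p, dof)                  # dof = (I-1)(J-1)

# Manual computation of the same statistic
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
E = row * col / counts.sum()
X2 = ((counts - E) ** 2 / E).sum()
print(X2)                            # matches chi2 above
```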

94 Example of Homogeneity test Weather in Berkeley vs. San Francisco (data from Weather Underground).
Weather type               Berkeley    San Francisco
Sunny and hot (> 70F)
Sunny and cold (< 70F)
Fog
Rain                          72             66

95 Example of Homogeneity test We compute
X^2 = \sum_{i=1}^I \sum_{j=1}^J \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} = 3.79,
which has an associated p-value of approximately 0.28. The test is not conclusive; it is consistent with a homogeneous weather model...

96 χ² test for Independence One final test. Suppose we have the following data:
grade    stepped onto UC seal    did not step onto UC seal
D
C
B
A
Question: does stepping onto the seal bring bad luck?

97 χ² test for Independence Statistical model: the counts n_{i,j} are samples from a multinomial distribution with cell probabilities \pi_{i,j}, i = 1, ..., I, j = 1, ..., J. Denote by
\alpha_i = \sum_j \pi_{i,j},  \beta_j = \sum_i \pi_{i,j}
the marginal probabilities. Null hypothesis H_0: grades and stepping onto the seal are independent. In other words, under H_0, \pi_{i,j} = \alpha_i \beta_j.

99 χ² test for Independence Under H_0, the MLE of \pi_{i,j} is
\hat{\pi}_{i,j} = \hat{\alpha}_i \hat{\beta}_j = \frac{\sum_j n_{i,j}}{n} \cdot \frac{\sum_i n_{i,j}}{n}.
Under the alternative hypothesis H_1, the cell probabilities are free: \hat{\pi}^{H_1}_{i,j} = n_{i,j}/n.
Likelihood ratio test, or equivalently Pearson's chi-square statistic:
X^2 = \sum_i \sum_j \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}},
with O_{i,j} = n_{i,j}, E_{i,j} = n \hat{\pi}_{i,j} = \frac{(\sum_j n_{i,j})(\sum_i n_{i,j})}{n},
and IJ - 1 - (I-1) - (J-1) = (I-1)(J-1) degrees of freedom.

101 Back to Our Example We compute X^2 = 5.13, with a p-value of approximately 0.16. The evidence is not statistically significant.

102 Remarks The test statistics for homogeneity and independence are identical:
X^2 = \sum_{i,j} \frac{n\,(n_{i,j} - n_{r,i} n_{c,j}/n)^2}{n_{r,i} n_{c,j}},
with (I-1)(J-1) degrees of freedom. Coincidence? Note that
p(x, y) = p_1(x) p_2(y)  implies  p(x | y = y_1) = p(x | y = y_2) = ... = p(x | y = y_J),
and vice-versa. We could approach homogeneity and independence with a Fisher exact test, but Pearson's \chi^2 statistic has a nicer null distribution.

104 Test of Tests What is the appropriate test for the following experiments?

105 The Ballmer Peak A recent study at the University of Illinois tested the creative problem-solving ability of a group of men who were given vodka cranberry and snacks and asked to solve brain teasers. The results were starkly different for the tipsy group, which had a blood alcohol concentration level of 0.075, versus the control group: astonishingly, those in the drinking group averaged nine correct questions to the six answers correct by the non-drinking group. It also took drunk men 11.5 seconds to answer a question, whereas non-drunk men needed 15.2 seconds to think. Both groups had comparable results on a similar exam before the alcohol consumption began.
A: Non-parametric Mann-Whitney (if the sample size is small), t-test (if the sample size is large enough).

108 HIV tests From yesterday's news.bbc.co.uk site: HIV: new approach against virus holds promise. The first human trial of a new type of HIV therapy suggests it could be a promising weapon in the fight against the virus. Reports in the journal Nature show infusions of so-called broadly neutralising antibodies could suppress the amount of HIV in a patient's blood (...) Patients given the highest concentrations were able to fight the virus for some time, dampening the replication of HIV in their blood. (...)
A: Randomized block design.

110 Company Lifespan From sciencedaily.com: How long do firms live? Finding patterns of company mortality in market data. It's a simple enough question: how long does a typical business have to live? Economists have been thinking about that one for decades without a particularly clear answer, but new research by scientists at the Santa Fe Institute in New Mexico reveals a surprising insight: publicly-traded firms die off at the same rate regardless of their age or economic sector.
A: Two-way ANOVA.


Tentamentsskrivning: Statistisk slutledning 1. Tentamentsskrivning i Statistisk slutledning MVE155/MSG200, 7.5 hp. Tentamentsskrivning: Statistisk slutledning 1 Tentamentsskrivning i Statistisk slutledning MVE155/MSG200, 7.5 hp. Tid: 14 mars 2017, kl 14.00-18.00 Examinator och jour: Serik Sagitov, tel. 031-772-5351,

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Nonparametric Location Tests: k-sample

Nonparametric Location Tests: k-sample Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

We need to define some concepts that are used in experiments.

We need to define some concepts that are used in experiments. Chapter 0 Analysis of Variance (a.k.a. Designing and Analysing Experiments) Section 0. Introduction In Chapter we mentioned some different ways in which we could get data: Surveys, Observational Studies,

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

Contents. TAMS38 - Lecture 6 Factorial design, Latin Square Design. Lecturer: Zhenxia Liu. Factorial design 3. Complete three factor design 4

Contents. TAMS38 - Lecture 6 Factorial design, Latin Square Design. Lecturer: Zhenxia Liu. Factorial design 3. Complete three factor design 4 Contents Factorial design TAMS38 - Lecture 6 Factorial design, Latin Square Design Lecturer: Zhenxia Liu Department of Mathematics - Mathematical Statistics 28 November, 2017 Complete three factor design

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

Non-parametric Tests

Non-parametric Tests Statistics Column Shengping Yang PhD,Gilbert Berdine MD I was working on a small study recently to compare drug metabolite concentrations in the blood between two administration regimes. However, the metabolite

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

[ z = 1.48 ; accept H 0 ]

[ z = 1.48 ; accept H 0 ] CH 13 TESTING OF HYPOTHESIS EXAMPLES Example 13.1 Indicate the type of errors committed in the following cases: (i) H 0 : µ = 500; H 1 : µ 500. H 0 is rejected while H 0 is true (ii) H 0 : µ = 500; H 1

More information