Goodness of Fit Goodness of fit - 2 classes

Size: px
Start display at page:

Download "Goodness of Fit Goodness of fit - 2 classes"

Transcription

1 Goodness of Fit Goodness of fit - 2 classes A B Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence interval Normal approximation

2 Goodness of fit - 3 classes AA AB BB Do these data correspond reasonably to the proportions 1:2:1? Multinomial distribution Imagine an urn with k types of balls. Let p i denote the proportion of type i. Draw n balls with replacement. Outcome: (n 1, n 2,..., n k ), with i n i = n, where n i is the no. balls drawn that were of type i. n! P(X 1 =n 1,..., X k =n k )= n 1! n k! pn 1 1 pn k k if 0 n i n, i n i =n Otherwise P(X 1 =n 1,..., X k =n k ) = 0.

3 Example Let (p 1, p 2, p 3 ) = (0.25, 0.50, 0.25) and n = 100. P(X 1 =35, X 2 =43, X 3 =22) = 100! 35! 43! 22! Rather brutal, numerically speaking. Take logs (and use a computer). Goodness of fit test We observe (n 1, n 2, n 3 ) Multinomial(n,p={p 1, p 2, p 3 }). We seek to test H 0 : p 1 = 0.25, p 2 = 0.5, p 3 = versus H a : H 0 is false. We need two things: A test statistic. The null distribution of the test statistic.

4 The likelihood-ratio test (LRT) Back to the first example: A n A B n B Test H 0 :(p A, p B ) = (π A, π B ) versus H a :(p A, p B ) (π A, π B ). MLE under H a : ˆp A = n A /n where n = n A + n B. Likelihood under H a : Likelihood under H 0 : L a = Pr(n A p A = ˆp A )= ( ) n n A ˆp n A A (1 ˆp A ) n n A L 0 = Pr(n A p A = π A )= ( ) n n A π n A A (1 π A) n n A Likelihood ratio test statistic: LRT = 2 ln (L a /L 0 ) Some clever people have shown that if H 0 is true, then LRT follows a χ 2 (df=1) distribution (approximately). Likelihood-ratio test for the example We observed n A = 78 and n B = 22. H 0 :(p A, p B )=(0.75,0.25) H a :(p A, p B ) (0.75,0.25) L a = Pr(n A =78 p A =0.78) = ( ) = L 0 = Pr(n A =78 p A =0.75) = ( ) = LRT = 2 ln (L a /L 0 ) = Using a χ 2 (df=1) distribution, we get a p-value of We therefore have no evidence against the null hypothesis.

5 Null distribution Obs LRT A little math... n=n A +n B, n 0 A = E[n A H 0 ] = n π A, n 0 B = E[n B H 0 ] = n π B. ( ) na ( Then L a /L 0 = na n nb 0 n A 0 B ) nb ) Or equivalently LRT = 2 n A ln( na n 0 A ) +2 n B ln( nb. n 0 B Why do this?

6 Generalization to more than two groups If we have k groups, then the likelihood ratio test statistic is LRT = 2 k i=1 n i ln ( ni n 0 i ) If H 0 is true, LRT χ 2 (df=k-1) The chi-square test There is an alternative technique. The test is called the chi-square test, and has the greater tradition in the literature. For two groups, calculate the following: X 2 = (n A n 0 A) 2 n 0 A + (n B n 0 B) 2 n 0 B If H 0 is true, then X 2 is a draw from a χ 2 (df=1) distribution (approximately).

7 Example In the first example we observed n A = 78 and n B = 22. Under the null hypothesis we have n 0 A = 75 and n 0 B = 25. We therefore get X 2 = (78-75) (22-25)2 25 = = This corresponds to a p-value of We therefore have no evidence against the hypothesis (p A, p B )=(0.75,0.25). Note: using the likelihood ratio test we got a p-value of Generalization to more than two groups As with the likelihood ratio test, there is a generalization to more than just two groups. If we have k groups, the chi-square test statistic we use is X 2 = k (n i n 0 i ) 2 i=1 χ 2 (df=k-1) n 0 i

8 Test statistics Let n 0 i denote the expected count in group i if H 0 is true. LRT statistic { } Pr(data p = MLE) LRT = 2 ln =... = 2 i Pr(data H 0 ) n i ln(n i /n 0 i ) χ 2 test statistic X 2 = (observed expected) 2 expected = i (n i n 0 i ) 2 n 0 i Null distribution of test statistic What values of LRT (or X 2 ) should we expect, if H 0 were true? The null distributions of these statistics may be obtained by: Brute-force analytic calculations Computer simulations Asymptotic approximations If the sample size n is large, we have LRT χ 2 (k 1) and X 2 χ 2 (k 1)

9 Recommendation For either the LRT or the χ 2 test: The null distribution is approximately χ 2 (k 1) if the sample size is large. The null distribution can be approximated by simulating data under the null hypothesis. If the sample size is sufficiently large that the expected count in each cell is 5, use the asymptotic approximation without worries. Otherwise, consider using computer simulations. Composite hypotheses Sometimes, we ask not p AA = 0.25, p AB = 0.5, p BB = 0.25 But rather something like: p AA = f 2, p AB = 2f(1 f), p BB =(1 f) 2 for some f. For example: Consider the genotypes, of a random sample of individuals, at a diallelic locus. Is the locus in Hardy-Weinberg equilibrium (as expected in the case of random mating)? Example data: AA AB BB

10 Another example ABO blood groups 3 alleles A, B, O. Phenotype A genotype AA or AO B genotype BB or BO AB genotype AB O genotype O Allele frequencies: f A, f B, f O (Note that f A + f B + f O = 1) Under Hardy-Weinberg equilibrium, we expect p A = f 2 A + 2f A f O p B = f 2 B + 2f B f O p AB = 2f A f B p O = f 2 O Example data: O A B AB LRT for example 1 Data: (n AA, n AB, n BB ) Multinomial(n,{p AA, p AB, p BB }) We seek to test whether the data conform reasonably to H 0 : p AA = f 2, p AB = 2f(1 f), p BB =(1 f) 2 for some f. General MLEs: ˆp AA = n AA /n, ˆp AB = n AB /n, ˆp BB = n BB /n MLE under H 0 : ˆf =(naa + n AB /2)/n p AA = ˆf 2, p AB = 2ˆf (1 ˆf), p BB =(1 ˆf) 2 LRT statistic: LRT = 2 ln { } Pr(nAA, n AB, n BB ˆp AA, ˆp AB, ˆp BB ) Pr(n AA, n AB, n BB p AA, p AB, p BB )

11 LRT for example 2 Data: (n O, n A, n B, n AB ) Multinomial(n,{p O, p A, p B, p AB }) We seek to test whether the data conform reasonably to H 0 : p A = f 2 A + 2f A f O,p B = f 2 B + 2f B f O,p AB = 2f A f B,p O = f 2 O for some f O, f A, f B, where f O + f A + f B = 1. General MLEs: ˆp O, ˆp A, ˆp B, ˆp AB, like before. MLE under H 0 : Requires numerical optimization Call them (ˆf O,ˆf A,ˆf B ) ( p O, p A, p B, p AB ) LRT statistic: LRT = 2 ln { } Pr(nO, n A, n B, n AB ˆp O, ˆp A, ˆp B, ˆp AB ) Pr(n O, n A, n B, n AB p O, p A, p B, p AB ) χ 2 test for these examples Obtain the MLE(s) under H 0. Calculate the corresponding cell probabilities. Turn these into (estimated) expected counts under H 0. Calculate X 2 = (observed expected) 2 expected

12 Null distribution for these cases Computer simulation (with one wrinkle) Simulate data under H 0 (plug in the MLEs for the observed data) Calculate the MLE with the simulated data Calculate the test statistic with the simulated data Repeat many times Asymptotic approximation Under H 0, if the sample size, n, is large, both the LRT statistic and the χ 2 statistic follow, approximately, a χ 2 distribution with k s 1 degrees of freedom, where s is the number of parameters estimated under H 0. Note that s = 1 for example 1, and s = 2 for example 2, and so df = 1 for both examples. Example 1 Example data: AA AB BB MLE: ˆf = (5 + 20/2) / 100 = 15% Expected counts: Test statistics: LRT statistic = 3.87 X 2 = 4.65 Asymptotic χ 2 (df = 1) approx n: P 4.9% P 3.1% 10,000 computer simulations: P 8.2% P 2.4%

13 Example 1 Est d null dist n of LRT statistic Observed 95th %ile = LRT Est d null dist n of chi!square statistic 95th %ile = 3.36 Observed X 2 Example 2 Example data: O A B AB MLE: ˆfO 62.8%, ˆf A 25.0%, ˆf B 12.2%. Expected counts: Test statistics: LRT statistic = 1.99 X 2 = 2.10 Asymptotic χ 2 (df = 1) approx n: P 16% P 15% 10,000 computer simulations: P 17% P 15%

14 Example 2 Est d null dist n of LRT statistic Observed 95th %ile = LRT Est d null dist n of chi!square statistic Observed 95th %ile = X 2 Example 3 Data on number of sperm bound to an egg: count Do these follow a Poisson distribution? MLE: ˆλ = sample average = ( ) / Expected counts n 0 i = n e ˆλ ˆλi / i!

15 Example observed expected X 2 = (obs exp) 2 exp =... = 42.8 LRT = 2 obs log(obs/exp) =... = 18.8 Compare to χ 2 (df = = 4) P-value = (χ 2 ) and (LRT). By simulation: p-value = 16/10,000 (χ 2 ) and 7/10,000 (LRT) Null simulation results Observed Simulated! 2 statistic Observed Simulated LRT statistic

16 A final note With these sorts of goodness-of-fit tests, we are often happy when our model does fit. In other words, we often prefer to fail to reject H 0. Such a conclusion, that the data fit the model reasonably well, should be phrased and considered with caution. We should think: how much power do I have to detect, with these limited data, a reasonable deviation from H 0? Contingency Tables

17 2 x 2 tables Apply a treatment A or B to 20 subjects each, and observe the reponse. Question: N Y A B Are the response rates for the two treatments the same? Sample 100 subjects and determine whether they are infected with viruses A and B. I-B NI-B I-A NI-A Question: Is infection with virus A independent of infection with virus B? Underlying probabilities Observed data B 0 1 A 0 n 00 n 01 n 0+ 1 n 10 n 11 n 1+ n +0 n +1 n Underlying probabilities B 0 1 A 0 p 00 p 01 p 0+ 1 p 10 p 11 p 1+ p +0 p +1 1 Model: (n 00, n 01, n 10, n 11 ) Multinomial(n,{p 00, p 01, p 10, p 11 }) or n 01 Binomial(n 0+, p 01 /p 0+ ) and n 11 Binomial(n 1+, p 11 /p 1+ )

18 Conditional probabilities Underlying probabilities B 0 1 A 0 p 00 p 01 p 0+ 1 p 10 p 11 p 1+ p +0 p +1 1 Conditional probabilities Pr(B = 1 A = 0) = p 01 /p 0+ Pr(B = 1 A = 1) = p 11 /p 1+ Pr(A = 1 B = 0) = p 10 /p +0 Pr(A = 1 B = 1) = p 11 /p +1 The questions in the two examples are the same! They both concern: p 01 /p 0+ = p 11 /p 1+ Equivalently: p ij = p i+ p +j for all i,j think Pr(A and B) = Pr(A) Pr(B). This is a composite hypothesis! 2 x 2 table A different view B 0 1 A 0 p 00 p 01 p 0+ p 00 p 01 p 10 p 11 1 p 10 p 11 p 1+ p +0 p +1 1 H 0 : p ij = p i+ p +j for all i,j H 0 : p ij = p i+ p +j for all i,j Degrees of freedom = = 1

19 Expected counts Observed data B 0 1 A 0 n 00 n 01 n 0+ 1 n 10 n 11 n 1+ n +0 n +1 n Expected counts B 0 1 A 0 e 00 e 01 n 0+ 1 e 10 e 11 n 1+ n +0 n +1 n To get the expected counts under the null hypothesis we: Estimate p 1+ and p +1 by n 1+ /n and n +1 /n, respectively. These are the MLEs under H 0! Turn these into estimates of the p ij. Multiply these by the total sample size, n. The expected counts The expected count (assuming H 0 ) for the 11 cell is the following: e 11 = n ˆp 11 = n ˆp 1+ ˆp +1 = n (n 1+ /n) (n +1 /n) =(n 1+ n +1 )/n The other cells are similar. We can then calculate a χ 2 or LRT statistic as before!

20 Example 1 Observed data N Y A B Expected counts N Y A B X 2 = ( ) ( ) (2 5.5) (9 5.5)2 5.5 = 6.14 LRT = 2 [18 log( ) log( 5.5 )] = 6.52 P-values (based on the asymptotic χ 2 (df = 1) approximation): 1.3% and 1.1%, respectively. Example 2 Observed data I-B NI-B I-A NI-A Expected counts I-B NI-B I-A NI-A X 2 = (9 5.2) ( ) (9 12.8) ( ) = 4.70 LRT = 2 [9 log( ) log( 58.2 )] = 4.37 P-values (based on the asymptotic χ 2 (df = 1) approximation): 3.0% and 3.7%, respectively.

21 Fisher s exact test Observed data N Y A B Assume the null hypothesis (independence) is true. Constrain the marginal counts to be as observed. What s the chance of getting this exact table? What s the chance of getting a table at least as extreme? Hypergeometric distribution Imagine an urn with K white balls and N K black balls. Draw n balls without replacement. Let x be the number of white balls in the sample. x follows a hypergeometric distribution (w/ parameters K, N, n). In urn white black sampled x n not sampled N n K N K N

22 Hypergeometric probabilities Suppose X Hypergeometric (N, K, n). No. of white balls in a sample of size n, drawn without replacement from an urn with K white and N K black. ( K N K ) Pr(X = x) = x)( n x ( N n) Example: In urn 0 1 sampled not N = 40, K = 29, n = 20 Pr(X = 18) = ( 29 )( ( ) ) 1.4% Back to Fisher s exact test Observed data N Y A B Assume the null hypothesis (independence) is true. Constrain the marginal counts to be as observed. Pr(observed table H 0 ) = Pr(X=18) X Hypergeometric (N=40, K=29, n=20)

23 Fisher s exact test 1. For all possible tables (with the observed marginal counts), calculate the relevant hypergeometric probability. 2. Use that probability as a statistic. 3. P-value (for Fisher s exact test of independence): The sum of the probabilities for all tables having a probability equal to or smaller than that observed. An illustration The observed data N Y A B All possible tables (with these marginals):

24 Fisher s exact test: example 1 Observed data N Y A B P-value 3.1% In R: fisher.test() Recall: χ 2 test: P-value = 1.3% LRT: P-value = 1.1% Fisher s exact test: example 2 Observed data I-B NI-B I-A NI-A P-value 4.4% Recall: χ 2 test: P-value = 3.0% LRT: P-value = 3.7%

25 Summary Testing for independence in a 2 x 2 table: A special case of testing a composite hypothesis in a onedimensional table. You can use either the LRT or χ 2 test, as before. You can also use Fisher s exact test. If Fisher s exact test is computationally feasible, do it! Paired data Sample 100 subjects and determine whether they are infected with viruses A and B. I-B NI-B I-A NI-A Underlying probabilities B 0 1 A 0 p 00 p 01 p 0+ 1 p 10 p 11 p 1+ p +0 p +1 1 Is the rate of infection of virus A the same as that of virus B? In other words: Is p 1+ = p +1? Equivalently, is p 10 = p 01?

26 McNemar s test H 0 :p 01 = p 10 Under H 0, e.g. if p 01 = p 10, the expected counts for cells 01 and 10 are both equal to (n 01 + n 10 )/2. The χ 2 test statistic reduces to X 2 = (n 01 n 10 ) 2 n 01 + n 10 For large sample sizes, this statistic has null distribution that is approximately a χ 2 (df = 1). For the example: X 2 = (20 9) 2 / 29 = 4.17 P = 4.1%. An exact test Condition on n 01 + n 10. Under H 0,n 01 n 01 + n 10 Binomial(n 01 + n 10, 1/2). In R, use the function binom.test. For the example, P = 6.1%.

27 Paired data Paired data I-B NI-B I-A NI-A P = 6.1% Unpaired data I NI A B P = 9.5% Taking appropriate account of the pairing is important! r x k tables Blood type Population A B AB O Florida Iowa Missouri Same distribution of blood types in each population?

28 Underlying probabilities Observed data 1 2 k 1 n 11 n 12 n 1k n 1+ 2 n 21 n 22 n 2k n r n r1 n r2 n rk n r+ n +1 n +2 n +k n Underlying probabilities 1 2 k 1 p 11 p 12 p 1k p 1+ 2 p 21 p 22 p 2k p r p r1 p r2 p rk p r+ p +1 p +2 p +k 1 H 0 : p ij = p i+ p +j for all i,j. Expected counts Observed data A B AB O F I M Expected counts A B AB O F I M Expected counts under H 0 : e ij = n i+ n +j /n for all i,j.

29 χ 2 and LRT statistics Observed data A B AB O F I M Expected counts A B AB O F I M X 2 statistic = (obs exp) 2 exp = = 5.64 LRT statistic = 2 obs ln(obs/exp) = = 5.55 Asymptotic approximation If the sample size is large, the null distribution of the χ 2 and likelihood ratio test statistics will approximately follow a χ 2 distribution with (r 1) (k 1) d.f. In the example, df = (3 1) (4 1) = 6 X 2 = 5.64 P = LRT = 5.55 P = 0.48.

30 Fisher s exact test Observed data 1 2 k 1 n 11 n 12 n 1k n 1+ 2 n 21 n 22 n 2k n r n r1 n r2 n rk n r+ n +1 n +2 n +k n Assume H 0 is true. Condition on the marginal counts Then Pr(table) 1/ ij n ij! Consider all possible tables with the observed marginal counts Calculate Pr(table) for each possible table. P-value = the sum of the probabilities for all tables having a probability equal to or smaller than that observed. Fisher s exact test: the example Observed P!value = 48% " log(n ij!) Since the number of possible tables can be very large, we often must resort to computer simulation.

31 Another example Survival in different treatment groups: Survive Treatment No Yes A 15 5 B 17 3 C D 17 3 E 16 4 Is the survival rate the same for all treatments? Results Observed Survive Treatment No Yes A 15 5 B 17 3 C D 17 3 E 16 4 Expected under H 0 Survive Treatment No Yes A 15 5 B 15 5 C 15 5 D 15 5 E 15 5 X 2 = 9.07 P = 5.9% (how many df?) LRT = 8.41 P = 7.8% Fisher s exact test: P = 8.7%

32 All pairwise comparisons N Y A 15 5 P=69% B 17 3 N Y A 15 5 P=19% C N Y A 15 5 P=69% D 17 3 N Y B 17 3 P=4.1% C N Y B 17 3 P=100% D 17 3 N Y B 17 3 P=100% E 16 4 N Y C P=9.6% E 16 4 N Y D 17 3 P=100% E 16 4 N Y A 15 5 E 16 4 P=100% N Y C D 17 3 P=4.1% Is this a good thing to do? Two-locus linkage in an intercross BB Bb bb AA Aa aa Are these two loci linked?

33 General test of independence Observed data BB Bb bb AA Aa aa Expected counts BB Bb bb AA Aa aa χ 2 test: X 2 = 10.4 P = 3.5% (df = 4) LRT test: LRT = 9.98 P = 4.1% Fisher s exact test: P = 4.6% A more specific test Observed data BB Bb bb AA Aa aa Underlying probabilities BB Bb bb AA 1 4 (1 θ)2 1 2 θ(1 θ) 1 4 θ2 Aa 1 2 θ(1 θ) 1 2 [θ2 +(1 θ) 2 1 ] 2θ(1 θ) 1 aa 4 θ2 1 2 θ(1 θ) 1 4 (1 θ)2 H 0 : θ = 1/2 versus H a : θ < 1/2 Use a likelihood ratio test! Obtain the general MLE of θ. Calculate the LRT statistic = 2 ln Compare this statistic to a χ 2 (df = 1). { Pr(data ˆθ) } Pr(data θ=1/2)

34 Results BB Bb bb AA Aa aa MLE: ˆθ = LRT statistic: LRT = 7.74 P = 0.54% (df = 1) Here we assume Mendelian segregation, and that deviation from H 0 is in a particular direction. If these assumptions are correct, we ll have greater power to detect linkage using this more specific approach.

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Statistics for laboratory scientists II

Statistics for laboratory scientists II This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Lecture 10: Generalized likelihood ratio test

Lecture 10: Generalized likelihood ratio test Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual

More information

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Overview - 1 Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Elizabeth Thompson University of Washington Seattle, WA, USA MWF 8:30-9:20; THO 211 Web page: www.stat.washington.edu/ thompson/stat550/

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015 Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Frequency Estimation Karin S. Dorman Department of Statistics Iowa State University August 28, 2006 Fundamental rules of genetics Law of Segregation a diploid parent is equally

More information

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878 Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each

More information

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement

More information

Maximum Likelihood Large Sample Theory

Maximum Likelihood Large Sample Theory Maximum Likelihood Large Sample Theory MIT 18.443 Dr. Kempthorne Spring 2015 1 Outline 1 Large Sample Theory of Maximum Likelihood Estimates 2 Asymptotic Results: Overview Asymptotic Framework Data Model

More information

Multiple Sample Categorical Data

Multiple Sample Categorical Data Multiple Sample Categorical Data paired and unpaired data, goodness-of-fit testing, testing for independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

1. Understand the methods for analyzing population structure in genomes

1. Understand the methods for analyzing population structure in genomes MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population

More information

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal

More information

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book. NAME (Please Print): HONOR PLEDGE (Please Sign): statistics 101 Practice Final Key This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables

More information

Introduction to Linkage Disequilibrium

Introduction to Linkage Disequilibrium Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have

More information

Section VII. Chi-square test for comparing proportions and frequencies. F test for means

Section VII. Chi-square test for comparing proportions and frequencies. F test for means Section VII Chi-square test for comparing proportions and frequencies F test for means 0 proportions: chi-square test Z test for comparing proportions between two independent groups Z = P 1 P 2 SE d SE

More information

Optimal exact tests for complex alternative hypotheses on cross tabulated data

Optimal exact tests for complex alternative hypotheses on cross tabulated data Optimal exact tests for complex alternative hypotheses on cross tabulated data Daniel Yekutieli Statistics and OR Tel Aviv University CDA course 29 July 2017 Yekutieli (TAU) Optimal exact tests for complex

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

1 Comparing two binomials

1 Comparing two binomials BST 140.652 Review notes 1 Comparing two binomials 1. Let X Binomial(n 1,p 1 ) and ˆp 1 = X/n 1 2. Let Y Binomial(n 2,p 2 ) and ˆp 2 = Y/n 2 3. We also use the following notation: n 11 = X n 12 = n 1 X

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Population genetics snippets for genepop

Population genetics snippets for genepop Population genetics snippets for genepop Peter Beerli August 0, 205 Contents 0.Basics 0.2Exact test 2 0.Fixation indices 4 0.4Isolation by Distance 5 0.5Further Reading 8 0.6References 8 0.7Disclaimer

More information

One-Way Tables and Goodness of Fit

One-Way Tables and Goodness of Fit Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the

More information

Chapter 11. Hypothesis Testing (II)

Chapter 11. Hypothesis Testing (II) Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let

More information

EXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?

EXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important? Statistical Genetics Agronomy 65 W. E. Nyquist March 004 EXERCISES FOR CHAPTER 3 Exercise 3.. a. Define random mating. b. Discuss what random mating as defined in (a) above means in a single infinite population

More information

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017 Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017 I. χ 2 or chi-square test Objectives: Compare how close an experimentally derived value agrees with an expected value. One method to

More information

Chi Square Analysis M&M Statistics. Name Period Date

Chi Square Analysis M&M Statistics. Name Period Date Chi Square Analysis M&M Statistics Name Period Date Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? Or, why is it that you always seem

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining

More information

MATH4427 Notebook 2 Fall Semester 2017/2018

MATH4427 Notebook 2 Fall Semester 2017/2018 MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda 1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;

More information

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype

More information

URN MODELS: the Ewens Sampling Lemma

URN MODELS: the Ewens Sampling Lemma Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2 Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval Epidemiology 9509 Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

ST495: Survival Analysis: Hypothesis testing and confidence intervals

ST495: Survival Analysis: Hypothesis testing and confidence intervals ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d BIOS 4120: Introduction to Biostatistics Breheny Lab #11 We will explore observational studies in today s lab and review how to make inferences on contingency tables. We will only use 2x2 tables for today

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Question: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation?

Question: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation? October 12, 2009 Bioe 109 Fall 2009 Lecture 8 Microevolution 1 - selection The Hardy-Weinberg-Castle Equilibrium - consider a single locus with two alleles A 1 and A 2. - three genotypes are thus possible:

More information

Lecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. Lecture 22 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 19, 2007 1 2 3 4 5 6 7 8 9 1 tests for equivalence of two binomial 2 tests for,

More information

Analysis of Variance

Analysis of Variance Analysis of Variance Blood coagulation time T avg A 62 60 63 59 61 B 63 67 71 64 65 66 66 C 68 66 71 67 68 68 68 D 56 62 60 61 63 64 63 59 61 64 Blood coagulation time A B C D Combined 56 57 58 59 60 61

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example

More information

Parameter Estimation and Fitting to Data

Parameter Estimation and Fitting to Data Parameter Estimation and Fitting to Data Parameter estimation Maximum likelihood Least squares Goodness-of-fit Examples Elton S. Smith, Jefferson Lab 1 Parameter estimation Properties of estimators 3 An

More information

Just Enough Likelihood

Just Enough Likelihood Just Enough Likelihood Alan R. Rogers September 2, 2013 1. Introduction Statisticians have developed several methods for comparing hypotheses and for estimating parameters from data. Of these, the method

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny October 29 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/22 Lister s experiment Introduction In the 1860s, Joseph Lister conducted a landmark

More information

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These

More information

Chapter 10. Prof. Tesler. Math 186 Winter χ 2 tests for goodness of fit and independence

Chapter 10. Prof. Tesler. Math 186 Winter χ 2 tests for goodness of fit and independence Chapter 10 χ 2 tests for goodness of fit and independence Prof. Tesler Math 186 Winter 2018 Prof. Tesler Ch. 10: χ 2 goodness of fit tests Math 186 / Winter 2018 1 / 26 Multinomial test Consider a k-sided

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

We know from STAT.1030 that the relevant test statistic for equality of proportions is: 2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

2.3 Analysis of Categorical Data

2.3 Analysis of Categorical Data 90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Topic 19 Extensions on the Likelihood Ratio

Topic 19 Extensions on the Likelihood Ratio Topic 19 Extensions on the Likelihood Ratio Two-Sided Tests 1 / 12 Outline Overview Normal Observations Power Analysis 2 / 12 Overview The likelihood ratio test is a popular choice for composite hypothesis

More information

The Genetics of Natural Selection

The Genetics of Natural Selection The Genetics of Natural Selection Introduction So far in this course, we ve focused on describing the pattern of variation within and among populations. We ve talked about inbreeding, which causes genotype

More information

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white Outline - segregation of alleles in single trait crosses - independent assortment of alleles - using probability to predict outcomes - statistical analysis of hypotheses - conditional probability in multi-generation

More information

LECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50

LECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50 LECTURE #10 A. The Hardy-Weinberg Equilibrium 1. From the definitions of p and q, and of p 2, 2pq, and q 2, an equilibrium is indicated (p + q) 2 = p 2 + 2pq + q 2 : if p and q remain constant, and if

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables. Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare

More information

Linkage and Linkage Disequilibrium

Linkage and Linkage Disequilibrium Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies

More information

Goodness of Fit Tests

Goodness of Fit Tests Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Pump failure data. Pump Failures Time

Pump failure data. Pump Failures Time Outline 1. Poisson distribution 2. Tests of hypothesis for a single Poisson mean 3. Comparing multiple Poisson means 4. Likelihood equivalence with exponential model Pump failure data Pump 1 2 3 4 5 Failures

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Math 152. Rumbos Fall Solutions to Exam #2

Math 152. Rumbos Fall Solutions to Exam #2 Math 152. Rumbos Fall 2009 1 Solutions to Exam #2 1. Define the following terms: (a) Significance level of a hypothesis test. Answer: The significance level, α, of a hypothesis test is the largest probability

More information

Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known

More information