Statistical Hypothesis Testing. STAT 536: Genetic Statistics. Statistical Hypothesis Testing - Terminology. Hardy-Weinberg Disequilibrium


Statistical Hypothesis Testing

STAT 536: Genetic Statistics
Karin S. Dorman
Department of Statistics, Iowa State University
September 7, 2006

1. Identify a hypothesis, an idea you want to test for its applicability to your data set. For example: "Hardy-Weinberg equilibrium applies to this data set," or "The two loci I am studying are independent of each other."
2. Identify and calculate a test statistic. Ideally, the statistic should summarize and accentuate any deviations of the data from what is expected under the hypothesis, and have a known sampling distribution under the null hypothesis.
3. Compute or estimate the probability of the observed test statistic under the assumption of the hypothesis. Reject the hypothesis if this probability is small.

Statistical Hypothesis Testing - Terminology

Null hypothesis ($H_0$): the hypothesis you wish to test.
Alternative hypothesis ($H_a$): when you reject the null hypothesis, you conclude the alternative hypothesis or hypotheses.
Type I error: the test statistic causes you to (erroneously) reject the hypothesis when it is true.
Size, or significance level ($\alpha$): the probability of a type I error.
Type II error: you accept the hypothesis when it is false.
$\beta$: the probability of a type II error.
Power ($1 - \beta$): the probability that you reject the hypothesis when it is false.
Procedure: classically, one decides on the size of the test before collecting the data, then selects the most powerful test for the desired size.

                     Hypothesis accepted     Hypothesis rejected
Hypothesis true      $1 - \alpha$            Type I (size, $\alpha$)
Hypothesis false     Type II ($\beta$)       power ($1 - \beta$)

Hardy-Weinberg Disequilibrium

Let $u$ and $v$ be alleles at a single locus. Then, HWE implies

$$P_{uu} = p_u^2, \qquad P_{uv} = 2 p_u p_v \quad \text{whenever } u \ne v,$$

where $P_{uv}$ is the population genotype frequency and $p_u$ are the population allele frequencies. Recall that if two random variables $X$ and $Y$ are independent, then $P(X \text{ and } Y) = P(X)P(Y)$. In English, knowing $X$ tells you nothing about $Y$ and vice versa. The HWE equations imply independence (or no association) among the alleles at a locus.
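The HWE relations $P_{uu} = p_u^2$ and $P_{uv} = 2 p_u p_v$ can be checked numerically. The following is a minimal Python sketch, not part of the lecture; the function name `hwe_genotype_freqs` and the three allele frequencies are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Return {(u, v): P_uv} for u <= v under Hardy-Weinberg equilibrium."""
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for u, v in combinations_with_replacement(sorted(allele_freqs), 2):
        if u == v:
            freqs[(u, v)] = allele_freqs[u] ** 2                   # homozygote: p_u^2
        else:
            freqs[(u, v)] = 2 * allele_freqs[u] * allele_freqs[v]  # heterozygote: 2 p_u p_v
    return freqs


# Hypothetical allele frequencies for three alleles u, v, w.
example = hwe_genotype_freqs({"u": 0.5, "v": 0.3, "w": 0.2})
for genotype, freq in example.items():
    print(genotype, round(freq, 4))
```

The genotype frequencies returned for any valid set of allele frequencies sum to 1, which is a quick sanity check on the two formulas.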

Hardy-Weinberg Disequilibrium (cont)

These equations may not be satisfied in a population where any one (or more) of the Hardy-Weinberg assumptions is (are) violated. When the equations are not satisfied, a Hardy-Weinberg disequilibrium applies:

$$P_{uu} \ne p_u^2, \qquad P_{uv} \ne 2 p_u p_v \quad \text{whenever } u \ne v.$$

One could mathematically quantitate this disequilibrium in multiple ways. We will consider the following approaches:

1. Suppose there is covariation among alleles. Write the disequilibrium in terms of this covariation.
2. Consider subtractive disequilibrium: $P_{uu} - p_u^2$ and $P_{uv} - 2 p_u p_v$.
3. Consider multiplicative disequilibrium once again and convert to a log-linear model.

Covariation Between 2 Alleles

Let $x_j$ be an indicator variable indicating whether the $j$th allele of a random individual is allele 1. Recall the model

$$P_{11} = p_1^2 + p_1(1-p_1) f, \qquad P_{12} = 2 p_1 (1-p_1)(1-f).$$

What is the meaning of $f$?

$$\mathrm{Var}(x_j) = p_1(1-p_1)$$
$$\mathrm{Cov}(x_i, x_j) = E(x_i x_j) - E(x_i)E(x_j) = P_{11} - p_1^2$$
$$\mathrm{Corr}(x_i, x_j) = \frac{\mathrm{Cov}(x_i, x_j)}{\sqrt{\mathrm{Var}(x_i)\,\mathrm{Var}(x_j)}} = \frac{P_{11} - p_1^2}{p_1(1-p_1)} = \frac{p_1^2 + p_1(1-p_1) f - p_1^2}{p_1(1-p_1)} = f.$$

So $f$ is the correlation between the two alleles carried by an individual.

Covariation Between > 2 Alleles

What do you do with $f$ when there are more than 2 alleles? Say, for example, there are 3 alleles $u$, $v$, and $w$. There can be correlation between $u$ and $v$ or $u$ and $w$, etc. We can subscript $f$. Let $f_{uv}$ be the correlation between alleles $u$ and $v$, where $u$ can equal $v$. Then,

$$P_{uu} = p_u^2 + p_u(1-p_u) f_{uu}, \qquad P_{uv} = 2 p_u p_v (1 - f_{uv}).$$

But, we are not done. There are relationships among these parameters. If there are $k$ different alleles, there are $k-1$ free allele frequencies $p_u$ and $k(k+1)/2$ correlation coefficients $f_{uv}$, for a total of $(k^2+3k-2)/2$ parameters. However, there are only $k(k+1)/2 - 1$ free genotype counts in the data. How many relationships are there among the parameters?

Covariation Between > 2 Alleles (cont)

There are

$$\text{d.f. in parameters} - \text{d.f. in data} = \frac{k^2+3k-2}{2} - \frac{k(k+1)}{2} + 1 = k$$

relationships among the model parameters. To find the relationships among the parameters, recall that

$$p_u = P_{uu} + \frac{1}{2} \sum_{v \ne u} P_{uv}.$$

There are $k$ such relationships, and they will consume all the extra degrees of freedom. Substitute in the expressions for the genotype frequencies to observe

$$f_{uu} = \frac{1}{1-p_u} \sum_{v \ne u} p_v f_{uv}.$$
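The relationship among the correlation coefficients can be verified numerically. The sketch below is an illustration under stated assumptions: the genotype frequency table `P` for three alleles is hypothetical, and the helper names are not from the lecture. It solves the covariation model for each $f_{uv}$ and then compares $(1-p_u) f_{uu}$ with $\sum_{v \ne u} p_v f_{uv}$.

```python
def allele_frequencies(P, alleles):
    """p_u = P_uu + (1/2) * sum over v != u of P_uv (P is keyed by sorted pairs)."""
    return {
        u: P[(u, u)] + 0.5 * sum(P[tuple(sorted((u, v)))] for v in alleles if v != u)
        for u in alleles
    }


def allele_correlations(P, alleles):
    """Solve the covariation model for f_uu and f_uv (u < v)."""
    p = allele_frequencies(P, alleles)
    f = {}
    for u in alleles:
        # From P_uu = p_u^2 + p_u (1 - p_u) f_uu
        f[(u, u)] = (P[(u, u)] - p[u] ** 2) / (p[u] * (1 - p[u]))
        for v in alleles:
            if u < v:
                # From P_uv = 2 p_u p_v (1 - f_uv)
                f[(u, v)] = 1 - P[(u, v)] / (2 * p[u] * p[v])
    return p, f


# Hypothetical genotype frequencies for alleles 1, 2, 3 (they sum to 1).
alleles = [1, 2, 3]
P = {(1, 1): 0.20, (1, 2): 0.20, (1, 3): 0.10,
     (2, 2): 0.25, (2, 3): 0.15, (3, 3): 0.10}

p, f = allele_correlations(P, alleles)
for u in alleles:
    lhs = (1 - p[u]) * f[(u, u)]
    rhs = sum(p[v] * f[tuple(sorted((u, v)))] for v in alleles if v != u)
    print(u, round(lhs, 6), round(rhs, 6))  # the two columns agree
```

Because the identity follows from the definition of $p_u$, the two printed columns agree for any valid genotype table, not only for this one.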

Covariation Between > 2 Alleles (2nd approach)

Suppose that associations between alleles are not a consequence of specific interactions among alleles. Suppose instead that there is a general association between alleles regardless of allele identity. (We will discuss how this can arise later.) Then, there is one correlation that applies to all pairs of alleles $u$ and $v$, and the applicable equations are again

$$P_{uu} = p_u^2 + p_u(1-p_u) f, \qquad P_{uv} = 2 p_u p_v (1 - f) \quad \text{where } u \ne v.$$

There are $k-1$ free allele frequencies $p_u$ and 1 free correlation $f$, leading to $k$ free parameters. There are plenty of degrees of freedom, $k(k+1)/2 - 1$, in the data to cover these parameters.

Problems with Correlations $f_{uv}$

But all is not perfect. Notice that $f$ appears as a multiplier of allele frequencies. We model disequilibrium as a multiplicative factor that expands or shrinks genotype counts away from the HWE expectation. We derived some estimates for the correlation $f$ in previous lecture(s). In general, estimation of multiplicative factors involves ratios of statistics. Ratios are notoriously difficult statistically.

An easier approach, statistically, is to look for additive disequilibrium. Define

$$D_{uu} = P_{uu} - p_u^2, \qquad D'_{uv} = P_{uv} - 2 p_u p_v.$$

Let $D'_{uv} = -2 D_{uv}$, to write more conveniently

$$P_{uu} = p_u^2 + D_{uu}, \qquad P_{uv} = 2 p_u p_v - 2 D_{uv} \quad \text{whenever } u \ne v.$$

d.f. for Additive Disequilibrium

Again, there must be relationships among the parameters to account for the difference in degrees of freedom.

$$p_u = P_{uu} + \frac{1}{2} \sum_{v \ne u} P_{uv} = p_u^2 + D_{uu} + \sum_{v \ne u} (p_u p_v - D_{uv}) = p_u \sum_{v} p_v + D_{uu} - \sum_{v \ne u} D_{uv} = p_u + D_{uu} - \sum_{v \ne u} D_{uv},$$

leading to

$$D_{uu} = \sum_{v \ne u} D_{uv}.$$

Range of Additive Disequilibrium

Because $0 \le P_{uu}, P_{uv} \le 1$, we also recognize that the additive disequilibrium can't be just anything. In fact, the total list of constraints is

$$-p_u^2 \le D_{uu} \le 1 - p_u^2, \qquad p_u p_v - \tfrac{1}{2} \le D_{uv} \le p_u p_v, \qquad D_{uu} = \sum_{v \ne u} D_{uv}.$$

In the case of just two alleles at a locus, 1 and 2, then $D_{11} = D_{12} = D_{22}$, so

$$P_{11} = p_1^2 + D_{12}, \qquad P_{12} = 2 p_1 p_2 - 2 D_{12}, \qquad P_{22} = p_2^2 + D_{12}$$

and

$$\max_{u \in \{1,2\}} \left(-p_u^2\right) \le D_{12} \le p_1 p_2.$$
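The additive coefficients and their constraint $D_{uu} = \sum_{v \ne u} D_{uv}$ can be computed directly from a genotype frequency table. This is a minimal sketch: the three-allele table `P` is hypothetical, and the function names `additive_disequilibrium` and `d12_range` are assumptions made for illustration.

```python
def additive_disequilibrium(P, alleles):
    """Return allele frequencies p_u and additive coefficients D_uu, D_uv (u < v)."""
    p = {
        u: P[(u, u)] + 0.5 * sum(P[tuple(sorted((u, v)))] for v in alleles if v != u)
        for u in alleles
    }
    D = {}
    for u in alleles:
        D[(u, u)] = P[(u, u)] - p[u] ** 2                  # P_uu = p_u^2 + D_uu
        for v in alleles:
            if u < v:
                D[(u, v)] = p[u] * p[v] - 0.5 * P[(u, v)]  # P_uv = 2 p_u p_v - 2 D_uv
    return p, D


def d12_range(p1):
    """Admissible range of D_12 for two alleles with frequencies p1 and 1 - p1."""
    p2 = 1.0 - p1
    return max(-p1 ** 2, -p2 ** 2), p1 * p2


# Hypothetical genotype frequencies for alleles 1, 2, 3.
alleles = [1, 2, 3]
P = {(1, 1): 0.20, (1, 2): 0.20, (1, 3): 0.10,
     (2, 2): 0.25, (2, 3): 0.15, (3, 3): 0.10}
p, D = additive_disequilibrium(P, alleles)
for u in alleles:
    row_sum = sum(D[tuple(sorted((u, v)))] for v in alleles if v != u)
    print(u, round(D[(u, u)], 6), round(row_sum, 6))  # D_uu equals the sum of its D_uv

print(d12_range(0.3))
```

For $p_1 = 0.3$ the admissible interval for $D_{12}$ runs from $-0.09$ to $0.21$, so the lower bound is set by the rarer allele.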

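Testing $D_{12} = 0$ (HWE)

Testing for HWE is equivalent to testing the null hypothesis $H_0: D_{12} = 0$. Here, we have restricted to the two-allele case. We need two things:

1. An estimate $\hat{D}_{12}$. Is it close to 0?
2. A sampling distribution for the estimate $\hat{D}_{12}$ to determine whether it is farther from 0 than we would expect by chance.

Estimating $D_{12}$

Two free parameters $p_1, D_{12}$ and two free pieces of data $n_{11}, n_{12}$ suggest Bailey's method:

$$n_{11} = n\,(p_1^2 + D_{12}), \qquad n_{12} = 2n\,(p_1 p_2 - D_{12})$$

can be solved to produce

$$\hat{p}_1 = \frac{2 n_{11} + n_{12}}{2n} = \tilde{p}_1, \qquad \hat{D}_{12} = \frac{n_{11}}{n} - \tilde{p}_1^2 = \tilde{P}_{11} - \tilde{p}_1^2,$$

where $\tilde{p}_1$ and $\tilde{P}_{11}$ are the observed (sample) allele and genotype frequencies.

$\hat{D}_{12}$ Bias

Is our estimate $\hat{D}_{12}$ unbiased?

$$E(\hat{D}_{12}) = E(\tilde{P}_{11}) - E(\tilde{p}_1^2) = P_{11} - p_1^2 - \frac{1}{2n}\left(p_1 + P_{11} - 2 p_1^2\right) = D_{12} - \frac{1}{2n}\left[p_1(1-p_1) + D_{12}\right].$$

Since $E(\hat{D}_{12}) \ne D_{12}$, we conclude that the estimator is biased. However, we are encouraged to note that as $n \to \infty$, the bias goes to 0.

$\hat{D}_{12}$ Sampling Variance

Using Fisher's approximation (it applies because $\hat{D}_{12}$ is a function of proportions), we obtain an approximate sampling variance for $\hat{D}_{12}$:

$$\mathrm{Var}(\hat{D}_{12}) \approx \frac{1}{n}\left[p_1^2(1-p_1)^2 + (1-2p_1)^2 D_{12} - D_{12}^2\right].$$

To estimate the sampling variance, we substitute in our estimates for $p_1$ and $D_{12}$:

$$\widehat{\mathrm{Var}}(\hat{D}_{12}) = \frac{1}{n}\left[\hat{p}_1^2(1-\hat{p}_1)^2 + (1-2\hat{p}_1)^2 \hat{D}_{12} - \hat{D}_{12}^2\right].$$

Since $\hat{D}_{12}$ is the MLE, we have for large samples that

$$\hat{D}_{12} \sim N\left[E(\hat{D}_{12}),\ \mathrm{Var}(\hat{D}_{12})\right].$$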
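The estimates of $p_1$ and $D_{12}$, together with the plug-in bias and Fisher-approximation variance, are simple to compute from genotype counts. The sketch below assumes hypothetical counts (30, 40, 30); the function name `estimate_d12` and the returned dictionary layout are illustrative choices, not part of the lecture.

```python
def estimate_d12(n11, n12, n22):
    """Bailey's-method estimates for the two-allele additive disequilibrium model."""
    n = n11 + n12 + n22
    p1 = (2 * n11 + n12) / (2 * n)   # observed frequency of allele 1
    d12 = n11 / n - p1 ** 2          # observed P_11 minus squared allele frequency
    # Plug-in estimate of the bias term -(1/2n) [p1 (1 - p1) + D12].
    bias = -(p1 * (1 - p1) + d12) / (2 * n)
    # Plug-in estimate of Fisher's approximate sampling variance.
    variance = (p1 ** 2 * (1 - p1) ** 2 + (1 - 2 * p1) ** 2 * d12 - d12 ** 2) / n
    return {"p1": p1, "d12": d12, "bias": bias, "variance": variance}


# Hypothetical genotype counts n11, n12, n22.
est = estimate_d12(30, 40, 30)
print(est)
```

With these counts the estimated allele frequency is 0.5 and the estimated disequilibrium is about 0.05, while the bias term is of order $1/n$ and small by comparison.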

Testing $H_0: D_{12} = 0$ Using z-values

Compute the standard normal variate $z$:

$$z = \frac{\hat{D}_{12} - E(\hat{D}_{12})}{\sqrt{\mathrm{Var}(\hat{D}_{12})}}.$$

Under the null hypothesis $H_0$, $z$ approximately follows the standard normal distribution. Compare $z$ against the standard normal distribution. The key is that if $\hat{D}_{12}$ is very positive or very negative, then $z$ will tend to be far from 0 and your statistic will fall in the tails of the sampling distribution, where it is not expected to fall if the null hypothesis of HWE is true.

Computing z

Under the null hypothesis, we know

$$E(\hat{D}_{12}) \approx 0, \qquad \mathrm{Var}(\hat{D}_{12}) \approx \frac{1}{n}\left[\hat{p}_1^2(1-\hat{p}_1)^2\right],$$

so

$$z = \frac{\sqrt{n}\,\hat{D}_{12}}{\hat{p}_1(1-\hat{p}_1)}.$$

Note, we have assumed $n$ is sufficiently large that the bias term is negligible.

Relevant Alternative Hypotheses

$$P_{11} = p_1^2 + D_{12}, \qquad P_{12} = 2 p_1 p_2 - 2 D_{12}, \qquad P_{22} = p_2^2 + D_{12}$$

Depending on your purpose, there may be different alternative hypotheses you consider. Suppose $z = 1.25$. Then:

$H_A: D_{12} > 0$ or $D_{12} < 0$. You have no a priori feeling for whether heterozygotes will be over- or under-represented. Use a two-tailed test.

> 2*pnorm(q=-1.25)
[1] 0.2112995
> 2*(1-pnorm(q=1.25))
[1] 0.2112995

Relevant Alternative Hypotheses (cont)

One-sided hypotheses are appropriate when you suspected heterozygotes would be either under- or over-represented before you collected the data.

$H_A: D_{12} > 0$. You suspect that heterozygotes will be under-represented. Use a one-tailed (right tail) test.

> (1-pnorm(q=1.25))
[1] 0.1056498

$H_A: D_{12} < 0$. You suspect that heterozygotes will be over-represented. Use a one-tailed (left tail) test.

> pnorm(q=1.25)
[1] 0.8943502
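The z-test can be sketched end to end in Python using only the standard library. This is an illustration under stated assumptions: the counts (30, 40, 30) are hypothetical, the function and argument names are not from the lecture, and the normal CDF is computed from `math.erfc` rather than R's `pnorm`.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hwe_z_test(n11, n12, n22, alternative="two-sided"):
    """z-test of H0: D_12 = 0, ignoring the O(1/n) bias term."""
    n = n11 + n12 + n22
    p1 = (2 * n11 + n12) / (2 * n)
    if not 0.0 < p1 < 1.0:
        raise ValueError("both alleles must be observed")
    d12 = n11 / n - p1 ** 2
    z = math.sqrt(n) * d12 / (p1 * (1 - p1))
    if alternative == "two-sided":
        p_value = 2 * normal_cdf(-abs(z))
    elif alternative == "greater":   # H_A: D_12 > 0, heterozygote deficit
        p_value = 1 - normal_cdf(z)
    elif alternative == "less":      # H_A: D_12 < 0, heterozygote excess
        p_value = normal_cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


z, p_two_sided = hwe_z_test(30, 40, 30)   # hypothetical counts
print(round(z, 4), round(p_two_sided, 4))
print(round(2 * normal_cdf(-1.25), 7))    # two-tailed p-value for z = 1.25
```

The last line evaluates the two-tailed probability for $z = 1.25$, the value used in the discussion of alternative hypotheses.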

Chi-Square

An equivalent test is the chi-square test for HWE. It depends on comparing $z^2$ against its sampling distribution, which under the null is a chi-square distribution with 1 degree of freedom:

$$z^2 = X_1^2 = \frac{n \hat{D}_{12}^2}{\hat{p}_1^2 (1-\hat{p}_1)^2}.$$

However, note that both positive and negative values of $z$ give the same $z^2$ statistic. It is not so easy to consider one-sided alternative hypotheses. Since the tests are equivalent, use the $z$ statistic for one-sided tests.

Test assumption: the sample size is large, so both normality (or chi-square) applies and bias can be ignored.

Chi-Square Goodness-of-Fit

Genotype               11                    12                                   22
Observed (O)           $n_{11}$              $n_{12}$                             $n_{22}$
Expected (E)           $n\hat{p}_1^2$        $2n\hat{p}_1(1-\hat{p}_1)$           $n(1-\hat{p}_1)^2$
Observed - Expected    $n\hat{D}_{12}$       $-2n\hat{D}_{12}$                    $n\hat{D}_{12}$

Here, we have made the assumption that $n$ is sufficiently large that the bias terms are 0. The goodness-of-fit chi-square statistic is defined as

$$X_1^2 = \sum_{\text{genotypes}} \frac{(O-E)^2}{E} = \frac{(n\hat{D}_{12})^2}{n\hat{p}_1^2} + \frac{(-2n\hat{D}_{12})^2}{2n\hat{p}_1(1-\hat{p}_1)} + \frac{(n\hat{D}_{12})^2}{n(1-\hat{p}_1)^2}.$$

Standard Cautions About Chi-Square Tests

Apply only when expected counts $E \ge 5$. Because the expected counts $E$ appear in the denominator, small variation when they are small results in huge changes in $X_1^2$.

Apply Yates' correction to account for the discrete nature of the data. Because the observed data are discrete, but the sampling distribution (normal or chi-square) is continuous, the Yates correction is recommended:

$$X_1^2 = \sum_{\text{genotypes}} \frac{(|O-E| - 0.5)^2}{E}.$$

Likelihood Ratio

Suppose your null hypothesis is $H_0: \phi = \phi_0$ for some parameter $\phi$. Let the maximum likelihood value under $H_0$ be $L_0$ and the maximum likelihood value without the restriction on $\phi$ be $L_1$. Then $L_0$ can never exceed $L_1$, since $\phi_0$ may not be the maximum likelihood value of $\phi$. However, if the null is true, $\hat{\phi}$ should be very close to $\phi_0$ and $L_0$ will be very close to $L_1$. Define the likelihood ratio as

$$\lambda = \frac{L_0}{L_1}.$$

When the null hypothesis is true and the size of $\phi$ is $s$, then approximately

$$-2 \ln \lambda \sim \chi^2_{(s)}.$$
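A goodness-of-fit computation, with and without the Yates correction, can be sketched as follows. The counts (30, 40, 30) are hypothetical and the function name `hwe_chi_square` is an assumption; the one-degree-of-freedom p-value uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def hwe_chi_square(n11, n12, n22, yates=False):
    """Goodness-of-fit chi-square for HWE with two alleles (1 degree of freedom)."""
    n = n11 + n12 + n22
    p1 = (2 * n11 + n12) / (2 * n)
    observed = [n11, n12, n22]
    expected = [n * p1 ** 2, 2 * n * p1 * (1 - p1), n * (1 - p1) ** 2]
    correction = 0.5 if yates else 0.0
    x2 = sum((abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(x2 / 2.0))   # upper tail of chi-square with 1 d.f.
    return x2, p_value


# Hypothetical counts; expected counts here are 25, 50, 25.
x2, p_plain = hwe_chi_square(30, 40, 30)
x2_yates, p_yates = hwe_chi_square(30, 40, 30, yates=True)
print(round(x2, 4), round(p_plain, 4))
print(round(x2_yates, 4), round(p_yates, 4))
```

For these counts the uncorrected statistic equals the square of the z statistic computed from the same data, as the equivalence of the two tests requires, and the Yates correction makes the test more conservative.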

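Likelihood Ratio Test for HWE

Under the unconstrained model (the alternative hypothesis), the parameters are $P_{11}, P_{12}, P_{22}$ and the data are $n_{11}, n_{12}, n_{22}$. There are two degrees of freedom in the model and the data, so Bailey's method applies to yield

$$\hat{P}_{11} = \frac{n_{11}}{n}, \qquad \hat{P}_{12} = \frac{n_{12}}{n}, \qquad \hat{P}_{22} = \frac{n_{22}}{n}.$$

The maximum likelihood under the unconstrained model is

$$L_1 = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!} \left(\frac{n_{11}}{n}\right)^{n_{11}} \left(\frac{n_{12}}{n}\right)^{n_{12}} \left(\frac{n_{22}}{n}\right)^{n_{22}}.$$

Under the constrained model,

$$\hat{P}_{11} = \hat{p}_1^2, \qquad \hat{P}_{12} = 2\hat{p}_1(1-\hat{p}_1), \qquad \hat{P}_{22} = (1-\hat{p}_1)^2 \qquad \text{with } \hat{p}_1 = \frac{n_1}{2n},$$

where $n_1 = 2 n_{11} + n_{12}$ and $n_2 = 2 n_{22} + n_{12}$ are the allele counts.

Likelihood Ratio Test for HWE (cont)

The maximum likelihood under the constrained model is

$$L_0 = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!} \left(\frac{n_1}{2n}\right)^{2 n_{11}} \left(\frac{2 n_1 n_2}{(2n)^2}\right)^{n_{12}} \left(\frac{n_2}{2n}\right)^{2 n_{22}}.$$

The test statistic is therefore

$$-2 \ln \lambda = -2 \ln \left[ \frac{n^n\, 2^{n_{12}}\, n_1^{n_1}\, n_2^{n_2}}{(2n)^{2n}\, n_{11}^{n_{11}}\, n_{12}^{n_{12}}\, n_{22}^{n_{22}}} \right].$$

A Multiplicative Model That Works

Let us consider another multiplicative model:

$$P_{11} = M M_1^2 M_{11}, \qquad P_{12} = 2 M M_1 M_2 M_{12}, \qquad P_{22} = M M_2^2 M_{22}.$$

Here $M$ is the mean effect, $M_{11}, M_{12}, M_{22}$ represent associations between allele frequencies, and $M_1, M_2$ represent the allele frequency contributions. Taking logarithms puts this model back in the additive space:

$$\ln P_{11} = \ln M + 2 \ln M_1 + \ln M_{11}$$
$$\ln P_{12} = \ln M + \ln 2 + \ln M_1 + \ln M_2 + \ln M_{12}$$
$$\ln P_{22} = \ln M + 2 \ln M_2 + \ln M_{22}$$

Handling Overparameterization

There are, as usual, more parameters than observations. And this time there are multiple ways to deal with the overparameterization. One way is to set $M_2 M_{12} = 1$ and $M_2^2 M_{22} = 1$, so that

$$P_{11} = M M_1^2 M_{11}, \qquad P_{12} = 2 M M_1, \qquad P_{22} = M.$$

There is still an extra degree of freedom, but summing all three equations yields

$$1 = M M_1^2 M_{11} + 2 M M_1 + M \qquad \text{or} \qquad M = \frac{1}{1 + 2 M_1 + M_1^2 M_{11}}.$$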
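The likelihood ratio statistic for the two-allele case can be evaluated on the log scale, which avoids the large factorials and powers in $L_0$ and $L_1$ (the multinomial coefficient cancels in the ratio). The following sketch uses hypothetical counts; the names `xlogy` and `hwe_likelihood_ratio` are illustrative assumptions, and the convention $0 \ln 0 = 0$ is applied for empty genotype classes.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def hwe_likelihood_ratio(n11, n12, n22):
    """Return (-2 ln lambda, p-value) for H0: HWE with two alleles."""
    n = n11 + n12 + n22
    n1 = 2 * n11 + n12   # count of allele 1
    n2 = 2 * n22 + n12   # count of allele 2
    # Log-likelihoods without the shared multinomial coefficient.
    log_l1 = xlogy(n11, n11 / n) + xlogy(n12, n12 / n) + xlogy(n22, n22 / n)
    log_l0 = xlogy(n1, n1 / (2 * n)) + xlogy(n2, n2 / (2 * n)) + n12 * math.log(2.0)
    stat = max(0.0, -2.0 * (log_l0 - log_l1))    # guard against rounding below 0
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square with 1 d.f.
    return stat, p_value


stat, p_value = hwe_likelihood_ratio(30, 40, 30)   # hypothetical counts
print(round(stat, 4), round(p_value, 4))
print(hwe_likelihood_ratio(25, 50, 25)[0])         # counts exactly in HWE proportions
```

For the hypothetical counts the statistic is close to 4, similar to the chi-square statistic computed from the same data, and it is essentially zero when the counts sit exactly at their HWE expectations.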

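Estimating Parameters $M_1$ and $M_{11}$

Again, Bailey's method applies and the maximum likelihood estimates are

$$\hat{M}_1 = \frac{n_{12}}{2 n_{22}}, \qquad \hat{M}_{11} = \frac{4 n_{11} n_{22}}{n_{12}^2},$$

with the tag-along

$$\hat{M} = \frac{n_{22}}{n}.$$

Substituting these MLEs back into the original multiplicative equations produces the same likelihood under $H_A$:

$$\hat{P}_{11} = \frac{n_{11}}{n}, \qquad \hat{P}_{12} = \frac{n_{12}}{n}, \qquad \hat{P}_{22} = \frac{n_{22}}{n}, \qquad L_1 = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!} \cdot \frac{n_{11}^{n_{11}}\, n_{12}^{n_{12}}\, n_{22}^{n_{22}}}{n^n}.$$

Log Likelihood Test for Multiplicative Model

HWE implies no interaction term, i.e. $M_{11} = 1$. Under this constraint, we again apply Bailey's method to find

$$\hat{P}_{11} = \left(\frac{n_1}{2n}\right)^2, \qquad \hat{P}_{12} = 2\left(\frac{n_1}{2n}\right)\left(\frac{n_2}{2n}\right), \qquad \hat{P}_{22} = \left(\frac{n_2}{2n}\right)^2.$$

It turns out that $\lambda$ has the same form under this log-linear model as under the additive disequilibrium model. So the log-linear model for testing HWE is equivalent to the additive model.

Exact Tests

If the probability of the observed sample under the null hypothesis, together with all less likely samples, is small, then the evidence suggests the data are unlikely to have arisen under the null hypothesis. If one can compute the probability of all possible samples, then obtaining an exact probability (no approximation) is possible.

Exact tests are useful when all the possible observed data can be enumerated practically. This occurs generally when the expected counts are small in some categories (i.e. when the previous tests fail).

Exact Tests for HWE

The probability of the observed data $n_{11}, n_{12}, n_{22}$ is given by the multinomial distribution

$$P(n_{11}, n_{12}, n_{22}) = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!}\, P_{11}^{n_{11}} P_{12}^{n_{12}} P_{22}^{n_{22}}.$$

When HWE applies, then

$$P(n_{11}, n_{12}, n_{22}) = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!}\, p_1^{2 n_{11}} (2 p_1 p_2)^{n_{12}} p_2^{2 n_{22}}.$$

In addition, the allele counts $n_1$ and $n_2$ are binomially distributed:

$$P(n_1, n_2) = \frac{(2n)!}{n_1!\, n_2!}\, p_1^{n_1} p_2^{n_2}.$$

The conditional probability, where we condition on the observed allele counts, is

$$P(n_{11}, n_{12}, n_{22} \mid n_1, n_2) = \frac{P(n_{11}, n_{12}, n_{22}, n_1, n_2)}{P(n_1, n_2)} = \frac{P(n_{11}, n_{12}, n_{22})}{P(n_1, n_2)} = \frac{n!\, n_1!\, n_2!\, 2^{n_{12}}}{n_{11}!\, n_{12}!\, n_{22}!\, (2n)!}.$$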
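For two alleles the exact test can be carried out by enumerating every genotype table that shares the observed allele counts and evaluating the conditional probability for each. The sketch below is an illustration, not the lecture's code: the function names are assumptions, log-gamma is used in place of factorials, and the counts (10, 1, 2) are taken as the example data.

```python
from math import exp, lgamma, log


def log_conditional_prob(n11, n12, n22):
    """ln P(n11, n12, n22 | n1, n2) under HWE."""
    n = n11 + n12 + n22
    n1 = 2 * n11 + n12
    n2 = 2 * n22 + n12
    return (lgamma(n + 1) + lgamma(n1 + 1) + lgamma(n2 + 1) + n12 * log(2.0)
            - lgamma(n11 + 1) - lgamma(n12 + 1) - lgamma(n22 + 1) - lgamma(2 * n + 1))


def hwe_exact_test(n11, n12, n22):
    """Enumerate all tables with the observed allele counts; return (table, p-value)."""
    n1 = 2 * n11 + n12
    n2 = 2 * n22 + n12
    p_obs = exp(log_conditional_prob(n11, n12, n22))
    table = []
    # The heterozygote count has the same parity as n1 and cannot exceed min(n1, n2).
    for het in range(n1 % 2, min(n1, n2) + 1, 2):
        a, b = (n1 - het) // 2, (n2 - het) // 2
        table.append(((a, het, b), exp(log_conditional_prob(a, het, b))))
    # p-value: total probability of tables no more probable than the observed one.
    p_value = sum(pr for _, pr in table if pr <= p_obs * (1 + 1e-9))
    return table, p_value


table, p_value = hwe_exact_test(10, 1, 2)
for genotype_counts, prob in table:
    print(genotype_counts, round(prob, 4))
print("p-value:", round(p_value, 4))
```

The enumerated probabilities sum to 1, which is a useful check that no compatible table has been missed.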

Exact Test for HWE (Example)

Suppose we observe $n_{11} = 10$, $n_{12} = 1$, $n_{22} = 2$. Use an exact test to calculate a p-value for rejecting the null hypothesis of HWE. Only three genotype tables share the observed allele counts ($n_1 = 21$, $n_2 = 5$):

$n_{11}$   $n_{12}$   $n_{22}$   Probability   Cumul. Prob.
10         1          2          0.026         0.026
9          3          1          0.35          0.37
8          5          0          0.63          1.00

The p-value is $p = 0.026$, since there is no dataset more extreme than the observed.

Summary of Tests for HWE

- The normal approximation for MLEs uses the $z$ statistic.
- The chi-square test uses the $X_1^2 = z^2$ statistic and is equivalent to the above test.
- The chi-square goodness-of-fit test is identical to the above chi-square test, but highlights the need for substantial data in each category.
- The likelihood ratio test is widely applicable and flexible when a likelihood function is available.
- The log-linear model uses a multiplicative model and leads to a test equivalent to the likelihood ratio test.
- The exact test is useful when the data set is small and particularly when counts in some categories are small.

Tests for Multiple Alleles

When there are more than two alleles at a locus, the general equations

$$P_{uu} = p_u^2 + D_{uu}, \qquad P_{uv} = 2 p_u p_v - 2 D_{uv} \quad \text{whenever } u \ne v$$

apply, with relationship

$$D_{uu} = \sum_{v \ne u} D_{uv}$$

and MLEs obtained by Bailey's method (verify this; it is not hard) with

$$\hat{p}_u = \tilde{p}_u, \qquad \hat{D}_{uv} = \tilde{p}_u \tilde{p}_v - \frac{1}{2} \tilde{P}_{uv}.$$

Under complete HWE, $D_{uv} = 0$ for all $u \ne v$; that is, there are $k(k-1)/2$ constraints applied (one for each heterozygote) when there are $k$ alleles.

Testing Complete HWE

Therefore $-2 \ln \lambda$ for $H_0: D_{uv} = 0$ for all $u \ne v$ approximately follows a chi-square distribution with $k(k-1)/2$ degrees of freedom. If you are more comfortable with a goodness-of-fit test, that statistic,

$$X_T^2 = \sum_{u \le v} \frac{\left(|n_{uv} - E(n_{uv})| - 0.5\right)^2}{E(n_{uv})}, \qquad E(n_{uu}) = n \tilde{p}_u^2, \qquad E(n_{uv}) = 2 n \tilde{p}_u \tilde{p}_v,$$

follows the same sampling distribution.
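A goodness-of-fit test of complete HWE for $k$ alleles can be sketched with a closed-form chi-square tail probability for integer degrees of freedom. Several things here are assumptions: the function names, the choice to omit the 0.5 continuity correction, and the genotype counts, which are taken to be those of the four-allele example table later in these notes with the $A_4A_4$ count read as 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_r x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def hwe_goodness_of_fit(counts, alleles):
    """counts[(u, v)] with u <= v holds genotype counts; no continuity correction."""
    n = sum(counts.values())
    allele_n = {u: 0 for u in alleles}
    for (u, v), c in counts.items():
        allele_n[u] += c   # a homozygote (u, u) contributes its count twice
        allele_n[v] += c
    p = {u: allele_n[u] / (2 * n) for u in alleles}
    x2 = 0.0
    for i, u in enumerate(alleles):
        for v in alleles[i:]:
            expected = n * p[u] ** 2 if u == v else 2 * n * p[u] * p[v]
            x2 += (counts.get((u, v), 0) - expected) ** 2 / expected
    k = len(alleles)
    df = k * (k - 1) // 2
    return x2, df, chi2_sf(x2, df)


# Assumed four-allele genotype counts (lower triangle of the example table).
counts = {(1, 1): 0, (1, 2): 3, (2, 2): 1, (1, 3): 5, (2, 3): 18,
          (3, 3): 1, (1, 4): 3, (2, 4): 7, (3, 4): 5, (4, 4): 2}
x2, df, p_value = hwe_goodness_of_fit(counts, [1, 2, 3, 4])
print(round(x2, 3), df, round(p_value, 4))
```

Several expected counts in this table are well below 5, so the standard caution about chi-square tests applies and an exact or Monte Carlo test is the safer choice for data like these.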

Testing Partial HWE

If you want to test only certain combinations of alleles, the tests are more complicated.

Test a single $D_{uv}$. Example: if $H_0: D_{12} = 0$, the likelihood ratio statistic $-2 \ln \lambda$ follows a chi-square with 1 degree of freedom. But Bailey's method does not apply unless $k = 2$, so $L_0$ is difficult to compute. Iterative methods are required. Or, one could apply the z-test or the chi-square test. Details for these tests follow on the next slide.

Test multiple, but not all, $D_{uv}$. Example: if $k = 4$ and $H_0: D_{12}, D_{34} = 0$, then $-2 \ln \lambda$ follows a chi-square with 2 degrees of freedom. Iterative methods are still required. This time, there are no easy alternatives. Complex hypotheses cause difficulties for the z-test and related chi-square. The likelihood ratio test handles complex hypotheses quite naturally.

z-test for Partial HWE

Here, you only need the MLEs under the full alternative model (where Bailey's method applies). For large samples, the MLE $\hat{D}_{uv}$ is approximately normally distributed under $H_0$ with mean 0 and variance $\mathrm{Var}(\hat{D}_{uv})$:

$$z_{uv} = \frac{\hat{D}_{uv}}{\sqrt{\mathrm{Var}(\hat{D}_{uv})}}.$$

We can again use Fisher's approximation to compute the variance:

$$\mathrm{Var}(\hat{D}_{uv}) \approx \frac{1}{2n} \Big\{ p_u p_v \left[(1-p_u)(1-p_v) + p_u p_v\right] - \left[(1 - p_u - p_v)^2 - 2 (p_u - p_v)^2\right] D_{uv} + \sum_{w \ne u,v} \left(p_u^2 D_{vw} + p_v^2 D_{uw}\right) - 2 D_{uv}^2 \Big\}.$$

Under $H_0$, the variance is obtained by assuming $D_{uv} = 0$.

Exact Tests for Multiple Alleles

The exact test generalizes to multiple alleles. The formula is

$$P(\{n_{uv}\} \mid \{n_u\}) = \frac{n!\, 2^H \prod_u n_u!}{(2n)!\, \prod_{u \le v} n_{uv}!},$$

where $H$ is the number of heterozygotes in the sample.

GenePOP input:

Test
loc1
Pop
a1, 0101
a2, 0101
a3, 0101
a4, 0202
a5, 0203

GUO, S. and THOMPSON, E. 1992. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48, pp. 361-372.
LAZZERONI, L. C. and LANGE, K. 1997. Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables. Ann. Statist. 25, pp. 138-168.

Approximate Exact Tests

When it is impossible to enumerate all the possible datasets with the same allele frequencies, approximate methods are needed. One of the simplest uses Monte Carlo.

1. Calculate $F = P(\{n_{uv}\} \mid \{n_u\})$ for the observed data.
2. Set $S = 0$ and put all your genotypes (13, 66, 31, 16, ...) in a big vector of length $2n$: 13663116...
3. Permute all alleles and clump successive alleles into genotypes: (36)(16)(36)(13)...
4. Compute $F^* = P(\{n_{uv}\} \mid \{n_u\})$ for the permuted dataset.
5. If $F^* \le F$, increment $S$ by 1.
6. Repeat steps 3 to 5 $M$ times.
7. Estimate the p-value as $S/M$.
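The Monte Carlo procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the lecture's implementation: the function names, the fixed seed and the number of permutations are arbitrary choices, and the genotype counts are assumed to be those of the four-allele example table (with the $A_4A_4$ count read as 2). Because $n!$, $(2n)!$ and the allele counts are unchanged by permutation, only the part of the probability that varies, $2^H / \prod n_{uv}!$, is compared, on the log scale.

```python
import random
from math import lgamma, log


def log_prob_kernel(genotypes):
    """ln(2^H / prod n_uv!), the permutation-dependent part of P({n_uv} | {n_u})."""
    counts = {}
    heterozygotes = 0
    for a, b in genotypes:
        key = (a, b) if a <= b else (b, a)
        counts[key] = counts.get(key, 0) + 1
        if a != b:
            heterozygotes += 1
    return heterozygotes * log(2.0) - sum(lgamma(c + 1) for c in counts.values())


def monte_carlo_hwe(genotypes, n_perm=5000, seed=1):
    """Estimate the exact-test p-value by permuting alleles among individuals."""
    rng = random.Random(seed)
    alleles = [allele for genotype in genotypes for allele in genotype]
    f_obs = log_prob_kernel(genotypes)
    s = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        permuted = list(zip(alleles[0::2], alleles[1::2]))  # clump successive alleles
        if log_prob_kernel(permuted) <= f_obs + 1e-9:       # F* <= F, with float tolerance
            s += 1
    return s / n_perm


# Assumed four-allele genotype counts, expanded to one genotype per individual.
counts = {(1, 1): 0, (1, 2): 3, (2, 2): 1, (1, 3): 5, (2, 3): 18,
          (3, 3): 1, (1, 4): 3, (2, 4): 7, (3, 4): 5, (4, 4): 2}
genotypes = [g for g, c in counts.items() for _ in range(c)]
p_mc = monte_carlo_hwe(genotypes)
print(len(genotypes), p_mc)
```

The estimate carries Monte Carlo error of roughly $\sqrt{p(1-p)/M}$, so a larger number of permutations is needed when the p-value is close to the chosen significance level.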

Example

Consider the following table of genotype counts for a locus with four alleles.

          $A_1$   $A_2$   $A_3$   $A_4$
$A_1$     0
$A_2$     3       1
$A_3$     5       18      1
$A_4$     3       7       5       2

Method                        Estimate
Exact                         0.01744
$\chi^2$                      0.02337
MC integration (M = 1700)     0.01706

Power Calculations

Recall that the power of a statistical test is the probability that you reject the null hypothesis when it is false. Before you begin collecting data, you may wish to estimate whether you will be able to detect a disequilibrium of a given size, say $D_{12} = 0.10$. To detect it means you reject the null that $D_{12} = 0$. The test statistic

$$X_1^2 = \frac{n \hat{D}_{12}^2}{\hat{p}_1^2 (1-\hat{p}_1)^2}$$

follows a different distribution depending on whether $H_0$ is true or not:

$$H_0 \text{ true: } X_1^2 \sim \chi^2_{(1)}, \qquad H_0 \text{ false: } X_1^2 \sim \chi^2_{(1,\nu)},$$

where $\chi^2_{(1,\nu)}$ is the noncentral chi-square distribution with noncentrality parameter $\nu$.

The Noncentrality Parameter

The noncentral chi-square distribution is an approximate distribution under $H_A$. The noncentrality parameter is given by

$$\nu = \frac{n D_{12}^2}{p_1^2 (1-p_1)^2}.$$

$\nu$ is bigger when $D_{12}$ is farther from 0. But note, the approximation is only valid when $\nu$ is small, say of order 1.

If we take the standard significance level $\alpha = 0.05$, then we will reject $H_0$ if $X_1^2 > 3.84$. With this knowledge one can ask a couple of questions:

- How big does $D_{12}$ need to be in order to have a 90% chance of getting $X_1^2 > 3.84$ and therefore rejecting $H_0$ and detecting the disequilibrium?
- How big does my sample need to be in order to detect a disequilibrium $D_{12} = 0.1$ with 90% probability?

Size of $D_{12}$ or $n$

$$D_{12} = \sqrt{\frac{\nu}{n}}\; p_1 (1-p_1)$$

When $\nu = 10.5$, then $X_1^2$ will exceed 3.84 with 90% probability. (These kinds of results are tabulated or available in statistics packages.) So, for alternative sample sizes $n$ you can see how large a disequilibrium $D_{12}$ you will be likely (90% probability) to detect.

Also, further rearrangement yields

$$n = \frac{\nu\, p_1^2 (1-p_1)^2}{D_{12}^2},$$

which can be used to compute the size of the sample we'll need to detect a specified disequilibrium $D_{12}$.

Note, for both of these applications, you need to know $p_1$ to use the formulas.
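The power and sample-size formulas can be evaluated without a statistics package, because a noncentral chi-square with 1 degree of freedom is the square of a normal variable with mean $\sqrt{\nu}$ and unit variance. The sketch below relies on that fact; the function names and the example values $p_1 = 0.5$ and $n = 100$ are illustrative assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def power_chi_square_1df(nu, critical=3.84):
    """P(X > critical) where X = (Z + sqrt(nu))^2 is noncentral chi-square, 1 d.f."""
    c, d = math.sqrt(critical), math.sqrt(nu)
    return normal_cdf(d - c) + normal_cdf(-d - c)


def detectable_d12(n, p1, nu=10.5):
    """Disequilibrium detectable at the power implied by nu, for sample size n."""
    return math.sqrt(nu / n) * p1 * (1 - p1)


def required_n(d12, p1, nu=10.5):
    """Sample size needed to detect d12 at the power implied by nu."""
    return nu * p1 ** 2 * (1 - p1) ** 2 / d12 ** 2


print(round(power_chi_square_1df(10.5), 4))   # power at nu = 10.5
print(round(power_chi_square_1df(0.0), 4))    # size of the test under H0
print(round(detectable_d12(100, 0.5), 4))     # detectable D_12 for n = 100, p_1 = 0.5
print(math.ceil(required_n(0.1, 0.5)))        # individuals needed for D_12 = 0.1
```

Under these assumptions, detecting $D_{12} = 0.1$ with about 90% power when $p_1 = 0.5$ takes roughly 66 individuals, and the requirement grows quickly as $D_{12}$ shrinks because $n$ scales with $1/D_{12}^2$.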

Power Calculations for Exact Tests

Recall, you need to calculate

$$P(\{n_{uv}\} \mid \{n_u\}) = \frac{P(\{n_{uv}\})}{P(\{n_u\})}.$$

Under the alternative hypothesis, the genotype frequencies are given by the formulas

$$P_{uu} = p_u^2 + D_{uu}, \qquad P_{uv} = 2 p_u p_v - 2 D_{uv} \quad \text{whenever } u \ne v,$$

so you can compute the numerator

$$P(\{n_{uv}\}) = \frac{n!}{\prod_{u \le v} n_{uv}!} \prod_{u \le v} P_{uv}^{n_{uv}},$$

but you can no longer assume the binomial distribution for allele frequencies.

Power Calculations for Exact Tests (cont)

But, of course,

$$n_u = 2 n_{uu} + \sum_{v \ne u} n_{uv},$$

so if you know $P(\{n_{uv}\})$, you can compute $P(\{n_u\})$ by summing over the former.
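For two alleles, the summation described above is short enough to write out: $P(\{n_u\})$ is the sum of the multinomial probabilities of every genotype table with the given allele counts. The sketch below is an illustration under stated assumptions; the function names, the allele frequency $p_1 = 0.8$, the disequilibrium $D_{12} = 0.05$ and the counts (10, 1, 2) are chosen only for demonstration.

```python
from math import exp, lgamma, log


def log_multinomial(counts, probs):
    """Log multinomial probability of genotype counts given genotype frequencies."""
    total = lgamma(sum(counts) + 1)
    for c, q in zip(counts, probs):
        total -= lgamma(c + 1)
        if c > 0:
            if q <= 0.0:
                return float("-inf")   # impossible table under these frequencies
            total += c * log(q)
    return total


def genotype_probs(p1, d12):
    """(P_11, P_12, P_22) under the additive disequilibrium model."""
    p2 = 1.0 - p1
    return (p1 ** 2 + d12, 2 * p1 * p2 - 2 * d12, p2 ** 2 + d12)


def allele_count_prob(n1, n2, probs):
    """P(n1, n2): sum of P(n11, n12, n22) over tables with these allele counts."""
    total = 0.0
    for het in range(n1 % 2, min(n1, n2) + 1, 2):
        total += exp(log_multinomial(((n1 - het) // 2, het, (n2 - het) // 2), probs))
    return total


def conditional_prob(n11, n12, n22, probs):
    """P(n11, n12, n22 | n1, n2) under arbitrary genotype frequencies."""
    n1, n2 = 2 * n11 + n12, 2 * n22 + n12
    return exp(log_multinomial((n11, n12, n22), probs)) / allele_count_prob(n1, n2, probs)


under_hwe = conditional_prob(10, 1, 2, genotype_probs(0.8, 0.0))
under_alt = conditional_prob(10, 1, 2, genotype_probs(0.8, 0.05))
print(round(under_hwe, 4), round(under_alt, 4))
```

With $D_{12} = 0$ the result reduces to the closed-form HWE conditional probability, which does not depend on $p_1$; with $D_{12} > 0$ the heterozygote-poor table becomes much more probable, and that shift is what gives the exact test its power.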