Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs, with Remarks on Family Selection
Dissertation Defense
April 5, 2014

Contents
1. Introduction
2. FWER Control within Gene Ontology Graphs
3. A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
4. QTL Mapping: Hypotheses and Approaches
5. Discussion

1. Introduction
"If enough statistics are computed, some of them will be sure to show structure." (Diaconis 1985)
[Figure: P(any type I error), from 0 to 1.0, plotted against the Number of Simultaneous Tests (m), from 0 to 100.]
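The shape of such a curve can be illustrated with a small calculation. The sketch below assumes m independent tests of true null hypotheses, each run at a per-test level of 0.05; neither the independence assumption nor the level is stated on the slide, and the function name is hypothetical.

```python
def prob_any_type_one_error(m: int, alpha: float = 0.05) -> float:
    """P(at least one false rejection) among m independent true-null tests.

    Assumes independence and a common per-test level alpha (both assumptions).
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 20, 50, 100):
        print(m, round(prob_any_type_one_error(m), 3))
```

Under these assumptions the probability of at least one type I error rises quickly with m, which is the motivation for the multiplicity corrections that follow.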

1. Introduction
[Figure: Number of MCP articles from four leading journals by year, from 1965 to 2008, shown with a timeline of milestones: Cournot, Tippett, the great spurt of MCP activity in the 40's and 50's, Miller, Marcus et al., Hochberg & Tamhane, Hsu, and Westfall & Young. A marker denotes a world-wide conference on MCPs.]

1. Introduction
"The Vitality of [the] field in the future as a research area depends upon [the researcher's] ability to continue and address the real needs of statistical analysis in current problems" (Benjamini, 2010).

1. Introduction

                         Declared          Declared
                         non-significant   significant   Total
True null hypothesis     U                 V             m_0
False null hypothesis    T                 S             m - m_0
Total                    m - R             R             m

The Per Comparison Error Rate (PCER): $E(V/m)$
The Familywise Error Rate (FWER): $P(V \ge 1)$
The False Discovery Rate (FDR): $E(V/R)$
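To make the counts U, V, T, S, and R concrete, the sketch below tallies them for a single hypothetical experiment in which the truth of each null hypothesis is known. The function name and the convention V/R = 0 when R = 0 are assumptions for illustration; PCER, FWER, and FDR are the expectation or probability of these realized quantities over repeated experiments.

```python
def error_summary(is_true_null, rejected):
    """Tally the 2x2 outcome table for one experiment.

    is_true_null[i]: whether H_i is actually true (known only in simulation).
    rejected[i]: whether H_i was declared significant.
    """
    pairs = list(zip(is_true_null, rejected))
    U = sum(1 for t, r in pairs if t and not r)      # true nulls retained
    V = sum(1 for t, r in pairs if t and r)          # true nulls rejected (type I errors)
    T = sum(1 for t, r in pairs if not t and not r)  # false nulls retained
    S = sum(1 for t, r in pairs if not t and r)      # false nulls rejected
    m, R = len(pairs), V + S
    return {
        "U": U, "V": V, "T": T, "S": S, "R": R, "m": m, "m0": U + V,
        "V_over_m": V / m,                           # realized per-comparison error
        "any_false_rejection": V >= 1,               # the event whose probability is the FWER
        "V_over_R": V / R if R > 0 else 0.0,         # assumed convention: 0 when R == 0
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False]
    decisions = [False, True, False, True, True]
    print(error_summary(truth, decisions))
```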

1. Introduction
Selecting a Family of Hypotheses
A subjective, but important decision.
"Any collection of inferences for which it is meaningful to take into account some combined measure of errors." (Hochberg & Tamhane 1987)
Gatekeeping (Bretz et al. 2009)

1. Introduction
The Bonferroni Adjustment: test each $H_i$ at level $\alpha/m$.
Boole's Inequality: $P(A \cup B) \le P(A) + P(B)$, or generally $P\left(\bigcup_i A_i\right) \le \sum_i P(A_i)$.
Let $R_i$ denote the event that hypothesis $H_i$ is rejected. Then, if $P_{H_i}(R_i) = \alpha_i$,
$\mathrm{FWER} = P_{H_i}\left(\bigcup_i R_i\right) \le \sum_i P_{H_i}(R_i) = \sum_i \alpha_i.$
If $\alpha_i = \alpha/m$ for all $i$, then $\mathrm{FWER} \le \alpha$.
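As a minimal sketch of the Bonferroni rule, the function below rejects each hypothesis whose p-value is at most alpha/m. The function name and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return a list of booleans: reject H_i iff p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    # With m = 3 the per-test level is 0.05 / 3, so only the first p-value qualifies.
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```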

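1. Introduction
Weighted Bonferroni Adjustment: test $H_i$ at level $\alpha_i$, s.t. $\sum_i \alpha_i \le \alpha$.
Since
$\mathrm{FWER} = P_{H_i}\left(\bigcup_i R_i\right) \le \sum_i P_{H_i}(R_i) = \sum_i \alpha_i,$
so long as $\sum_i \alpha_i \le \alpha$, then $\mathrm{FWER} \le \alpha$.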
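The weighted version can be sketched by writing each local level as a share of alpha. The weights parameterization (alpha_i = w_i * alpha with weights summing to at most 1) and the function name are assumptions made for illustration.

```python
def weighted_bonferroni_reject(p_values, weights, alpha=0.05):
    """Reject H_i iff p_i <= w_i * alpha, where the weights sum to at most 1."""
    if any(w < 0 for w in weights) or sum(weights) > 1 + 1e-12:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return [p <= w * alpha for p, w in zip(p_values, weights)]


if __name__ == "__main__":
    p = [0.03, 0.004]
    print(weighted_bonferroni_reject(p, [0.5, 0.5]))  # equal split of alpha
    print(weighted_bonferroni_reject(p, [0.8, 0.2]))  # more alpha given to H_1
```

Shifting weight toward a hypothesis raises its local level without changing the bound on the sum of the levels.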

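1. Introduction
Holm's Sequential Bonferroni: test ordered $H_{(j)}$ at level $\alpha/(m - j + 1)$.
Let $P_i$ denote the p-value for $H_i$. Let $I \subseteq \{1, \dots, m\}$ index the true $H_i$, with $|I| = k \le m$. Then,
$P(P_i > \alpha/k \text{ for all } i \in I) = 1 - P(P_i \le \alpha/k \text{ for some } i \in I) \ge 1 - \sum_{i \in I} P(P_i \le \alpha/k) \ge 1 - k \cdot \frac{\alpha}{k} = 1 - \alpha.$
When the procedure first reaches a true hypothesis at step $j$, at most $m - k$ false hypotheses precede it, so the level used there satisfies $\alpha/(m - j + 1) \le \alpha/k$.
Since $m - j + 1 \ge k$, $\mathrm{FWER} \le \alpha$.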
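The step-down rule can be sketched as follows: sort the p-values, compare the j-th smallest with alpha/(m - j + 1), and stop at the first failure. The function name and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; returns a list of booleans in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for j, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (m - j + 1):
            reject[i] = True
        else:
            break  # once one ordered hypothesis is retained, all later ones are too
    return reject


if __name__ == "__main__":
    # Plain Bonferroni (level 0.05 / 3) would reject only the first of these.
    print(holm_reject([0.01, 0.02, 0.03]))
```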

1. Introduction
Closed Testing: reject $w_i$ iff all $w_j \subseteq w_i$ are rejected at level $\alpha$.
Let $W$ be a set of hypotheses. $W$ is closed under intersection if: for any two hypotheses $H_i, H_j \in W$, $w = H_i \cap H_j$ is also in $W$.
An example: Consider the elementary hypotheses $H_1$, $H_2$, and $H_3$. Let $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$, and $w_4 = H_1 \cap H_2$, $w_5 = H_1 \cap H_3$, $w_6 = H_2 \cap H_3$, and $w_7 = H_1 \cap H_2 \cap H_3$.
$W = \{w_1, \dots, w_7\}$ is a set of hypotheses closed under intersections.

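1. Introduction
Closed Testing: reject $w_i$ iff all $w_j \subseteq w_i$ are rejected at level $\alpha$.
$W = \{w_1, \dots, w_7\}$ is a set of hypotheses closed under intersections.
[Figure: Diagram of the closed family, with $w_7 = H_1 \cap H_2 \cap H_3$ at the top, the pairwise intersections $w_4$, $w_5$, $w_6$ in the middle, and the elementary hypotheses $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$ at the bottom.]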
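A closed test over the seven hypotheses in this example can be sketched by enumerating every non-empty subset of the elementary hypotheses. The slides do not say which local test is applied to each intersection; the sketch assumes a Bonferroni local test (reject the intersection over a subset J when the smallest p-value in J is at most alpha/|J|). The function name is hypothetical.

```python
from itertools import combinations


def closed_test_bonferroni(p_values, alpha=0.05):
    """Closed testing with an assumed Bonferroni local test for each intersection.

    H_i is rejected iff every intersection hypothesis that involves H_i is rejected.
    """
    m = len(p_values)
    subsets = [s for r in range(1, m + 1) for s in combinations(range(m), r)]

    def local_reject(subset):
        # Bonferroni test of the intersection hypothesis over `subset`.
        return min(p_values[i] for i in subset) <= alpha / len(subset)

    rejected_sets = {s for s in subsets if local_reject(s)}
    return [all(s in rejected_sets for s in subsets if i in s) for i in range(m)]


if __name__ == "__main__":
    print(closed_test_bonferroni([0.01, 0.02, 0.03]))
    print(closed_test_bonferroni([0.01, 0.04, 0.03]))
```

With this choice of local test the decisions coincide with Holm's procedure, but the enumeration visits 2^m - 1 intersections, which is why shortcut procedures matter for large families.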

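1. Introduction
Generalized Weighted Bonferroni Testing
[Figure: Graph on $H_1$, $H_2$, $H_3$, each with initial level $\alpha/3$; every directed edge between two hypotheses carries weight 1/2.]
$\sum_{i=1}^{m} \alpha_i \le \alpha$, $0 \le g_{ij} \le 1$, $g_{ii} = 0$, and $\sum_{k=1}^{m} g_{ik} \le 1$ for all $i, j = 1, \dots, m$.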

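1. Introduction
Generalized Weighted Bonferroni Testing
[Figure: (a) Updated graph with $H_2$ and $H_3$ each at level $\alpha/2$. (b) Updated graph with $H_3$ alone at level $\alpha$.]
$\sum_{i=1}^{m} \alpha_i \le \alpha$, $0 \le g_{ij} \le 1$, $g_{ii} = 0$, and $\sum_{k=1}^{m} g_{ik} \le 1$ for all $i, j = 1, \dots, m$.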
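The slides state only the conditions on the levels and the edge weights. The sketch below fills in the update step using the rule described for graphical procedures by Bretz et al. (2009), which is an assumption about what the slides intend: when H_j is rejected, each remaining H_l receives alpha_j * g_jl, and the edges are rewired as g_lk <- (g_lk + g_lj * g_jk) / (1 - g_lj * g_jl). The function name is hypothetical.

```python
def graphical_bonferroni(p_values, alphas, G):
    """Sequentially rejective graphical procedure (sketch, assumed update rule).

    alphas: initial local levels (sum <= alpha).
    G: weights g[i][j] with g[i][i] == 0 and each row summing to at most 1.
    Returns the sorted indices of the rejected hypotheses.
    """
    a = list(alphas)
    g = [row[:] for row in G]
    active = set(range(len(p_values)))
    rejected = []
    while True:
        candidates = [j for j in sorted(active) if p_values[j] <= a[j]]
        if not candidates:
            return sorted(rejected)
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        for l in active:                 # pass the level of H_j along its edges
            a[l] += a[j] * g[j][l]
        a[j] = 0.0
        new_g = [row[:] for row in g]
        for l in active:                 # rewire the remaining edges around H_j
            for k in active:
                if l == k:
                    continue
                loop = g[l][j] * g[j][l]
                new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / (1 - loop) if loop < 1 else 0.0
        g = new_g


if __name__ == "__main__":
    alpha = 0.05
    levels = [alpha / 3] * 3
    weights = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    print(graphical_bonferroni([0.01, 0.02, 0.03], levels, weights))
```

For the three-hypothesis graph with levels alpha/3 and weights 1/2, the successive levels are alpha/3, alpha/2, and alpha, matching the progression shown in the figures.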

2. FWER Control within GO Graphs
[Figure: A Gene Ontology graph with numbered nodes and a legend mapping each node number to its GO identifier.]

2. FWER Control within GO Graphs
(Toy) Example GO Graph
[Figure: Toy GO graph on the nodes A, B, F, C, D, E.]

2. FWER Control within GO Graphs
Focus Level Method (Goeman and Mansmann 2008)
Applies a top-down and a bottom-up approach.
[Figure: Panels (a), (b), and (c) showing the toy GO graph and its expansion with combined node sets such as CD, DE, CE, CDE, BF, CF, DF, and CDF.]

2. FWER Control within GO Graphs
[Figure: Testing graphs for the Focus Level method (panel (c) of the previous figure) and the Short Focus Level method on the toy example, annotated with the allocated levels ($\alpha/2$, $\alpha$) and the order of the testing steps.]

2. FWER Control within GO Graphs
Table: Summary of power calculations for Simulation 1.

                  Node                                          Mean Computation
n     Method   A       B       F       C       D       E       Time (sec)
5     FL       0.447   0.428   0.32    0.42    0.35    0.30    0.42634
5     SFL      0.447   0.366   0.20    0.092   0.083   0.22    0.00778
20    FL       0.574   0.567   0.80    0.86    0.92    0.79    0.02097
20    SFL      0.574   0.552   0.78    0.84    0.88    0.79    0.00789
100   FL       0.642   0.635   0.202   0.220   0.207   0.20    0.355848
100   SFL      0.642   0.623   0.20    0.27    0.204   0.20    0.00793

FL: Focus Level; SFL: Short Focus Level

2. FWER Control within GO Graphs
Simulation 2
[Figure: GO graph used in Simulation 2, with numbered nodes.]
(The closure of this graph contains 574 nodes.)

2. FWER Control within GO Graphs
Table: Results of the power analysis under Simulation 2.

Method   GO:01   GO:02   GO:03   GO:04   GO:06   GO:07   GO:10   GO:11   GO:13
FL       0.995   0.968   0.890   0.462   0.52    0.872   0.380   0.399   0.344
SFL      0.995   0.988   0.952   0.543   0.837   0.949   0.489   0.476   0.445

FL: Focus Level; SFL: Short Focus Level
Computation Time: FL 3:42:938; SFL 0:00:05

2. FWER Control within GO Graphs
[Figure: The Gene Ontology graph with numbered nodes and a legend mapping each node number to its GO identifier.]
(Computation took 3 minutes and 23 seconds. Original graph contained 5,687 nodes.)

3. FWER Control in LD QTL Mapping
[Figure: Markers numbered 1 to 5.]
Want to know if: (i) a QTL exists; (ii) the QTL is linked to any markers.

3. FWER Control in LD QTL Mapping
[Figure: Markers numbered 1 to 5, each paired with the hypotheses $H_0^L$ and $H_0^D$.]
$H_0^L$: No QTL exists.
$H_0^D$: QTL is unlinked with marker.

3. FWER Control in LD QTL Mapping
[Figure: Markers numbered 1 to 5, each paired with the hypotheses $H_0^L$ and $H_0^D$.]
$L(p, q, D, \mu_1, \dots, \mu_G, \sigma \mid Y, M) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_{g|M_i}(p, q, D)\, f(Y_i \mid \mu_g, \sigma)$
($D$ is not identifiable under $H_0^L$.)

3. FWER Control in LD QTL Mapping
[Figure: A) Demonstration of the GBA testing scheme for a single marker, with $H_0^L$ at level $\alpha$ and $H_0^D$ at level 0. B) The updated graph after finding $H_0^L$ significant, with $H_0^D$ at level $\alpha$.]

3. FWER Control in LD QTL Mapping
[Figure: Demonstration of the hierarchy of the GBA testing scheme for three markers. $H_0^{L_1}$, $H_0^{L_2}$, $H_0^{L_3}$ each start at level $\alpha/3$; $H_0^{D_1}$, $H_0^{D_2}$, $H_0^{D_3}$ each start at level 0; the edges shown carry weight 1/2.]

3. FWER Control in LD QTL Mapping
[Figure: Demonstration of the GBA testing scheme for three markers assuming that hypotheses $H_0^{L_1}$ and $H_0^{L_3}$ from the initial graph in Figure 2 are rejected. $H_0^{D_1}$, $H_0^{L_2}$, and $H_0^{D_3}$ are at level $\alpha/3$; $H_0^{D_2}$ is at level 0; the edges shown carry weight 1/2.]

3. FWER Control in LD QTL Mapping
[Figure: A) The updated graph from Figure 3 assuming the hypothesis $H_0^{D_1}$ of Figure 3 is rejected at the $\alpha/3$-level; $H_0^{L_2}$ and $H_0^{D_3}$ are at level $\alpha/2$ and $H_0^{D_2}$ is at level 0. B) Graph resulting from the rejection of the hypothesis $H_0^{D_3}$ at the $\alpha/2$-level; $H_0^{L_2}$ is at level $\alpha$ and $H_0^{D_2}$ is at level 0.]
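The sequence of graphs in these figures can be traced with a short sketch. The edge structure below is an assumption reconstructed from the figure labels, not a definition taken from the slides: each $H_0^{L_i}$ starts at alpha/m and points to its own $H_0^{D_i}$ with weight 1, and each $H_0^{D_i}$ points to the $H_0^{L_j}$ of every other marker with weight 1/(m - 1). The update rule is the one described by Bretz et al. (2009), and all function names and p-values are hypothetical.

```python
def graphical_bonferroni(p_values, alphas, G):
    """Sequentially rejective graphical procedure (sketch, assumed update rule)."""
    a, g = list(alphas), [row[:] for row in G]
    active, rejected = set(range(len(p_values))), []
    while True:
        candidates = [j for j in sorted(active) if p_values[j] <= a[j]]
        if not candidates:
            return sorted(rejected)
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        for l in active:                      # redistribute the level of H_j
            a[l] += a[j] * g[j][l]
        a[j] = 0.0
        new_g = [row[:] for row in g]
        for l in active:                      # rewire edges around the removed node
            for k in active:
                if l != k:
                    loop = g[l][j] * g[j][l]
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / (1 - loop) if loop < 1 else 0.0
        g = new_g


def marker_graph(m, alpha):
    """Assumed GBA graph: nodes 0..m-1 are H0^{L_i}, nodes m..2m-1 are H0^{D_i}."""
    alphas = [alpha / m] * m + [0.0] * m
    G = [[0.0] * (2 * m) for _ in range(2 * m)]
    for i in range(m):
        G[i][m + i] = 1.0                     # L_i gates its own D_i
        for j in range(m):
            if j != i:
                G[m + i][j] = 1.0 / (m - 1)   # D_i passes its level to the other markers
    return alphas, G


if __name__ == "__main__":
    alphas, G = marker_graph(3, 0.05)
    # Hypothetical p-values ordered as L1, L2, L3, D1, D2, D3.
    p = [0.001, 0.2, 0.002, 0.01, 0.5, 0.02]
    print(graphical_bonferroni(p, alphas, G))
```

With these hypothetical p-values the sketch rejects $H_0^{L_1}$, $H_0^{L_3}$, $H_0^{D_1}$ (at alpha/3), and $H_0^{D_3}$ (at alpha/2), whereas a plain Bonferroni correction over all six tests (level 0.05/6) would reject only the two $H_0^L$ hypotheses.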

3. FWER Control in LD QTL Mapping
Conditions under which the GBA simplifies to an IUT (+Holm).
Let $H_0^U$ denote the union hypothesis $H_0^L \cup H_0^D$.
Let $P^U$ denote the p-value for the IUT of $H_0^U$.
Let $k$ denote the marker with $\arg\min_i P_i^U < \alpha/m$. Then,
$m p_k^L = P_k^L \le P_k^D = m \max\{p_k^L, p_k^D\} = m p_k^M = P_k^M \le \alpha,$
where $m$ is the number of markers, $p_k^L$ and $p_k^D$ are raw p-values for marker $k$, $P_k$ are GBA adjusted p-values for marker $k$, and $p_k^M$ denotes the raw IUT p-value for marker $k$.
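As a small numerical illustration of the quantity m * max{p^L, p^D} on this slide, the sketch below computes the raw IUT p-value for one marker and its multiple of m. The function name, the cap at 1, and the example values are assumptions for illustration.

```python
def iut_adjusted_p(p_l, p_d, m):
    """Return (raw IUT p-value, m times that value capped at 1) for one marker."""
    raw_iut = max(p_l, p_d)            # intersection-union test: both parts must be significant
    return raw_iut, min(1.0, m * raw_iut)


if __name__ == "__main__":
    raw, adjusted = iut_adjusted_p(p_l=0.0001, p_d=0.0004, m=100)
    print(raw, adjusted, adjusted <= 0.05)
```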

3. FWER Control in LD QTL Mapping
[Figure: Power comparison between the graphical Bonferroni adjustment (GBA) and standard Bonferroni adjustment under different sample size (n = 500, 300, 100), number of SNPs, and heritability (A: $H^2 = 0.1$, B: $H^2 = 0.4$). Axes: Number of SNPs (horizontal), Power (vertical).]

3. FWER Control in LD QTL Mapping
[Figure: The control of leaf shape for different genotypes (AA, Aa, aa) of the QTL identified by marker on PC 4. Panels are labeled PC4 SNP, with axes x and y.]

3. FWER Control in LD QTL Mapping
[Figure: The negative log of the GBA-adjusted p-values for $H_0^D$ for each SNP in the mouse HDL cholesterol QTL mapping project. Axes: chromosome position in mb (horizontal), log p adjusted (vertical).]

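4. QTL Mapping: Hypotheses and Approaches
$H_A$: a linked QTL.
$L(p, q, D, \mu_1, \dots, \mu_G, \sigma \mid Y, M) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_{g|M_i}(p, q, D)\, f(Y_i \mid \mu_g, \sigma).$ (1)
$H_0^2$: an unlinked QTL.
$L(q, \mu_1, \dots, \mu_G, \sigma \mid Y) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_g(q)\, f(Y_i \mid \mu_g, \sigma).$ (2)
$H_0$: no QTL.
$L(\mu, \sigma \mid Y) = \prod_{i=1}^{n} f(Y_i \mid \mu, \sigma).$ (3)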

4. QTL Mapping: Hypotheses and Approaches
Test for association between QTL and phenotype $Y$.
$H_0^L: \mu_1 = \mu_2 = \mu_3 = \mu$ vs $H_1^L$: one of the equalities above does not hold.

4. QTL Mapping: Hypotheses and Approaches
Test for linkage between SNP and QTL.
$H_0^D: D = 0$ vs $H_1^D: D \ne 0$. (4)
$\chi^2_D = \frac{n \hat{D}^2}{\hat{p}(1 - \hat{p})\,\hat{q}(1 - \hat{q})} \sim \chi^2_1$ (5)
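The linkage statistic can be evaluated directly from the estimates. The sketch below computes the statistic and its p-value under a chi-square reference distribution with one degree of freedom, using the identity P(chi-square_1 > x) = erfc(sqrt(x / 2)). The function name and the example values are hypothetical, and the later slides discuss why the chi-square assumption can be problematic in this setting.

```python
import math


def ld_chi_square(n, d_hat, p_hat, q_hat):
    """Chi-square statistic for D = 0 and its p-value under a chi-square(1) reference."""
    stat = n * d_hat ** 2 / (p_hat * (1 - p_hat) * q_hat * (1 - q_hat))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square(1)
    return stat, p_value


if __name__ == "__main__":
    stat, p_value = ld_chi_square(n=100, d_hat=0.05, p_hat=0.5, q_hat=0.5)
    print(round(stat, 3), round(p_value, 4))
```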

4. QTL Mapping: Hypotheses and Approaches
[Figure: Likelihood Ratio Test of $H_0$ against $H_A$ for synthetic data simulated under the null hypothesis of no QTL, $H_0$. Axes: LRTS (horizontal), $F_n(x)$ (vertical); curves for n = 100, n = 300, n = 500 and for two reference values of $\nu$.]

4. QTL Mapping: Hypotheses and Approaches
[Figure: The empirical cumulative distribution functions corresponding to the test of $D = 0$ for three scenarios: no QTL, known QTL, and unlinked QTL, with curves for two reference values of $\nu$.]

4. QTL Mapping: Hypotheses and Approaches
The (Bivariate) Null Kernel Method
1. Simulate $s$ data sets, each of size $n$, based on the model assumptions of the joint hypothesis $H_0$.
2. Calculate $T_i$ and $U_i$ for $i = 1, \dots, s$.
3. Estimate the joint density $\hat{f}$ of $T$ and $U$ using a kernel density estimation technique on the $T_i$ and $U_i$.
4. Compute the cdf $\hat{F}$ of $\hat{f}$ by $\hat{F}(c) = \int_{A(c)} \hat{f}$, where $A(c) = \{(t, u) : \hat{f}(t, u) \le c\}$.
5. The joint p-value for the calculated statistics $\hat{t}$ and $\hat{u}$ can then be obtained by the formula $p = \hat{F}(\hat{f}(\hat{t}, \hat{u}))$.
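The five steps can be sketched in plain Python. Several choices here are assumptions and not the author's implementation: the kernel is a product Gaussian with a Scott-type bandwidth, the integral over A(c) is approximated by the fraction of simulated null pairs whose estimated density is at most c, and independent standard normal draws stand in for the simulated null statistics of steps 1 and 2. All function names are hypothetical.

```python
import math
import random


def make_kde(points):
    """Bivariate product-Gaussian kernel density estimate (assumed bandwidth rule)."""
    s = len(points)
    means = [sum(pt[d] for pt in points) / s for d in (0, 1)]
    sds = [math.sqrt(sum((pt[d] - means[d]) ** 2 for pt in points) / (s - 1)) for d in (0, 1)]
    h = [sd * s ** (-1.0 / 6.0) for sd in sds]          # Scott-type rule for two dimensions
    norm = 1.0 / (s * 2.0 * math.pi * h[0] * h[1])

    def density(t, u):
        return norm * sum(
            math.exp(-0.5 * (((t - a) / h[0]) ** 2 + ((u - b) / h[1]) ** 2))
            for a, b in points
        )

    return density


def null_kernel_p_value(null_stats, t_obs, u_obs):
    """Approximate p = F_hat(f_hat(t_obs, u_obs)) from simulated null (T, U) pairs."""
    f_hat = make_kde(null_stats)                        # step 3
    c = f_hat(t_obs, u_obs)
    null_density = [f_hat(t, u) for t, u in null_stats]
    # Steps 4-5: Monte Carlo approximation of the mass of {f_hat <= c}.
    return sum(d <= c for d in null_density) / len(null_density)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder for steps 1-2: stand-in null statistics, not QTL-model simulations.
    null_stats = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    print(null_kernel_p_value(null_stats, 0.0, 0.0))
    print(null_kernel_p_value(null_stats, 4.0, 4.0))
```

An observed pair near the center of the null cloud gets a large p-value, while a pair far outside the cloud falls in a low-density region and gets a small one. The cost of this sketch grows with the square of s because every density evaluation sums over all simulated pairs.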

4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling's $T^2$ (bivariate test of location)
$T^2 = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0) \sim \frac{2(n - 1)}{n - 2} F_{2,\,n-2}$ (6)
[Figure: Visualization of the Null Kernel method as applied to a sample of 1,000 T and U statistics simulated under the bivariate normal null distribution with zero mean, unit variances, and covariances of 0.3. Axes: T statistic (horizontal), U statistic (vertical), with density contours.]
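For comparison, the bivariate Hotelling statistic can be computed without external libraries. The sketch below relies on the closed form of the F survival function when the numerator has two degrees of freedom, P(F_{2,v} > x) = (1 + 2x/v)^(-v/2), which gives p = (1 + T^2/(n - 1))^(-(n - 2)/2). The function name and the example data are hypothetical.

```python
def hotelling_t2_bivariate(data, mu0):
    """Hotelling's T^2 for bivariate data and its p-value from the F(2, n - 2) reference."""
    n = len(data)
    xbar = [sum(row[d] for row in data) / n for d in (0, 1)]
    # Sample covariance matrix S with divisor n - 1.
    s11 = sum((row[0] - xbar[0]) ** 2 for row in data) / (n - 1)
    s22 = sum((row[1] - xbar[1]) ** 2 for row in data) / (n - 1)
    s12 = sum((row[0] - xbar[0]) * (row[1] - xbar[1]) for row in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d0, d1 = xbar[0] - mu0[0], xbar[1] - mu0[1]
    # Quadratic form d' S^{-1} d using the explicit 2x2 inverse.
    quad = (s22 * d0 ** 2 - 2.0 * s12 * d0 * d1 + s11 * d1 ** 2) / det
    t2 = n * quad
    p_value = (1.0 + t2 / (n - 1)) ** (-(n - 2) / 2.0)
    return t2, p_value


if __name__ == "__main__":
    sample = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(hotelling_t2_bivariate(sample, (0.0, 0.0)))
    print(hotelling_t2_bivariate(sample, (5.0, 5.0)))
```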

4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling's $T^2$ (bivariate test of location)
[Figure: Comparison of P-values ($-\log_{10}(p)$) obtained from either the Null Kernel method or Hotelling's $T^2$ test. Axes: Null Kernel $-\log_{10}(p)$ (horizontal), $T^2$ $-\log_{10}(p)$ (vertical); the plot is divided into quadrants I to IV.]

4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling's $T^2$ (bivariate test of location)
[Figure: The $-\log_{10}$ of the P-values from the Null Kernel and Hotelling's $T^2$ methods for data simulated consistent with the null hypothesis. Axes: Null Kernel $-\log_{10}(p)$ (horizontal), $T^2$ $-\log_{10}(p)$ (vertical); the plot is divided into quadrants I to IV.]

4. QTL Mapping: Hypotheses and Approaches
QTL mapping Simulation Study
[Figure: Visualization of the Null Kernel estimated (null) density for the bivariate data corresponding to the test of $H_0^D$, $\chi^2_D$, and $H_0^L$, $\chi^2_L$. Axes: $\chi^2_D$ (horizontal), $\chi^2_L$ (vertical), with density contours.]

4. QTL Mapping: Hypotheses and Approaches
QTL mapping Simulation Study
[Figure: The resulting adjusted P-values from each of the permutation, simulation, and theoretical approaches against the results of the Null Kernel method. Panels: Chromosome 1 to Chromosome 4; vertical axis: Null Kernel $-\log_{10}(p)$.]

4. QTL Mapping: Hypotheses and Approaches
Mice HDL QTL mapping study
[Figure: The negative log of the Holm adjusted P-values for the Null Kernel approach. Axes: chromosome position in mb (horizontal), log p adjusted (vertical).]

4. QTL Mapping: Hypotheses and Approaches
[Figure: The joint plot of the observed test statistics for the mouse HDL QTL mapping data. Axes: $\chi^2_D$ (horizontal), $\chi^2_L$ (vertical).]

5. Discussion
"Professional statisticians... bear an obligation to offer alternatives (or entirely new approaches) that meet real needs and are practical as well." (J. W. Tukey, as quoted in Benjamini and Braun 200)

5. Discussion
1. Introduction
2. FWER Control within Gene Ontology Graphs
   Extended GBA methods to Restricted Hypotheses (Theorem 1).
   Introduced the Short Focus Level method (code in: mvgst).
   Quantified the computational advantage.
3. A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
   Introduced a GBA approach for LD based QTL mapping.
   Protects model identifiability and strong FWER control.
   Quantified the power increase numerically and practically.
4. QTL Mapping: Hypotheses and Approaches
   Detailed problems of $\chi^2$ assumptions in LD based QTL mapping.
   Introduced the Null Kernel method.
   Showed power and computational advantages of the NK method.
5. Discussion

Acknowledgements
This work was supported by:
Utah Agricultural Experiment Station (UAES) project number UTA0062, associated with the W22 multi-state project "Reproductive Performance in Domestic Ruminants"
Utah State University VPR Research Catalyst Grant.