Supplementary Materials: Meta-analysis of Quantitative Pleiotropic Traits at Gene Level with Multivariate Functional Linear Models

Appendix A. Type I Error and Power Simulations and Results

A.1. Type I Error Simulations and Results

The simulation scenarios are given in Table S.1. The type I error rates are given in Tables S.3 and S.4.

A.2. Power Simulation Parameter Settings

The simulation parameters for the power calculations are given in Table S.2.

Appendix B. Information and Extra Results of the Eight European Cohorts

For each of the eight European cohorts, we performed analysis for four lipid traits and 22 genes. Information on the genes is given in Table S.5. The sample sizes for each trait are presented in Table S.6. The results of one-trait meta-analysis by likelihood ratio tests (LRT) from Tables and of Fan et al. (2015) are presented in Tables S.7 and S.11. The results of study-based pleiotropy analysis from Table of Wang et al. (2015) are presented in Table S.12. Tables S.8, S.9, and S.10 present the results of two-trait meta-analysis of lipid traits in European studies using F-approximations based on the Pillai-Bartlett trace.

Table S.1: Simulation Study Settings. Sample sizes are total sample sizes in each study. Covariates represent covariates in each study. EUR refers to the scenario where all three studies had EUR samples. EUR + AA refers to the scenario where two of the studies had EUR samples and the remaining study had AA samples. z1 is a binary covariate taking values 0 and 1 each with probability 0.5, and z2 and z3 are continuous covariates distributed as standard normal.

Columns: Scenario; Population; Sample Sizes (Study 1, Study 2, Study 3); Covariates (Study 1, Study 2, Study 3)
EUR,600,00,00 (z, z ) (z, z ) (z, z )
EUR,600,00,00 z (z, z ) (z, z, z )
EUR+AA,600,00,00 z (z, z ) (z, z, z )
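The covariate distributions described in the caption of the simulation settings table can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `simulate_covariates`, the labels z1, z2, z3 for the binary and the two standard normal covariates, and the seed are assumptions made here for the example.

```python
import random


def simulate_covariates(n, seed=0):
    """Draw n covariate triples (z1, z2, z3).

    z1 is binary, taking 0 or 1 each with probability 0.5;
    z2 and z3 are independent standard normal draws.
    The names and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = 1 if rng.random() < 0.5 else 0  # Bernoulli(0.5)
        z2 = rng.gauss(0.0, 1.0)             # N(0, 1)
        z3 = rng.gauss(0.0, 1.0)             # N(0, 1)
        rows.append((z1, z2, z3))
    return rows


covariates = simulate_covariates(1000)
```

A study that adjusts for only a subset of the covariates would simply keep the corresponding columns of each triple.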

Table S.2: Simulation Parameter Settings. The constants c_l = (c_l1, c_l2, c_l3) in β_ljk = c_lj log10(MAF_k) of the power simulations, l, j = 1, 2, 3, are given in this table for two cases: (1) homogeneous genetic effect and (2) heterogeneous genetic effect.

Columns: Genetic Effect; Study (c_l); Percentage of Causal Variants
(c ) Homogeneous (0.475, , ) (0.75,.5, ).50 ) (0.5,.5, ) (c ) (c ) (0.475, , ) (0.475,.5, ) (0.5,.5, ) (0.475, (c Heterogeneous ).5, ) (0.475,.5, ) (0.5,.5, ) +(0.5, 0.5, 0.5) +(0.5, 0.5, 0.5) +(0.5, 0.5, 0.5) (c ) (0.475, , ) (0.475,.5, ) (0.5,.5, ) (0.5, 0.5, 0.5) (0.5, 0.5, 0.5) (0.5, 0.5, 0.5)
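The effect-size rule in the caption, in which each variant's effect is a constant times the base-10 logarithm of its minor allele frequency (MAF), can be illustrated with a short sketch. Two things here are assumptions rather than statements from the source: the function name `effect_sizes`, and the use of the magnitude of log10(MAF), since the caption as extracted shows no absolute-value bars. The constant 0.5 and the MAF values in the example are made up for illustration.

```python
import math


def effect_sizes(c_lj, mafs):
    """Return c_lj * |log10(MAF_k)| for each variant k.

    Taking the absolute value is an assumption; it makes rarer
    variants receive larger effects of the same sign as c_lj.
    """
    return [c_lj * abs(math.log10(maf)) for maf in mafs]


# Hypothetical constant and MAFs, chosen only to show the scaling.
betas = effect_sizes(0.5, [0.01, 0.1])
```

Under this rule a variant with MAF 0.01 gets twice the effect of a variant with MAF 0.1, which is the usual motivation for weighting by log-MAF in rare-variant simulations.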

Table S.3: Empirical Type I Error Rates of Approximate F-distributed Test Statistics at Different α Levels Based on. 0 6 Simulated Datasets, When All Variants are Rare. Type of Tests is explained in the main text: Het-F is the approximate F-distributed test statistic if genetic effects are heterogeneous, and Hom-F is the approximate F-distributed test statistic if genetic effects are homogeneous. Scenario is given in Table S.1. The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4), and the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7). Abbreviation: GVF = Genetic Variant Function.

Columns: Traits ((y1, y2) and (y1, y2, y3)); Type of Tests (Het-F, Hom-F); Scenario; Level α; Approximate F-distributed Test Statistics under Basis of both GVF & βl(t) (B-spline, Fourier), Basis of beta-smooth only (B-spline, Fourier), and Additive Model (9)
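An empirical type I error rate of the kind reported in this table is the fraction of null-simulated datasets whose p-value falls below the nominal level α. The sketch below shows that calculation only. The function name `empirical_type1_error`, the strict inequality p < α, and the example p-values are assumptions for illustration and are not taken from the source.

```python
def empirical_type1_error(p_values, alpha):
    """Fraction of p-values from null simulations that fall below alpha.

    Uses the strict rule p < alpha; whether the original analysis
    used < or <= is not stated and is assumed here.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


# Made-up p-values, only to show the call.
rate = empirical_type1_error([0.01, 0.2, 0.04, 0.5], alpha=0.05)
```

A well-calibrated test gives a rate close to α; rates well above α indicate inflation, and rates well below α indicate a conservative test.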

Table S.4: Empirical Type I Error Rates of Approximate F-distributed Test Statistics at Different α Levels Based on. 0 6 Simulated Datasets, When Some Variants are Rare and Some are Common. Type of Tests is explained in the main text: Het-F is the approximate F-distributed test statistic if genetic effects are heterogeneous, and Hom-F is the approximate F-distributed test statistic if genetic effects are homogeneous. Scenario is given in Table S.1. The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4), and the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7). Abbreviation: GVF = Genetic Variant Function.

Columns: Traits ((y1, y2) and (y1, y2, y3)); Type of Tests (Het-F, Hom-F); Scenario; Level α; Approximate F-distributed Test Statistics under Basis of both GVF & βl(t) (B-spline, Fourier), Basis of beta-smooth only (B-spline, Fourier), and Additive Model (9)

Table S.5: Summary of Genes and the Number of Genetic Variants in Each Gene Region by Mar. 2006 (NCBI36/hg18). The number of variants is the number of genetic variants in a region of Start (-5Kb) End (+5Kb) Positions. The gene region of PCSK9 is (557777, 5504), and (55757, ) is the region in the database. # The length is the length of the region in bp.

Columns: Gene; Chromosome; Gene Region Positions (bp); Start (-5Kb) - End (+5Kb) Positions (Length #); Number of Variants
PCSK (457) 74
APOB (5644)
IGFBP (900)
CDKAL (707946) 560
JAZF (6044) 84
LPL (888)
CDKNB (640) 64
CDC (646) 65
IDE (4) 7
KIF (77) 6
HHEX (577) 0
TCF7L (747) 58
KCNQ (449) 660
MTNRB (59) 06
HMGA (58) 4
TSPAN (490) 54
HNFA (765) 7
OASL (8950) 08
FTO (40506) 9
LDLR (54467) 4
APOE (6) 5
GIPR (45) 7

Table S.6: Sample Sizes of the Four Lipid Traits for Each of the Seven Studies.

Columns: Study; HDL; LDL; TG; CHOL
Rows: Dd; DIAGEN; Dps; DRs EXTRA; FUSION Stage; METSIM; Norway; Total

Table S.7: One-trait Meta-analysis of Lipid Traits in Eight European Cohorts by Homogeneous Likelihood Ratio Tests (Hom-LRT), Hom-MetaSKAT-O, and Hom-MetaSKAT. The results are from Table of Fan et al. (2015), and associations that attain a threshold significance of P <. 0 6 are highlighted in red (Liu et al. 2014). The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4); the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7); the results of Additive Model (0) were based on the additive effect model (0); and the p-values of Hom-MetaSKAT and Hom-MetaSKAT-O were based on the R package MetaSKAT. Abbreviation: GVF = Genetic Variant Function.

Columns: Traits; Gene; P-values of the F-approximation Based on Pillai-Bartlett Trace under Basis of Both GVF and βl(t) (B-spline Basis, Fourier Basis), Basis of beta-smooth Only (B-spline Basis, Fourier Basis), and Additive Model (0); P-values of Hom-MetaSKAT and Hom-MetaSKAT-O
HDL: LPL
LDL: APOB, APOE, LDLR, PCSK
TG: APOE, LPL
CHOL: APOB, APOE, HNFA, LDLR

Table S.8: Two-trait Meta-analysis of Lipid Traits in European Studies Using Homogeneous F-approximation (Hom-F) Based on Pillai-Bartlett Trace. The associations that attain a threshold significance of P <. 0 6 are highlighted in red [Liu et al. 2014]. The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4), and the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7). Abbreviation: GVF = Genetic Variant Function.

Columns: Traits; Gene; P-values of the Homogeneous F-approximation (Hom-F) under Basis of Both GVF and βl(t) (B-spline Basis, Fourier Basis) and Basis of beta-smooth Only (B-spline Basis, Fourier Basis)
(HDL, TG): APOE, LPL
(HDL, LDL): APOB, APOE, LDLR, PCSK
(HDL, CHOL): APOB, APOE, LDLR, LPL
(TG, CHOL): APOB, APOE, LDLR, LPL
(LDL, TG): APOB, APOE, LDLR, LPL, PCSK
(LDL, CHOL): APOB, APOE, LDLR, PCSK
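For readers unfamiliar with the statistic named in these captions, the sketch below computes the Pillai-Bartlett trace for two traits and its standard textbook F-approximation. It is a generic MANOVA illustration under stated assumptions, not the authors' implementation: the function names, the symbols H (hypothesis sums-of-squares-and-products matrix), E (error matrix), p (number of traits), q (hypothesis degrees of freedom), and v (error degrees of freedom) are introduced here, and the example matrices are invented.

```python
def pillai_trace_2x2(H, E):
    """Pillai-Bartlett trace V = trace(H (H + E)^{-1}) for 2x2 matrices."""
    t11, t12 = H[0][0] + E[0][0], H[0][1] + E[0][1]
    t21, t22 = H[1][0] + E[1][0], H[1][1] + E[1][1]
    det = t11 * t22 - t12 * t21
    if det == 0:
        raise ValueError("H + E is singular")
    # Inverse of the 2x2 total matrix T = H + E.
    inv = [[t22 / det, -t12 / det], [-t21 / det, t11 / det]]
    # trace(H @ inv) = sum over i, j of H[i][j] * inv[j][i]
    return sum(H[i][j] * inv[j][i] for i in range(2) for j in range(2))


def pillai_f_approx(V, p, q, v):
    """Textbook F-approximation for Pillai's trace.

    Returns (F, df1, df2) with s = min(p, q),
    m = (|p - q| - 1) / 2 and n = (v - p - 1) / 2.
    """
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (v - p - 1) / 2
    f_stat = ((2 * n + s + 1) / (2 * m + s + 1)) * V / (s - V)
    return f_stat, s * (2 * m + s + 1), s * (2 * n + s + 1)


# Invented matrices and degrees of freedom, only to show the calls.
V = pillai_trace_2x2([[2.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
F, df1, df2 = pillai_f_approx(V, p=2, q=2, v=20)
```

The p-value is the upper tail of an F distribution with (df1, df2) degrees of freedom evaluated at F; it is omitted here to keep the sketch free of third-party dependencies.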

Table S.9: Two-trait Meta-analysis of Lipid Traits in European Studies Using Heterogeneous F-approximation (Het-F) Based on Pillai-Bartlett Trace. The associations that attain a threshold significance of P <. 0 6 are highlighted in red [Liu et al. 2014]. The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4); the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7); and the results of Additive Model (9) were based on the additive effect model (9). Abbreviation: GVF = Genetic Variant Function.

Columns: Traits; Gene; P-values of the Heterogeneous F-approximation (Het-F) under Basis of Both GVF and βl(t) (B-spline Basis, Fourier Basis), Basis of beta-smooth Only (B-spline Basis, Fourier Basis), and Additive Model (9)
(HDL, TG): APOB, APOE, CDKAL, JAZF, LPL, TSPAN
(HDL, LDL): APOB, APOE, CDKAL, CDKNB, FTO, HNFA, JAZF, LDLR, LPL, OASL, PCSK, TSPAN
(HDL, CHOL): APOB, APOE, CDKAL, CDKNB, FTO, HNFA, IDE, JAZF, LDLR, LPL, MTNRB, OASL, PCSK, TSPAN

Table S.10: Two-trait Meta-analysis of Lipid Traits in European Studies Using Heterogeneous F-approximation (Het-F) Based on Pillai-Bartlett Trace. The associations that attain a threshold significance of P <. 0 6 are highlighted in red [Liu et al. 2014]. The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4); the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7); and the results of Additive Model (9) were based on the additive effect model (9). Abbreviation: GVF = Genetic Variant Function.

Columns: Traits; Gene; P-values of the Heterogeneous F-approximation (Het-F) under Basis of Both GVF and βl(t) (B-spline Basis, Fourier Basis), Basis of beta-smooth Only (B-spline Basis, Fourier Basis), and Additive Model (9)
(TG, CHOL): APOB, APOE, HNFA, LPL, PCSK, TSPAN
(LDL, TG): APOB, APOE, HNFA, LPL, OASL, PCSK, TSPAN
(LDL, CHOL): APOB, APOE, CDKAL, FTO, HNFA, KCNQ, LDLR, OASL, PCSK, TSPAN

Table S.11: One-trait Meta-analysis of Lipid Traits in Eight European Cohorts by Heterogeneous Likelihood Ratio Tests (Het-LRT), Het-MetaSKAT-O, and Het-MetaSKAT. The results are from Table of Fan et al. (2015), and associations that attain a threshold significance of P <. 0 6 are highlighted in red (Liu et al. 2014). The results of Basis of Both GVF and βl(t) were based on smoothing both GVF and the genetic effect functions βl(t) of model (4); the results of Basis of beta-smooth Only were based on the approach of smoothing βl(t) only, model (7); the results of Additive Model (9) were based on the additive effect model (9); and the p-values of Het-MetaSKAT and Het-MetaSKAT-O were based on the R package MetaSKAT.

Columns: Traits; Gene; P-values of the Het-LRT under Basis of Both GVF and βl(t) (B-spline Basis, Fourier Basis), Basis of beta-smooth Only (B-spline Basis, Fourier Basis), and Additive Model (9); P-values of Het-MetaSKAT and Het-MetaSKAT-O
LDL: APOB, APOE, CDC, CDKAL, CDKNB, FTO, HNFA, LDLR, OASL, PCSK, TSPAN
TG: LPL
CHOL: APOB, APOE, CDC, CDKAL, CDKNB, FTO, HNFA, IDE, JAZF, KIF, LDLR, MTNRB, OASL, PCSK, TSPAN

Table S.12: Study-based Pleiotropy Analysis of Lipid Traits in 5 European Studies in the Regions of APOE and LDLR Genes Using the F-approximation Based on Pillai-Bartlett Trace. The results are from Table of Wang et al. (2015), and associations that attain a threshold significance of P <. 0 6 are highlighted in red [Liu et al. 2014]. Abbreviations: GVF = Genetic Variant Function, and FPCA = functional principal component analysis.

Columns: Study; Gene; Traits; P-values of the F-approximation Based on Pillai-Bartlett Trace under Basis of both GVF and βl(t) (B-sp Basis, Fourier Basis), Basis of beta-smooth Only (B-sp Basis, Fourier Basis), and Additive Model; P-values of FPCA; SKAT-O
Dd-007, APOE: LDL; CHOL; LDL, CHOL
FUSION Stage, APOE: LDL; CHOL; LDL, CHOL
Norway, APOE: LDL; TG; CHOL; LDL, TG; LDL, CHOL; TG, CHOL; LDL, TG, CHOL
DIAGEN, APOE: LDL; TG; CHOL; LDL, TG; LDL, CHOL; TG, CHOL; LDL, TG, CHOL
METSIM, APOE: LDL; TG; CHOL; LDL, TG; LDL, CHOL; LDL, TG, CHOL
METSIM, LDLR: LDL; CHOL; LDL, CHOL
