Genetic Studies of Multivariate Traits

Size: px
Start display at page:

Download "Genetic Studies of Multivariate Traits"

Transcription

1 Genetic Studies of Multivariate Traits Heping Zhang Collaborative Center for Statistics in Science Yale University School of Public Health Presented at DIMACS, University of Rutgers May 17, 2013 Heping Zhang (C 2 S 2, Yale University) DIMACS 1 / 50

2 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 2 / 50

3 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 3 / 50

4 Comorbidity Multiple disorders or illnesses occur in the same person, simultaneously or sequentially Source: Heping Zhang (C 2 S 2, Yale University) DIMACS 4 / 50

5 Comorbidity Source: aasets.lifehack.org Heping Zhang (C 2 S 2, Yale University) DIMACS 5 / 50

6 Comorbidity Is there a relationship between childhood ADHD and later drug abuse? See page 2. from the director: Comorbidity is a topic that our stakeholders patients, family members, health care professionals, and others frequently ask about. It is also a topic about which we have insufficient information, so it remains a research priority for NIDA. This Research Report provides information on the state of the science in this area. Although a variety of diseases commonly co-occur with drug abuse and addiction (e.g., HIV, hepatitis C, cancer, cardiovascular disease), this report focuses only on the comorbidity of drug use disorders and other mental illnesses.* To help explain this comorbidity, we need to first recognize that drug addiction is a mental illness. It is a complex brain disease characterized by compulsive, at times uncontrollable drug craving, seeking, and use despite devastating consequences behaviors that stem from drug-induced changes in brain structure and function. These changes occur in some of the same brain areas that are disrupted in other mental disorders, such as depression, anxiety, or schizophrenia. It is therefore not surprising that population surveys show a high rate of co-occurrence, or comorbidity, between drug addiction and other mental illnesses. While we cannot always prove a connection or causality, we do know that certain mental disorders are established risk factors for subsequent drug abuse and vice versa. It is often difficult to disentangle the overlapping symptoms of drug addiction and other mental illnesses, making diagnosis and treatment complex. Correct diagnosis is critical to ensuring appropriate and effective treatment. Ignorance of or failure to treat a comorbid disorder can jeopardize a patient s chance of recovery. We hope that our enhanced understanding of the common genetic, environmental, and neural bases of these disorders and the dissemination of this information will lead to improved treatments for comorbidity and will diminish the social stigma that makes patients reluctant to seek the treatment they need. Nora D. Volkow, M.D. Director National Institute on Drug Abuse Comorbidity: Addiction and Other Mental Illnesses What Is Comorbidity? W hen two disorders or illnesses occur in the same person, simultaneously or sequentially, they are described as comorbid. Comorbidity also implies interactions between the illnesses that affect the course and prognosis of both. continued inside *Since the focus of this report is on comorbid drug use disorders and other mental illnesses, the terms mental illness and mental disorders will refer here to disorders other than substance use disorders, such as depression, schizophrenia, anxiety, and mania. The terms dual diagnosis, mentally ill chemical abuser, and co-occurrence are also used to refer to drug use disorders that are comorbid with other mental illnesses. Dr. Volkow, Director, NIDA: Comorbidity is a topic that our stakeholders patients, family members, health care professionals, and others frequently ask about. It is also a topic about which we have insufficient information, so it remains a research priority for NIDA. Source: Published 12/2008. Revised 9/2010. Heping Zhang (C 2 S 2, Yale University) DIMACS 6 / 50

7 Common Etiology for Comorbidity { mental disorder Common etiology drug use disorder "Overlapping genetic vulnerabilities: Mounting evidence suggests that common genetic factors may predispose individuals to both mental disorders and addiction or to having a greater risk of the second disorder once the first appears." "Overlapping environmental triggers: Stress, trauma (e.g., physical or sexual abuse), and early exposure to drugs are common factors that can lead to addiction and to mental illness, particularly in those with underlying genetic vulnerabilities." Source: Heping Zhang (C 2 S 2, Yale University) DIMACS 7 / 50

8 Disorders, Genes and Covariates Covariates: interact or confound genetic effects Failure to account for covariates: bias or reduced power Null hypothesis: no association between marker alleles and any linked locus that influences traits conditional on covariates Heping Zhang (C 2 S 2, Yale University) DIMACS 8 / 50

9 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 9 / 50

10 An Example Trait Questionnaire Fagerstrom Test for Nicotine Dependence Heping Zhang (C 2 S 2, Yale University) DIMACS 10 / 50

11 Genotypes and Covariates Source: en.wikipedia.org; 2010.census.gov Heping Zhang (C 2 S 2, Yale University) DIMACS 11 / 50

12 Study Design Population-based studies Family-based studies Heping Zhang (C 2 S 2, Yale University) DIMACS 12 / 50

13 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 13 / 50

14 Multivariate Distributions t n t! a n t,a! a ˆp nt,a t,a exp { 1 2 (x µ) Σ 1 (x µ)} (2π) n Σ Heping Zhang (C 2 S 2, Yale University) DIMACS 14 / 50

15 Challenge How do we model a hybrid of continuous, ordinal, and/or binary responses??? Heping Zhang (C 2 S 2, Yale University) DIMACS 15 / 50

16 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 16 / 50

17 Notation and Hypothesis n study subjects, from a population-based study or family-based study For each subject: A vector of traits T = (T (1),..., T (p) ) Marker genotype M Parental marker genotypes M pa (only available in a family-based study) A vector of covariates Z = (Z (1),..., Z (l) ) Null hypothesis: no association between marker alleles and any linked locus that influences traits T, conditional on Z Heping Zhang (C 2 S 2, Yale University) DIMACS 17 / 50

18 Kendall s Tau A nonparametric statistic measuring the rank correlation between two variables Pairs of observations: {(X i, Y i ) : i = 1,..., n} (X i, Y i ) and (X j, Y j ): Concordant, if X i X j and Y i Y j have the same sign Disconcordant, if X i X j and Y i Y j have the different sign Kendall s tau: τ = 2(A B)/{n(n 1)} A and B: numbers of concordant and disconcordant pairs Or τ = ( ) n 1 sign{(x i X j )(Y i Y j )} 2 i<j Heping Zhang (C 2 S 2, Yale University) DIMACS 18 / 50

19 Generalized Kendall s Tau F ij = {f 1 (T (1) i T (1) j ),..., f p (T (p) i T (p) j )} f k ( ): identity function for a quantitative or binary trait f k ( ): sign function for an ordinal trait D ij = C i C j. C: number of any chosen allele in marker genotype M Genaralized Kendall s tau (Zhang, Liu and Wang, 2010): U = ( ) n 1 D ij F ij 2 i<j Special cases: FBAT-GEE (Lange et al. 2003) Test for a single ordinal trait (Wang, Ye and Zhang, 2006) Heping Zhang (C 2 S 2, Yale University) DIMACS 19 / 50

20 Weighted Test A weight function w(z i, Z j ) imposes a relatively large weight when Z i is close to Z j, and a relatively small weight when Z i and Z j are far away Weighted U-statistic: S = Weighted test statistic: ( ) n 1 D ij F ij w(z i, Z j ) 2 i<j χ 2 τ = {S E 0 (S)} Var 1 0 (S){S E 0(S)} Heping Zhang (C 2 S 2, Yale University) DIMACS 20 / 50

21 Weight Function I: Distance Write Z = (Z co, Z ca ), with Z co for the continuous covariates and Z ca for the categorical covariates For example, w(z i, Z j ; h, q) = W h ( Z co i Z co j )W q {I(Z ca i Z ca j )} W h (u) = exp( u 2 /2h 2 ), h > 0, W q (v) = (1 q)i(v = 0) + qi(v = 1), 0 q 0.5 Weighted U-statistic (called fixed-(h, q) U-statistic): S(h, q) = ( ) n 1 D ij F ij w(z i, Z j ; h, q) 2 i<j Heping Zhang (C 2 S 2, Yale University) DIMACS 21 / 50

22 Weight Function II: Propensity Score Propensity score: probability of a unit being assigned to a particular treatment given a set of covariates Causal effect analysis: match subjects according to their propensity scores (Rosenbaum and Rubin, 1984) Genomic propensity score: p(z) = {p 1 (z), p 2 (z)}, p c (z) = P(C = c Z = z) Genetic association analysis: match subjects according to their genomic propensity scores Weight function: w(z i, Z j ) = W h { p(z i ) p(z j ) }, with W h (u) = exp( u 2 /2h 2 ), h > 0 Heping Zhang (C 2 S 2, Yale University) DIMACS 22 / 50

23 Asymptotic Distribution: Null Hypothesis When n, Var 1/2 D 0 {S(h, q)}[s(h, q) E 0 {S(h, q)}] N(0, I p ) Fixed-(h, q) test statistic: Mean and variance: χ 2 τ (h, q) D χ 2 p E 0 {S(h, q)} = Var 0 {S(h, q)} = 2 n 1 n i=1 4 (n 1) 2 ū i E 0 (C i M pa i, Z i ), n i=1 n i=1 ū i ū jcov 0 (C i, C j M pa i, M pa j, Z i, Z j ), with ū i = n 1 n j=1 F ijw(z i, Z j ; h, q) Heping Zhang (C 2 S 2, Yale University) DIMACS 23 / 50

24 Asymptotic Distribution: Power Under the alternative hypothesis, χ 2 τ (h, q) p e i χ 2 1(φ i ) i=1 µ = µ 1 µ 0 E 1 {S(h, q)} E 0 {S(h, q)} Σ 0 = Var 0 {S(h, q)} Σ 1 = Var 1 {S(h, q)} e 1 e p 0: eigenvalues of Σ 1/2 1 Σ 1 0 Σ1/2 1 φ i = µ 2 i µ i : ith component of µ = QΣ 1/2 1 µ Q: an orthonormal matrix, QΣ 1/2 1 Σ 1 0 Σ1/2 1 Q = diag(e 1,..., e p ) Heping Zhang (C 2 S 2, Yale University) DIMACS 24 / 50

25 Factors Determining the Power The conditional power P: P = P { p i=1 e iχ 2 1 (φ i) q χ 2 p (1 α) } Taking a family-based study as an example, µ 1 = Σ 1 = 2 n 1 n i=1 4 (n 1) 2 ū i E(C i T i, Z i, M pa i ) n i=1 n j=1 ū i ū jcov(c i, C j T i, T j, Z i, Z j, M pa i, M pa j ) By Bayes theorem, P(C = c T, Z, M pa ) = P(T C=c,Z)P(C=c Mpa ) c P(T C=c,Z)P(C=c M pa ) Penetrance: P(T C = c, Z) Allele frequency: P(C = c M pa ) Heping Zhang (C 2 S 2, Yale University) DIMACS 25 / 50

26 Power Approximation Using the result from Liu et al. (2009), we have Theorem P P{χ 2 l (ν) q }, where l, ν, and q depend on µ 1 and Σ 1. Heping Zhang (C 2 S 2, Yale University) DIMACS 26 / 50

27 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 27 / 50

28 Maximum Weighted Test Fixed-(h, q) test: how to choose optimal parameters h and q? Choose a grid of h and q values and maximize the weighted test statistic over those choices {h 1,..., h L1 }: pre-specified grid points of h {q 1,..., q L2 }: pre-specified grid points of q χ 2 τ,max = max 1 l 1 L 1,1 l 2 L 2 χ 2 τ (h l1, q l2 ) Approximate the optimal weighting scheme, yielding the strongest association measure Heping Zhang (C 2 S 2, Yale University) DIMACS 28 / 50

29 Resampling Approach An intuitive approach through computation... Population-based studies: restricted permutation in Yu et al. (2010) Family-based studies: children s genotypes solely determined by their parents marker alleles, resample the children s genotype by Mendelian laws Calculate M resampling test statistics χ 2 τ,max,1,..., χ2 τ,max,m using M resampled data Resampling p-value: the proportion of the resampling test statistics that exceed our observed test statistic, i.e., M 1 M m=1 I( χ2 τ,max,m χ 2 τ,max) Computation too intensive! Heping Zhang (C 2 S 2, Yale University) DIMACS 29 / 50

30 Asymptotic Distribution: Joint Distribution An theoretical approach through approximation... Equivalently, χ 2 τ,max = max R l1,l l 1 L 1,1 l 2 L 2 R = Var 1/2 0D (S){S E 0(S)} S = {S (h 1, q 1 ),..., S (h L1, q L2 )} Var 0D (S) = diag[var 0 {S(h 1, q 1 )},..., Var 0 {S(h L1, q L2 )}]: the diagonal blocks of Var 0 (S) Var 1/2 0 (S){S E 0 (S)} D N(0, I pl1 L 2 ) R = Var 1/2 0D (S)Var1/2 0 (S)G, G N(0, I pl1 L 2 ) Heping Zhang (C 2 S 2, Yale University) DIMACS 30 / 50

31 Asymptotic Distribution: Uniform Approximation Theorem Assume that the eigenvalues of Var 0D (S) and Var 0 (S) are uniformly bounded from both above and below, i.e., there exist two positive numbers c and C such that c λ min {Var 0D (S)} λ max {Var 0D (S)} C and c λ min {Var 0 (S)} λ max {Var 0 (S)} C uniformly for all n, where λ min and λ max denote the smallest and largest eigenvalues respectively. Then for any x R, as n, ( ) ( ) sup P χ 2 τ,max x P max R l1,l 2 2 x 0. 1 l 1 L 1,1 l 2 L 2 x R Heping Zhang (C 2 S 2, Yale University) DIMACS 31 / 50

32 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 32 / 50

33 WTCCC Bipolar Disorder Data Collected by Wellcome Trust Case-Control Consortium (WTCCC, 2007, Nature) Phenotype: 1998 cases/3004 controls of bipolar disorder Genotype: genotyped by Affymetrix GeneChip 500K arrays Covariates: gender, age at recruitment Our method: weighted test using propensity score approach (h = 1) Methods for comparison: non-weighted test and logistic regression Strong association: p-value < ; moderate association: < p-value < 10 5 Heping Zhang (C 2 S 2, Yale University) DIMACS 33 / 50

34 Manhattan Plot: Comparison of Three Methods Heping Zhang (C 2 S 2, Yale University) DIMACS 34 / 50

35 GWAS Results Chr. SNP Position Non-weighted Weighted Logistic Regression 6 rs e e e-9 16 rs e e e-9 16 rs e e e-6 16 rs e e e-6 16 rs e e e-6 17 rs e e e-7 20 rs e e e-7 20 rs e e e-7 Heping Zhang (C 2 S 2, Yale University) DIMACS 35 / 50

36 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 36 / 50

37 Collaborative Studies on Genetics of Alcoholism A large scale study to map alcohol dependence susceptible genes Heping Zhang (C 2 S 2, Yale University) DIMACS 37 / 50

38 COGA Data The data include 143 families with a total of 1,614 individuals Multiple Traits: ALDX1 (the severity of the alcohol dependence): pure unaffected, never drunk, unaffected with some symptoms, and affected MaxDrink (maximum number of drinks in a 24 hour period): 0-9, 10-19, 20-29, and more than 30 drinks TimeDrink (spent so much time drinking, had little time for anything else): no, yes and lasted less than a month, and yes and lasted for one month or longer Genotypes: markers on chromosome 7 Covariates: age at interview and gender Heping Zhang (C 2 S 2, Yale University) DIMACS 38 / 50

39 Results P-values between one of the three traits and markers on Chromosome 7 with covariates unadjusted. Heping Zhang (C 2 S 2, Yale University) DIMACS 39 / 50

40 Results D7S1790 D7S513 D7S1802 D7S629 D7S673 D7S1838 NPY2 D7S817 D7S2846 D7S521 D7S691 D7S478 D7S679 D7S665 D7S1830 D7S3046 D7S1870 D7S1797 D7S820 D7S821 D7S1796 D7S1799 D7S1817 D7S2847 D7S490 D7S1809 log(p value) D7S1804 D7S509 D7S1824 D7S794 D7S1805 adjusted unadjusted Distance (cm) P-values between the three traits and markers on Chromosome 7 with covariates adjusted or not. Heping Zhang (C 2 S 2, Yale University) DIMACS 40 / 50

41 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 41 / 50

42 Method Developed a nonparametric weighted test to adjust for covariates that accommodates multiple traits Provided its asymptotic distribution and analytical power calculation Refined the weighted test by proposing the idea of maximum weighting over the grid points of parameters Proposed an asymptotic approach to assessing its significance Heping Zhang (C 2 S 2, Yale University) DIMACS 42 / 50

43 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 43 / 50

44 Applications WTCCC bipolar disorder data: not only confirmed the results reported by the WTCCC (2007), but also identified another region at the genome-wide significance level The identified haplotype block is near the RPGRIP1L gene that was reported to be associated with bipolar disorder (O Donovan et al., 2008; Riley et al., 2009) COGA data: confirmed and strengthened the top signal; provided evidences for the advantage of maximum weighted test over non-weighted test Heping Zhang (C 2 S 2, Yale University) DIMACS 44 / 50

45 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 45 / 50

46 Acknowledgment Dr. Yuan Jiang, Oregon State University Dr. Ching-Ti Liu, Boston University Dr. Xueqin Wang, Sun Yat-Sen University, China Dr. Wensheng Zhu, Northeastern Normal University, China Heping Zhang (C 2 S 2, Yale University) DIMACS 46 / 50

47 Acknowledgment Supported by grant R01DA from National Institute on Drug Abuse. The SAGE data were obtained from dbgap ( The COGA data were provided by COGA. WTCCC data were provided by Wellcome Trust Case-Control Consortium. The views expressed here are those of the authors. Heping Zhang (C 2 S 2, Yale University) DIMACS 47 / 50

48 Outline 1 Background Comorbidity: Definition and Mechanisms Data and Study Design Challenge 2 Association Test Generalized Kendall s Tau Maximum Weighted Test over Grids 3 Data Analyses WTCCC Bipolar Disorder Data COGA Family Data 4 Conclusions and Acknowledgment Method Data Analysis Acknowledgment References Heping Zhang (C 2 S 2, Yale University) DIMACS 48 / 50

49 Related Readings W. Zhu and H. Zhang (2009) Why do we test multiple traits in genetic association studies? (with discussion) J. Korean Statist. Soc., 38:1-10. H. Zhang, C.-T. Liu and X. Wang (2010) An association test for multiple traits based on the generalized Kendall s tau. J. Amer. Statist. Assoc., 105: X. Chen, K. Cho, B. Singer, and H. Zhang (2011) The nuclear transcription factor PKNOX2 is a candidate gene for substance dependence in European-origin women. PLoS One, 27; 6:e Y. Jiang and H. Zhang (2011) Propensity Score-Based Nonparametric Test Revealing Genetic Variants Underlying Bipolar Disorder. Genet. Epidemiol., 35: H. Zhang (2011) Statistical Analysis in Genetic Studies of Mental Illnesses. Statistical Science, 26: W. Zhu, Y. Jiang, and H. Zhang (2012) Covariate-Adjusted Association Tests and Power Calculations Based on the Generalized Kendall s Tau. J. Amer. Statist. Assoc., 107:1-11. Heping Zhang (C 2 S 2, Yale University) DIMACS 49 / 50

50 For everything we did, there may be a better way!" David Banks (?) Heping Zhang (C 2 S 2, Yale University) DIMACS 50 / 50

Genetics Studies of Multivariate Traits

Genetics Studies of Multivariate Traits Genetics Studies of Multivariate Traits Heping Zhang Department of Epidemiology and Public Health Yale University School of Medicine Presented at Southern Regional Council on Statistics Summer Research

More information

Genetics Studies of Comorbidity

Genetics Studies of Comorbidity Genetics Studies of Comorbidity Heping Zhang Department of Epidemiology and Public Health Yale University School of Medicine Presented at Science at the Edge Michigan State University January 27, 2012

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

SNP Association Studies with Case-Parent Trios

SNP Association Studies with Case-Parent Trios SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health September 3, 2009 Population-based Association Studies Balding (2006). Nature

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

SNP-SNP Interactions in Case-Parent Trios

SNP-SNP Interactions in Case-Parent Trios Detection of SNP-SNP Interactions in Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 2, 2009 Karyotypes http://ghr.nlm.nih.gov/ Single Nucleotide Polymphisms

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

Department of Forensic Psychiatry, School of Medicine & Forensics, Xi'an Jiaotong University, Xi'an, China;

Department of Forensic Psychiatry, School of Medicine & Forensics, Xi'an Jiaotong University, Xi'an, China; Title: Evaluation of genetic susceptibility of common variants in CACNA1D with schizophrenia in Han Chinese Author names and affiliations: Fanglin Guan a,e, Lu Li b, Chuchu Qiao b, Gang Chen b, Tinglin

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

Asymptotic distribution of the largest eigenvalue with application to genetic data

Asymptotic distribution of the largest eigenvalue with application to genetic data Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene

More information

Lecture 1 Introduction to Multi-level Models

Lecture 1 Introduction to Multi-level Models Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease

On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease Yuehua Cui 1 and Dong-Yun Kim 2 1 Department of Statistics and Probability, Michigan State University,

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm

More information

Régression en grande dimension et épistasie par blocs pour les études d association

Régression en grande dimension et épistasie par blocs pour les études d association Régression en grande dimension et épistasie par blocs pour les études d association V. Stanislas, C. Dalmasso, C. Ambroise Laboratoire de Mathématiques et Modélisation d Évry "Statistique et Génome" 1

More information

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Case-Control Association Testing with Related Individuals: A More Powerful Quasi-Likelihood Score Test

Case-Control Association Testing with Related Individuals: A More Powerful Quasi-Likelihood Score Test Case-Control Association esting with Related Individuals: A More Powerful Quasi-Likelihood Score est imothy hornton and Mary Sara McPeek ARICLE We consider the problem of genomewide association testing

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

Power and sample size calculations for designing rare variant sequencing association studies.

Power and sample size calculations for designing rare variant sequencing association studies. Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability

More information

arxiv: v2 [stat.me] 22 Jun 2015

arxiv: v2 [stat.me] 22 Jun 2015 A WEIGHTED U STATISTIC FOR ASSOCIATION ANALYSES CONSIDERING GENETIC HETEROGENEITY arxiv:1504.08319v2 [stat.me] 22 Jun 2015 By Changshuai Wei,, Robert C. Elston, and Qing Lu, University of North Texas Health

More information

Targeted Maximum Likelihood Estimation in Safety Analysis

Targeted Maximum Likelihood Estimation in Safety Analysis Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature25973 Power Simulations We performed extensive power simulations to demonstrate that the analyses carried out in our study are well powered. Our simulations indicate very high power for

More information

Modeling IBD for Pairs of Relatives. Biostatistics 666 Lecture 17

Modeling IBD for Pairs of Relatives. Biostatistics 666 Lecture 17 Modeling IBD for Pairs of Relatives Biostatistics 666 Lecture 7 Previously Linkage Analysis of Relative Pairs IBS Methods Compare observed and expected sharing IBD Methods Account for frequency of shared

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Lecture 6: Introduction to Quantitative genetics Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Quantitative Genetics The analysis of traits whose variation is determined by both a

More information

Research Statement on Statistics Jun Zhang

Research Statement on Statistics Jun Zhang Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation

More information

1 Preliminary Variance component test in GLM Mediation Analysis... 3

1 Preliminary Variance component test in GLM Mediation Analysis... 3 Honglang Wang Depart. of Stat. & Prob. wangho16@msu.edu Omics Data Integration Statistical Genetics/Genomics Journal Club Summary and discussion of Joint Analysis of SNP and Gene Expression Data in Genetic

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Nilanjan Chatterjee, Zeynep Kalaylioglu 2, Roxana Moslehi, Ulrike Peters 3, Sholom Wacholder

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett

More information

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

Supplementary Information

Supplementary Information Supplementary Information 1 Supplementary Figures (a) Statistical power (p = 2.6 10 8 ) (b) Statistical power (p = 4.0 10 6 ) Supplementary Figure 1: Statistical power comparison between GEMMA (red) and

More information

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan

More information

Methods for Cryptic Structure. Methods for Cryptic Structure

Methods for Cryptic Structure. Methods for Cryptic Structure Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015 Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Package LBLGXE. R topics documented: July 20, Type Package

Package LBLGXE. R topics documented: July 20, Type Package Type Package Package LBLGXE July 20, 2015 Title Bayesian Lasso for detecting Rare (or Common) Haplotype Association and their interactions with Environmental Covariates Version 1.2 Date 2015-07-09 Author

More information

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda 1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;

More information

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data Yehua Li Department of Statistics University of Georgia Yongtao

More information

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15.

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15. NIH Public Access Author Manuscript Published in final edited form as: Stat Sin. 2012 ; 22: 1041 1074. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION

More information

Rerandomization to Balance Covariates

Rerandomization to Balance Covariates Rerandomization to Balance Covariates Kari Lock Morgan Department of Statistics Penn State University Joint work with Don Rubin University of Minnesota Biostatistics 4/27/16 The Gold Standard Randomized

More information

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a

More information

Variance Component Models for Quantitative Traits. Biostatistics 666

Variance Component Models for Quantitative Traits. Biostatistics 666 Variance Component Models for Quantitative Traits Biostatistics 666 Today Analysis of quantitative traits Modeling covariance for pairs of individuals estimating heritability Extending the model beyond

More information

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National

More information

Confounding, mediation and colliding

Confounding, mediation and colliding Confounding, mediation and colliding What types of shared covariates does the sibling comparison design control for? Arvid Sjölander and Johan Zetterqvist Causal effects and confounding A common aim of

More information

More Statistics tutorial at Logistic Regression and the new:

More Statistics tutorial at  Logistic Regression and the new: Logistic Regression and the new: Residual Logistic Regression 1 Outline 1. Logistic Regression 2. Confounding Variables 3. Controlling for Confounding Variables 4. Residual Linear Regression 5. Residual

More information

NETWORK BIOLOGY AND COMPLEX DISEASES. Ahto Salumets

NETWORK BIOLOGY AND COMPLEX DISEASES. Ahto Salumets NETWORK BIOLOGY AND COMPLEX DISEASES Ahto Salumets CENTRAL DOGMA OF BIOLOGY https://en.wikipedia.org/wiki/central_dogma_of_molecular_biology http://www.qaraqalpaq.com/genetics.html CHROMOSOMES SINGLE-NUCLEOTIDE

More information

Dynamics in Social Networks and Causality

Dynamics in Social Networks and Causality Web Science & Technologies University of Koblenz Landau, Germany Dynamics in Social Networks and Causality JProf. Dr. University Koblenz Landau GESIS Leibniz Institute for the Social Sciences Last Time:

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

Some New Methods for Family-Based Association Studies

Some New Methods for Family-Based Association Studies Some New Methods for Family-Based Association Studies Ingo Ruczinski Department of Biostatistics Johns Hopkins Bloomberg School of Public Health April 8, 20 http: //biostat.jhsph.edu/ iruczins/ Topics

More information

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017 Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 46 Lecture Overview 1. Variance Component

More information

DATA-ADAPTIVE VARIABLE SELECTION FOR

DATA-ADAPTIVE VARIABLE SELECTION FOR DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/35195 holds various files of this Leiden University dissertation Author: Balliu, Brunilda Title: Statistical methods for genetic association studies with

More information

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data. Bayesian construction of perceptrons to predict phenotypes from 584K SNP data. Luc Janss, Bert Kappen Radboud University Nijmegen Medical Centre Donders Institute for Neuroscience Introduction Genetic

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs.

Nature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs. Supplementary Figure 1 Number of cases and proxy cases required to detect association at designs. = 5 10 8 for case control and proxy case control The ratio of controls to cases (or proxy cases) is 1.

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Two-sample testing with high-dimensional genetic and neuroimaging data

Two-sample testing with high-dimensional genetic and neuroimaging data Two-sample testing with high-dimensional genetic and neuroimaging data Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Nov 4, 2016 University

More information

Review We have covered so far: Single variant association analysis and effect size estimation GxE interaction and higher order >2 interaction Measurement error in dietary variables (nutritional epidemiology)

More information

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works EM algorithm The example in the book for doing the EM algorithm is rather difficult, and was not available in software at the time that the authors wrote the book, but they implemented a SAS macro to implement

More information

Testing for group differences in brain functional connectivity

Testing for group differences in brain functional connectivity Testing for group differences in brain functional connectivity Junghi Kim, Wei Pan, for ADNI Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Banff Feb

More information

Some models of genomic selection

Some models of genomic selection Munich, December 2013 What is the talk about? Barley! Steptoe x Morex barley mapping population Steptoe x Morex barley mapping population genotyping from Close at al., 2009 and phenotyping from cite http://wheat.pw.usda.gov/ggpages/sxm/

More information

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials

Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials Biostatistics (2013), pp. 1 31 doi:10.1093/biostatistics/kxt006 Test for interactions between a genetic marker set and environment in generalized linear models Supplementary Materials XINYI LIN, SEUNGGUEN

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

More information

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th

More information

Nature Genetics: doi: /ng Supplementary Figure 1. The hypotheses being tested by our two approaches.

Nature Genetics: doi: /ng Supplementary Figure 1. The hypotheses being tested by our two approaches. Supplementary Figure 1 The hypotheses being tested by our two approaches. (a) The hypotheses being tested by the Bayesian approach are represented as collections of configurations. Each configuration is

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS BIRS 2016 1 NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS Malka Gorfine Tel Aviv University, Israel Joint work with Danielle Braun and Giovanni

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

Classification 1: Linear regression of indicators, linear discriminant analysis

Classification 1: Linear regression of indicators, linear discriminant analysis Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota,

More information