Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

1 Theoretical and computational aspects of association tests: application in case-control genome-wide association studies
Mathieu Emily
November 18, 2014, Caen
Agrocampus Ouest - IRMAR, Rennes, France

2 Laboratoire de Mathématiques Appliquées de l'Agrocampus (LMA²)
People: 6 faculty, 1 research assistant, 5 PhD students
Research: multivariate exploratory data analysis, biostatistics, high-dimensional data
Main topics: sensometrics, genomic data analysis
mathieu.emily@agrocampus-ouest.fr - Agrocampus Ouest - IRMAR - Rennes

3 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association
4. Conclusion

4 Outline
1. Genome-wide association studies
   - Context and problem statement
2. Power in single-locus association
3. Two-locus association
4. Conclusion

5 Genome-wide association studies (GWAS)
Case/control studies:
- Detection of differences in allelic frequencies between case and control individuals
- Genotyping of individuals from both populations
Challenges:
- Technological: large increase in the number of markers on chips (100k, 300k, 500k and 1000k)
- Computational
- Statistical

6 Genome-wide association studies (GWAS)
Statistical and computational challenges

Individual   Phenotype   Marker 1   Marker 2   ...   Marker 500,000
             Y           X_1        X_2        ...   X_500,000
Id 1         healthy     AA         AC         ...   TG
Id 2         diseased    AC         AC         ...   GG
...
Id 1,000     diseased    AC         CC         ...   TG

Let $Y$ be a random variable with a Bernoulli distribution (the case where $Y$ is continuous is not treated here).
Let $X_i$, $i = 1, \dots, p$, be $p$ random variables with 3 states corresponding to the genotype of Marker $i$: $X_i = 0$ homozygous for the major allele, $X_i = 1$ heterozygous and $X_i = 2$ homozygous for the minor allele.
How is $Y$ explained by $\{X_i\}_{i=1,\dots,p}$?

7 A success story? ...yes
Since 2005, many variants have been found to be associated with susceptibility to various complex diseases: prostate cancer, Crohn's disease, etc.
[Figure: Manhattan plot for type 1 diabetes in the WTCCC dataset]

8 A success story? ...yes and no
GWAS typically identify common variants with small effect sizes (lower right part of the graph).
[Figure: from Bush WS, Moore JH, PLoS Comput Biol, 2012]

9 A success story? ...no
GWAS have generated new challenges, such as the quest for missing heritability.

11 Discrepancy between biology and statistics
In biology, GWAS are limited by complex phenomena such as:
- Genome structure
- Complexity of diseases
- Potential for a large number of false-positive results
The future is to put prior knowledge into the analysis... and potentially make the problem more complex.
From a statistical point of view, GWAS are challenging because of:
- Correlation between SNPs
- Interaction between variables
- The high-dimensional nature of the problem, with categorical variables
The future is to investigate the behavior of basic statistical procedures in this specific context.

12 Outline
1. Genome-wide association studies
2. Power in single-locus association
   - Direct single-locus association
   - Application with the WTCCC dataset
3. Two-locus association
4. Conclusion

13 Single-locus association
GWAS are usually performed via a single-locus approach: each SNP is tested independently.
Question: what is the most powerful statistical test to detect a signal?
[Figure: Manhattan plot for type 1 diabetes in the WTCCC dataset (Nature, 2007)]

14 Theoretical context and notations
Let $X$ and $Y$ be two binary variables with values in $\{1, 2\}$.
- $X$ can be a biallelic biological marker.
- $Y$ can be the presence/absence of a disease.
Data are usually summarized in a 2x2 contingency table:

          X = 1      X = 2      Total
Y = 1     $n_{11}$   $n_{12}$   $n_{1.} = N(1-\varphi)$
Y = 2     $n_{21}$   $n_{22}$   $n_{2.} = N\varphi$
Total     $n_{.1}$   $n_{.2}$   $N$

where $n_{ij}$ is the number of observations with $Y = i$ and $X = j$.
The marginal counts for $Y$ are assumed to be fixed (one-margin fixed design). We introduce $\varphi$ as the balance of the design.
Detecting association between $X$ and $Y$ is equivalent to comparing two binomial proportions, $\pi_1$ and $\pi_2$, where:
$\pi_i = P[X = 2 \mid Y = i]$ for $i = 1, 2$

15 Statistical hypothesis and tests (1)
Our objective is to test:
$H_0: \pi_1 = \pi_2$ vs $H_1: \pi_1 \neq \pi_2$   (1)
Exact tests:
- Fisher exact test
- The power function of an exact test is hardly tractable.
Asymptotic tests:
- Pearson's $\chi^2$ test
- Likelihood ratio test (LRT)
The hypotheses in Equation (1) can be reformulated as:
$H_0: \log\left(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\right) = \log(OR(\pi_1, \pi_2)) = 0$ vs $H_1: \log\left(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\right) \neq 0$
where $OR(\pi_1, \pi_2)$ is the so-called odds ratio between $\pi_1$ and $\pi_2$. Statistical inference on the odds ratio can therefore be used.

16 Statistical hypothesis and tests (2)
We introduce the expected counts obtained under independence between $X$ and $Y$:
$m_{ij} = \frac{n_{i.}\, n_{.j}}{N}$
Pearson's $\chi^2$ statistic:
$P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$
Likelihood ratio:
$LR = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} \log\left(\frac{n_{ij}}{m_{ij}}\right)$
Odds-ratio inference:
$z^2 = \left(\frac{t}{SE}\right)^2$ with $t = \log\left(\frac{n_{11} n_{22}}{n_{12} n_{21}}\right)$ and $SE = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$
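
The three statistics above can be computed directly from the four cell counts. The following is a minimal sketch, not the author's code; the function name `two_by_two_tests` and the example counts are illustrative assumptions.

```python
import math

def two_by_two_tests(n11, n12, n21, n22):
    """Return (P, LR, z2) for a 2x2 table with rows Y = 1, 2 and columns X = 1, 2."""
    n = [[n11, n12], [n21, n22]]
    total = n11 + n12 + n21 + n22
    row = [n11 + n12, n21 + n22]      # n_1. and n_2.
    col = [n11 + n21, n12 + n22]      # n_.1 and n_.2
    pearson = 0.0
    lr = 0.0
    for i in range(2):
        for j in range(2):
            m = row[i] * col[j] / total           # expected count m_ij
            pearson += (n[i][j] - m) ** 2 / m
            if n[i][j] > 0:                       # 0 * log(0) is taken as 0
                lr += 2.0 * n[i][j] * math.log(n[i][j] / m)
    # Squared Wald statistic on the log odds ratio (requires all cells > 0).
    t = math.log(n11 * n22 / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return pearson, lr, (t / se) ** 2

# Hypothetical counts, chosen only for illustration.
p_stat, lr_stat, z2_stat = two_by_two_tests(4275, 225, 465, 35)
```

On this example the three values differ noticeably even though they share the same asymptotic null distribution, which is the point the following slides develop.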

17 Statistical hypothesis and tests (3)
Under $H_0$, all three test statistics asymptotically follow a central $\chi^2$ distribution with 1 df:
$P \sim_{H_0} \chi^2(1)$, $LR \sim_{H_0} \chi^2(1)$ and $z^2 \sim_{H_0} \chi^2(1)$
Under $H_1$, each of the three test statistics follows a non-central $\chi^2$ distribution with 1 df:
$P \sim_{H_1} \chi^2(\lambda_P, 1)$, $LR \sim_{H_1} \chi^2(\lambda_{LR}, 1)$ and $z^2 \sim_{H_1} \chi^2(\lambda_{z^2}, 1)$
Comparing the power of $P$, $LR$ and $z^2$ is therefore equivalent to comparing the non-centrality parameters $\lambda_P$, $\lambda_{LR}$ and $\lambda_{z^2}$.
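
To see why a larger non-centrality parameter means more power, note that a non-central $\chi^2$ variable with 1 df and parameter $\lambda$ is the square of a normal variable with mean $\sqrt{\lambda}$ and unit variance. The sketch below uses that fact; the function name `power_chi2_1df` and the default level of 0.05 are assumptions made for illustration.

```python
from statistics import NormalDist

def power_chi2_1df(lam, alpha=0.05):
    """Power of a level-alpha test whose statistic follows chi2(lam, 1 df) under H1."""
    std = NormalDist()
    # Square root of the central chi2(1) critical value.
    c = std.inv_cdf(1 - alpha / 2)
    mu = lam ** 0.5
    # P(Z^2 > c^2) with Z ~ N(mu, 1): upper tail plus lower tail.
    return (1 - std.cdf(c - mu)) + std.cdf(-c - mu)

# Power is increasing in lam, so ranking tests by lam ranks them by power.
powers = [power_chi2_1df(lam) for lam in (0.0, 2.0, 4.0, 8.0)]
```

With `lam = 0` the function returns the level `alpha`, as expected under the null hypothesis.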

18 Power study framework
In the context of 2x2 table analysis, power studies have been used to estimate the sample size needed to reach a given level of power. Such a power study is performed before the experiment.
Here we propose a post-hoc power study, which can be carried out after the experiment.
To compare the non-centrality parameters, we assume that $N$ is fixed and propose the following scheme:
1. Definition of a general situation for $H_1$
2. Estimation of the three non-centrality parameters ($\lambda_P$, $\lambda_{LR}$ and $\lambda_{z^2}$)
3. Theoretical comparison of the non-centrality parameter estimates

19 Local alternatives for $H_1$
We consider the situation of local alternatives given by: $\pi_2 = \pi_1 + \frac{h}{\sqrt{N}}$.
Let us introduce the mean contingency table, NE, and the mean expected contingency table, ME, as follows:

NE:
          X = 1                                     X = 2                               Total
Y = 1     $ne_{11} = N(1-\pi_1)(1-\varphi)$         $ne_{12} = N\pi_1(1-\varphi)$       $N(1-\varphi)$
Y = 2     $ne_{21} = N(1-\pi_2)\varphi$             $ne_{22} = N\pi_2\varphi$           $N\varphi$
Total     $n_{.1} = N(1-\bar{\pi})$                 $n_{.2} = N\bar{\pi}$               $N$

ME:
          X = 1                                     X = 2                               Total
Y = 1     $me_{11} = N(1-\bar{\pi})(1-\varphi)$     $me_{12} = N\bar{\pi}(1-\varphi)$   $N(1-\varphi)$
Y = 2     $me_{21} = N(1-\bar{\pi})\varphi$         $me_{22} = N\bar{\pi}\varphi$       $N\varphi$
Total     $n_{.1} = N(1-\bar{\pi})$                 $n_{.2} = N\bar{\pi}$               $N$

where $\bar{\pi} = \pi_1(1-\varphi) + \pi_2\varphi$.

20 Estimation of the non-centrality parameters
Under local alternatives, the non-centrality parameter $\lambda$ is asymptotically equal to the test statistic calculated on NE and ME. Thus, estimates of the non-centrality parameters are given by:
$\lambda_P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(ne_{ij} - me_{ij})^2}{me_{ij}}$
$\lambda_{LR} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} ne_{ij} \log\left(\frac{ne_{ij}}{me_{ij}}\right)$
$\lambda_{z^2} = \left(\frac{t_e}{SE_e}\right)^2$ with $t_e = \log\left(\frac{ne_{11}\, ne_{22}}{ne_{12}\, ne_{21}}\right)$ and $SE_e = \sqrt{\frac{1}{ne_{11}} + \frac{1}{ne_{12}} + \frac{1}{ne_{21}} + \frac{1}{ne_{22}}}$
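
The estimation scheme above can be sketched numerically: build NE and ME from $(N, \varphi, \pi_1, h)$ and evaluate the three statistics on them. This is an illustrative sketch under the local-alternative definition $\pi_2 = \pi_1 + h/\sqrt{N}$; the function name `noncentrality` and the parameter values in the usage lines are assumptions, not values from the talk.

```python
import math

def noncentrality(N, phi, pi1, h):
    """Return (lambda_P, lambda_LR, lambda_z2) computed on the mean tables NE and ME."""
    pi2 = pi1 + h / math.sqrt(N)                  # local alternative
    pibar = pi1 * (1 - phi) + pi2 * phi           # pooled proportion
    ne = [[N * (1 - pi1) * (1 - phi), N * pi1 * (1 - phi)],
          [N * (1 - pi2) * phi,       N * pi2 * phi]]
    me = [[N * (1 - pibar) * (1 - phi), N * pibar * (1 - phi)],
          [N * (1 - pibar) * phi,       N * pibar * phi]]
    cells = [(i, j) for i in range(2) for j in range(2)]
    lam_p = sum((ne[i][j] - me[i][j]) ** 2 / me[i][j] for i, j in cells)
    lam_lr = 2 * sum(ne[i][j] * math.log(ne[i][j] / me[i][j]) for i, j in cells)
    t_e = math.log(ne[0][0] * ne[1][1] / (ne[0][1] * ne[1][0]))
    se_e = math.sqrt(sum(1 / ne[i][j] for i, j in cells))
    return lam_p, lam_lr, (t_e / se_e) ** 2

# Hypothetical setting: unbalanced design, rare variant, risk-increasing effect.
N = 5000
h = 0.02 * math.sqrt(N)                           # so that pi2 = pi1 + 0.02
lam_p, lam_lr, lam_z2 = noncentrality(N, phi=0.1, pi1=0.05, h=h)
```

In this setting the Pearson parameter is the largest of the three; flipping the sign of `h` reverses the ordering between the Pearson and likelihood-ratio parameters.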

21 Taylor approximations
When $h$ is small we have:
$\lambda_P = \varphi(1-\varphi)\, h^2 \sum_{k \geq 2} \left(\frac{h}{\sqrt{N}}\right)^{k-2} g_k(\pi_1)\, \varphi^{k-2}$
$\lambda_{LR} = \varphi(1-\varphi)\, h^2 \sum_{k \geq 2} \left(\frac{h}{\sqrt{N}}\right)^{k-2} g_k(\pi_1)\, \frac{2}{k(k-1)} \sum_{i=0}^{k-2} \varphi^i$
where
$g_k(\pi_1) = \left(\frac{1}{1-\pi_1}\right)^{k-1} - \left(\frac{-1}{\pi_1}\right)^{k-1} = \frac{\pi_1^{k-1} - (\pi_1 - 1)^{k-1}}{(\pi_1(1-\pi_1))^{k-1}}$

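22 Taylor approximations (2): 4th order
When $h$ is small we have (with $g_k$ as defined on the previous slide):
$\lambda_P - \lambda_{LR} \approx \frac{h^3}{\sqrt{N}}\, \varphi(1-\varphi) \left[ \frac{2\varphi - 1}{3}\, g_3(\pi_1) + \frac{h}{\sqrt{N}}\, \frac{5\varphi^2 - \varphi - 1}{6}\, g_4(\pi_1) \right]$
and:
$\lambda_P - \lambda_{z^2} \approx \frac{h^4}{N}\, \varphi(1-\varphi) \left( \frac{1}{12} - \varphi(1-\varphi)\,\pi_1(1-\pi_1) \right) \frac{g_4(\pi_1)}{3\pi_1^2 - 3\pi_1 + 1} > 0$
Parameters of importance: $\varphi$ and $\pi_1$. $h$?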

23 $\chi^2$ - LRT
[Figure: plot of the difference in power between $\chi^2$ and LRT]

24 $\chi^2$ - $z^2$
[Figure: plot of the difference in power between $\chi^2$ and $z^2$]

25 Power comparison for $\varphi = 0.1$
[Figure: power curves in three panels, $\pi_1 = 0.05$, $\pi_1 = 0.1$ and $\pi_1 = 0.4$]
If $\pi_1$ is small, power differs between $\chi^2$ and LR.

26 Power comparison for $\varphi = 0.5$
[Figure: power curves in three panels, $\pi_1 = 0.05$, $\pi_1 = 0.1$ and $\pi_1 = 0.4$]
Similar powers for each test.

27 Power comparison for $\varphi = 0.9$
[Figure: power curves in three panels, $\pi_1 = 0.05$, $\pi_1 = 0.1$ and $\pi_1 = 0.4$]
If $\pi_1$ is small, power differs between $\chi^2$ and LR.

28 Recommendations
$\chi^2$ always outperforms $z^2$.
If $h > 0$ (causal effect):
- $\pi_1$ small and $\varphi$ small: $\chi^2$ > LRT
- $\pi_1$ small and $\varphi$ high: $\chi^2$ < LRT
If $h < 0$ (protective effect):
- $\pi_1$ small and $\varphi$ small: $\chi^2$ < LRT
- $\pi_1$ small and $\varphi$ high: $\chi^2$ > LRT

29 Benchmark dataset: WTCCC (Nature, 2007)
- 500,000 Single Nucleotide Polymorphisms (SNPs) ($X_i$)
- 3,000 controls
- 7 diseases with 2,000 cases for each disease
Two possible strategies for studying Crohn's disease:
1. 2,000 cases vs 3,000 controls: $\varphi$ = [value missing]
2. 2,000 cases vs 15,000 controls: $\varphi = 0.11$
The following filters are used:
- Control of the number of missing data (< 50)
- Control of Hardy-Weinberg equilibrium (p-value > 0.05)
- Restriction to rare alleles: $f \leq 0.05$

30 Chromosome 20
Ranking can change between tests.
[Table: SNP ranking under $\chi^2$, LR and $z^2$; values missing]

31 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association
   - Odds ratio and δ method for counts
   - Statistical interaction
   - Biological interaction
4. Conclusion

32 Gene-gene interaction
Single-locus scans fail to explain biological complexity:
- Protein interaction networks
- Pathways
A natural extension of the single-locus approach is the two-locus approach: SNP-SNP interaction or gene-gene interaction.
Main challenges:
- The number of tests: about 125 billion tests ($\binom{500{,}000}{2}$)
- The large class of interaction models
One useful tool: approximation of odds-ratio inference using the δ method.
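
The figure of 125 billion tests corresponds to the number of unordered SNP pairs on a 500,000-marker chip. The short check below assumes that chip size (taken from the data layout earlier in the talk); the Bonferroni threshold is shown only to illustrate the scale of the multiple-testing burden and is not a value from the slides.

```python
import math

n_snps = 500_000                      # assumed number of markers
n_pairs = math.comb(n_snps, 2)        # unordered pairs of SNPs
print(f"{n_pairs:,}")                 # 124,999,750,000

# Illustrative only: family-wise level 0.05 spread over all pairs.
bonferroni = 0.05 / n_pairs
```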

33 Inference on odds ratio
The aim is to test the association between $Y$ and the $m$ categories of $X_k$, with:
$\Phi = [OR(x_1), \dots, OR(x_m)]$
The null hypothesis can be written as:
$H_0: \Phi = [1, \dots, 1]$
or equivalently:
$H_0: \Psi = [\psi(x_1), \dots, \psi(x_m)] = [\log(OR(x_1)), \dots, \log(OR(x_m))] = [0, \dots, 0]$

34 Classical test in genetic epidemiology
Test:
- Let $W = \Psi V^{-1} \Psi^t$, with $\Psi = [\psi(x_1), \dots, \psi(x_m)]$.
- Let $V$ be the variance-covariance matrix of $\Psi$.
- As $W$ is a Wald statistic, we have: $W \sim \chi^2(m)$.
In practice:
- $\Psi$ is estimated using maximum likelihood estimation.
- Estimating $V^{-1}$ is more complex.

35 Estimation of $\Psi$ using MLE
The data $Y = (y_1, \dots, y_n)^t$ and $X_l = (x_{1l}, \dots, x_{nl})^t$ are summarized in a contingency table:

          $x_0$       ...   $x_m$
Y = 0     $n^0_0$     ...   $n^0_m$
Y = 1     $n^1_0$     ...   $n^1_m$

where $n^s_k$ is the number of individuals $i$ with $y_i = s$ and $x_{il} = k$.
Then:
$OR(x_l) = \frac{P(Y = 1 \mid X = x_l)}{P(Y = 0 \mid X = x_l)} \left( \frac{P(Y = 1 \mid X = x_0)}{P(Y = 0 \mid X = x_0)} \right)^{-1}$
can be estimated by:
$\widehat{OR}(x_l) = \frac{n^1_l\, n^0_{x_0}}{n^0_l\, n^1_{x_0}}$
$\hat{\psi}(x_l) = \log(n^1_l) - \log(n^0_l) - \log(n^1_{x_0}) + \log(n^0_{x_0})$

36 Estimation of $V$ (2): δ approximation
Counts are assumed to follow a multinomial distribution:
$[N^1_{x_0}; \dots; N^1_{x_m}] \sim Mult(n^1; p^1_{x_0}; \dots; p^1_{x_m})$
We can write:
$N^1_{x_l} \approx n^1 p^1_{x_l} \left( 1 + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\, \delta^1_{x_l} \right)$
$\log(N^1_{x_l}) \approx \log(n^1 p^1_{x_l}) + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\, \delta^1_{x_l}$
with:
$\delta^1_{x_l} \sim N(0, 1)$
$Cov(\delta^1_{x_l}; \delta^1_{x_k}) = -\sqrt{\frac{p^1_{x_l}\, p^1_{x_k}}{(1 - p^1_{x_l})(1 - p^1_{x_k})}}$ if $l \neq k$

37 Estimation of $V$ (2): example
$Cov(\psi(x_k), \psi(x_l)) = Cov\big( \log(n^1_k) - \log(n^0_k) - \log(n^1_{x_0}) + \log(n^0_{x_0}),\ \log(n^1_l) - \log(n^0_l) - \log(n^1_{x_0}) + \log(n^0_{x_0}) \big)$
It is approximated thanks to:
- $\log(N^1_{x_l}) \approx \log(n^1 p^1_{x_l}) + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\, \delta^1_{x_l}$
- The variance-covariance structure of the δ's
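
The Wald test can be sketched end to end from two vectors of counts. Under the δ approximation above, and assuming cases and controls are sampled independently, the covariances work out to $Var(\hat{\psi}(x_l)) \approx 1/n^1_l + 1/n^0_l + 1/n^1_{x_0} + 1/n^0_{x_0}$ and $Cov(\hat{\psi}(x_k), \hat{\psi}(x_l)) \approx 1/n^1_{x_0} + 1/n^0_{x_0}$ for $k \neq l$, with probabilities replaced by observed frequencies. The code below is a minimal illustration of that plug-in version, not the author's implementation; all function names are hypothetical.

```python
import math

def log_odds_ratios(cases, controls):
    """psi_hat(x_l) for l = 1..m, with category 0 as the reference x_0."""
    return [math.log(cases[l]) - math.log(controls[l])
            - math.log(cases[0]) + math.log(controls[0])
            for l in range(1, len(cases))]

def delta_covariance(cases, controls):
    """Plug-in covariance matrix of psi_hat from the delta approximation."""
    base = 1 / cases[0] + 1 / controls[0]         # shared reference-category term
    m = len(cases) - 1
    return [[base + (1 / cases[k] + 1 / controls[k] if k == l else 0.0)
             for l in range(1, m + 1)] for k in range(1, m + 1)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def wald_statistic(cases, controls):
    """W = Psi V^{-1} Psi^t, to be compared with a chi2(m) distribution."""
    psi = log_odds_ratios(cases, controls)
    v = delta_covariance(cases, controls)
    return sum(p * s for p, s in zip(psi, solve(v, psi)))

# Hypothetical counts per category x_0, x_1, x_2.
w = wald_statistic(cases=[120, 90, 40], controls=[150, 80, 20])
```

With only two categories ($m = 1$) this reduces to the $z^2$ statistic of the single-locus section, which gives a convenient sanity check.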

38 Application to statistical interaction: deviation from linearity (1)
Let $X = (X_k, X_l)$ be a pair of SNPs with 9 categories:
$x_0$ = AABB, $x_1$ = AABb, $x_2$ = AAbb, $x_3$ = AaBB, $x_4$ = AaBb, $x_5$ = Aabb, $x_6$ = aaBB, $x_7$ = aaBb, $x_8$ = aabb
The saturated logistic model is given by:
$logit(P(Y = 1 \mid X)) = \beta_0 + \sum_{i \in \{Aa; aa\}} \beta_i I_{X_k = i} + \sum_{j \in \{Bb; bb\}} \beta_j I_{X_l = j} + \sum_{i \in \{Aa; aa\}} \sum_{j \in \{Bb; bb\}} \beta_{ij} I_{X_k = i; X_l = j}$
The test for interaction consists in testing:
$[\beta_{AaBb}, \beta_{Aabb}, \beta_{aaBb}, \beta_{aabb}] = [0, 0, 0, 0]$

39 Application to statistical interaction: deviation from linearity (2)
$H_0$ can be formulated as: $\forall i \in \{Aa, aa\}$ and $\forall j \in \{Bb, bb\}$:
$OR(X_k = i, X_l = j) = OR(X_k = AA, X_l = j)\, OR(X_k = i, X_l = BB)$
$\frac{n^1_{ij}\, n^1_{AABB}}{n^1_{iBB}\, n^1_{AAj}} = \frac{n^0_{ij}\, n^0_{AABB}}{n^0_{iBB}\, n^0_{AAj}}$
$\Psi = [\psi_{AaBb}; \psi_{Aabb}; \psi_{aaBb}; \psi_{aabb}] = [0; 0; 0; 0]$
with
$\psi_{ij} = \log\left( \frac{n^a_{ij}\, n^a_{00}}{n^a_{i0}\, n^a_{0j}} \Big/ \frac{n^u_{ij}\, n^u_{00}}{n^u_{i0}\, n^u_{0j}} \right)$
where the superscripts $a$ and $u$ denote affected (cases) and unaffected (controls) individuals.
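
The four interaction contrasts can be computed from two 3x3 genotype tables. The sketch below assumes rows index the genotype at $X_k$ (0 = AA, 1 = Aa, 2 = aa) and columns the genotype at $X_l$ (0 = BB, 1 = Bb, 2 = bb); the function name `interaction_log_or` and the example counts are illustrative assumptions.

```python
import math

def interaction_log_or(cases, controls):
    """Return {(i, j): psi_ij} for i, j in {1, 2}, using AABB (0, 0) as reference."""
    def contrast(n, i, j):
        # log of (n_ij * n_00) / (n_i0 * n_0j) within one group
        return math.log(n[i][j] * n[0][0] / (n[i][0] * n[0][j]))
    return {(i, j): contrast(cases, i, j) - contrast(controls, i, j)
            for i in (1, 2) for j in (1, 2)}

# Hypothetical tables in which both SNPs act multiplicatively on the odds:
# every psi_ij is then zero up to rounding, i.e. no statistical interaction.
controls = [[100, 50, 20], [60, 30, 12], [40, 20, 8]]
cases = [[100, 75, 40], [120, 90, 48], [120, 90, 48]]
psi = interaction_log_or(cases, controls)
```

Doubling a single interior cell of the case table, for example the AaBb count, shifts only the corresponding contrast, by $\log 2$.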

40 Computational cost
Comparative analysis between a Wald test and a likelihood ratio test (LRT).
[Table: execution time (sec) of the LRT and of the Wald test as a function of nsim; values missing]
Execution time is divided by almost 2.

41 WTCCC analysis
After filtering using prior knowledge, 3.5 million tests were performed.
Overall analysis of the 7 diseases from the WTCCC.

42 Crohn's disease
- Significant hit between two genes: APC and IQGAP1
- p-value before and after multiple testing correction: [values missing]
- Biological insights for the interaction
(M. Emily et al., European Journal of Human Genetics)
[Figure: QQ-plot for Crohn's disease with (black) and without (blue) the APC-IQGAP1 interaction]

43 Application to biological interaction: non-linearity effect (IndOR)
IndOR: Independent Odds Ratio
IndOR is based on a definition of epistasis (Cordell, 2002).
The absence of epistasis means that two genes share the same amount of dependency in cases and in controls.
For a pair of SNPs $(X_k, X_l)$, $H_0$ can be formulated as: $\forall i \in \{AA, Aa, aa\}$ and $\forall j \in \{BB, Bb, bb\}$:
$\frac{P((X_k, X_l) = (i, j) \mid Y = 1)}{P(X_k = i \mid Y = 1)\, P(X_l = j \mid Y = 1)} = \frac{P((X_k, X_l) = (i, j) \mid Y = 0)}{P(X_k = i \mid Y = 0)\, P(X_l = j \mid Y = 0)}$

44 IndOR: Independent Odds Ratio
Thanks to Bayes' formula, $H_0$:
$\frac{P((X_k, X_l) = (i, j) \mid Y = 1)}{P(X_k = i \mid Y = 1)\, P(X_l = j \mid Y = 1)} = \frac{P((X_k, X_l) = (i, j) \mid Y = 0)}{P(X_k = i \mid Y = 0)\, P(X_l = j \mid Y = 0)}$
can be written as:
$\psi_{ij} = \log\left( \frac{OR(x_i, x_j)}{OR(x_i)\, OR(x_j)} \right) = 0$
The test statistic is $IndOR = \Psi V^{-1} \Psi^t$, with $\Psi = [\psi_{AaBb}, \psi_{Aabb}, \psi_{aaBb}, \psi_{aabb}]$.
Under $H_0$, $IndOR \sim \chi^2(4)$.
(M. Emily, Statistics in Medicine)
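
The null hypothesis above compares, between cases and controls, the ratio of the joint genotype frequency to the product of the marginal frequencies. A plug-in estimate of the four contrasts can be sketched as follows; this only illustrates the quantity being tested, it does not compute $V$ or the IndOR statistic itself, and the function names and counts are hypothetical.

```python
import math

def dependency_ratio(table, i, j):
    """Joint frequency of (i, j) divided by the product of its marginal frequencies."""
    total = sum(sum(row) for row in table)
    p_ij = table[i][j] / total
    p_i = sum(table[i]) / total
    p_j = sum(row[j] for row in table) / total
    return p_ij / (p_i * p_j)

def indor_contrasts(cases, controls):
    """psi_ij = log(ratio in cases / ratio in controls) for i, j in {1, 2}."""
    return {(i, j): math.log(dependency_ratio(cases, i, j)
                             / dependency_ratio(controls, i, j))
            for i in (1, 2) for j in (1, 2)}

# Hypothetical 3x3 genotype tables (rows: AA, Aa, aa; columns: BB, Bb, bb).
# Both SNPs are independent within each group, so every contrast is ~0.
cases = [[100, 50, 20], [60, 30, 12], [40, 20, 8]]
controls = [[8, 4, 4], [4, 2, 2], [4, 2, 2]]
psi = indor_contrasts(cases, controls)
```

Note that the two groups may have different genotype frequencies; only a difference in the within-group dependency between the two SNPs moves the contrasts away from zero.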

45 Historical epistatic disease models
Penetrance tables (rows: $X_1$ = 0, 1, 2; columns: $X_2$ = 0, 1, 2):

DD: Jointly Dominant-Dominant
X_1 \ X_2   0     1            2
0           γ     γ            γ
1           γ     γ(1 + θ)     γ(1 + θ)
2           γ     γ(1 + θ)     γ(1 + θ)

RR: Jointly Recessive-Recessive
X_1 \ X_2   0     1            2
0           γ     γ            γ
1           γ     γ            γ
2           γ     γ            γ(1 + θ)

RD: Jointly Recessive-Dominant
X_1 \ X_2   0     1            2
0           γ     γ            γ
1           γ     γ            γ
2           γ     γ(1 + θ)     γ(1 + θ)
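
The three penetrance tables follow a simple rule: the baseline risk γ is multiplied by (1 + θ) only for genotype pairs that satisfy the joint dominance or recessiveness condition. The sketch below encodes that rule, assuming genotypes are coded by minor-allele count (0, 1, 2); the function name `penetrance` and the parameter values are illustrative.

```python
def penetrance(model, gamma, theta):
    """Return the 3x3 penetrance table (rows X_1, columns X_2) for model RR, DD or RD."""
    at_risk = {
        "RR": lambda a, b: a == 2 and b == 2,   # both loci recessive
        "DD": lambda a, b: a >= 1 and b >= 1,   # both loci dominant
        "RD": lambda a, b: a == 2 and b >= 1,   # X_1 recessive, X_2 dominant
    }[model]
    risk = gamma * (1 + theta)
    return [[risk if at_risk(a, b) else gamma for b in range(3)]
            for a in range(3)]

# Hypothetical baseline risk and effect size.
tables = {m: penetrance(m, gamma=0.25, theta=1.0) for m in ("RR", "DD", "RD")}
```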

46 Historical epistatic disease models
[Figure: panels RR, DD and RD; axis labels: Power, Ratio, $r^2$; methods compared: PLINK, $T_{IH}$, BOOST, IndOR, Case Only]

47 Biological epistatic disease models
Penetrance tables (rows: $X_1$; columns: $X_2$):

I: Interface
X_1 \ X_2   BB           Bb           bb
AA          γ            γ            γ
Aa          γ            γ(1 + θ)     γ
aa          γ            γ            γ

Mod: Modifying-effect
X_1 \ X_2   BB           Bb           bb
AA          γ            γ            γ
Aa          γ            γ            γ(1 + θ)
aa          γ(1 + θ)     γ(1 + θ)     γ(1 + θ)

48 Biological epistatic disease models
[Figure: panels I and Mod; axis labels: Power, Ratio, $r^2$; methods compared: PLINK, $T_{IH}$, BOOST, IndOR, Case Only]

49 Crohn's disease hits
[Table with columns: Control set, Statistic, SNP1, Chr1 (Position), SNP2, Chr2 (Position), p-value, corrected p-value. Two blocks of rows, each listing the Shared and Combined control sets for the statistics PLINK, $T_{IH}$, BOOST, IndOR and CaseOnly. SNP identifiers, positions and p-values missing.]

50 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association
4. Conclusion

51 Conclusion/Discussion
Single-locus statistical tests are not equivalent:
- The $\chi^2$ test always outperforms $z^2$.
- The comparison between $\chi^2$ and LRT depends jointly on the observed proportion of cases ($\varphi$) and the frequency of the variant ($\pi_1$):

                   Causal effect            Protective effect
                   φ small     φ large      φ small     φ large
Rare variant       χ²          LRT          LRT         χ²
Common variant     LRT         χ²           χ²          LRT

- Future work: effect of tagging (indirect association); test for linear trend (Cochran-Armitage test)
Two-locus interaction:
- δ approximation for counts
- Improvement of linear and non-linear tests
- Future work: theoretical power study; investigation of the effect of tagging
Thank you for your attention!


More information

QTL model selection: key players

QTL model selection: key players Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:

More information

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to 1 1 1 1 1 1 1 1 0 SUPPLEMENTARY MATERIALS, B. BIVARIATE PEDIGREE-BASED ASSOCIATION ANALYSIS Introduction We propose here a statistical method of bivariate genetic analysis, designed to evaluate contribution

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

Package LBLGXE. R topics documented: July 20, Type Package

Package LBLGXE. R topics documented: July 20, Type Package Type Package Package LBLGXE July 20, 2015 Title Bayesian Lasso for detecting Rare (or Common) Haplotype Association and their interactions with Environmental Covariates Version 1.2 Date 2015-07-09 Author

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics.

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics. Bioinformatics Jason H. Moore, Ph.D. Frank Lane Research Scholar in Computational Genetics Associate Professor of Genetics Adjunct Associate Professor of Biological Sciences Adjunct Associate Professor

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#5:(Mar-21-2010) Genome Wide Association Studies 1 Experiments on Garden Peas Statistical Significance 2 The law of causality...

More information

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Nilanjan Chatterjee, Zeynep Kalaylioglu 2, Roxana Moslehi, Ulrike Peters 3, Sholom Wacholder

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Lecture 6: Introduction to Quantitative genetics Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Quantitative Genetics The analysis of traits whose variation is determined by both a

More information

Adaptive testing of conditional association through Bayesian recursive mixture modeling

Adaptive testing of conditional association through Bayesian recursive mixture modeling Adaptive testing of conditional association through Bayesian recursive mixture modeling Li Ma February 12, 2013 Abstract In many case-control studies, a central goal is to test for association or dependence

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett

More information

Introduction to Linkage Disequilibrium

Introduction to Linkage Disequilibrium Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have

More information

Measures of Association and Variance Estimation

Measures of Association and Variance Estimation Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35

More information

Asymptotic distribution of the largest eigenvalue with application to genetic data

Asymptotic distribution of the largest eigenvalue with application to genetic data Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene

More information

An introduction to biostatistics: part 1

An introduction to biostatistics: part 1 An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random

More information

Pearson s Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores

Pearson s Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores Commentary doi: 101111/1469-180900800500x Pearson s Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores Gang Zheng 1, Jungnam Joo 1 and Yaning Yang 1 Office of Biostatistics Research,

More information

Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions

Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions Jason Westra Department of Statistics, Iowa State University Ames, IA 50011, United States

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

Stochastic processes and

Stochastic processes and Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University

More information

Backward Genotype-Trait Association. in Case-Control Designs

Backward Genotype-Trait Association. in Case-Control Designs Backward Genotype-Trait Association (BGTA)-Based Dissection of Complex Traits in Case-Control Designs Tian Zheng, Hui Wang and Shaw-Hwa Lo Department of Statistics, Columbia University, New York, New York,

More information

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

How to analyze many contingency tables simultaneously?

How to analyze many contingency tables simultaneously? How to analyze many contingency tables simultaneously? Thorsten Dickhaus Humboldt-Universität zu Berlin Beuth Hochschule für Technik Berlin, 31.10.2012 Outline Motivation: Genetic association studies Statistical

More information

I Have the Power in QTL linkage: single and multilocus analysis

I Have the Power in QTL linkage: single and multilocus analysis I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Régression en grande dimension et épistasie par blocs pour les études d association

Régression en grande dimension et épistasie par blocs pour les études d association Régression en grande dimension et épistasie par blocs pour les études d association V. Stanislas, C. Dalmasso, C. Ambroise Laboratoire de Mathématiques et Modélisation d Évry "Statistique et Génome" 1

More information

Methods for Cryptic Structure. Methods for Cryptic Structure

Methods for Cryptic Structure. Methods for Cryptic Structure Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases

More information

Partitioning Genetic Variance

Partitioning Genetic Variance PSYC 510: Partitioning Genetic Variance (09/17/03) 1 Partitioning Genetic Variance Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive

More information

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Hongzhe Li hongzhe@upenn.edu, http://statgene.med.upenn.edu University of Pennsylvania Perelman School of

More information

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda 1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Zheyang Wu 1, Hongyu Zhao 1,2 * 1 Department of Epidemiology and Public Health, Yale University School of Medicine, New

More information

Genetic Association Studies in the Presence of Population Structure and Admixture

Genetic Association Studies in the Presence of Population Structure and Admixture Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

1 Preliminary Variance component test in GLM Mediation Analysis... 3

1 Preliminary Variance component test in GLM Mediation Analysis... 3 Honglang Wang Depart. of Stat. & Prob. wangho16@msu.edu Omics Data Integration Statistical Genetics/Genomics Journal Club Summary and discussion of Joint Analysis of SNP and Gene Expression Data in Genetic

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Multivariate analysis of genetic data an introduction

Multivariate analysis of genetic data an introduction Multivariate analysis of genetic data an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Population genomics in Lausanne 23 Aug 2016 1/25 Outline Multivariate

More information

Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known

More information

Model Selection for Multiple QTL

Model Selection for Multiple QTL Model Selection for Multiple TL 1. reality of multiple TL 3-8. selecting a class of TL models 9-15 3. comparing TL models 16-4 TL model selection criteria issues of detecting epistasis 4. simulations and

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic

More information

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype

More information

Reports of the Institute of Biostatistics

Reports of the Institute of Biostatistics Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems

More information