Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.
1 Theoretical and computational aspects of association tests: application in case-control genome-wide association studies
Mathieu Emily, November 18, 2014, Caen
Agrocampus Ouest - IRMAR, Rennes, France
2 Laboratoire de Mathématiques Appliquées de l'Agrocampus (LMA²)
- People: 6 faculty members, 1 research assistant, 5 PhD students
- Research: multivariate exploratory data analysis, biostatistics, high-dimensional data
- Main topics: sensometrics, genomic data analysis
mathieu.emily@agrocampus-ouest.fr - Agrocampus Ouest - IRMAR - Rennes
3 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association
4. Conclusion
4 Outline
1. Genome-wide association studies: context and problem statement
2. Power in single-locus association
3. Two-locus association
4. Conclusion
5 Genome-wide association studies (GWAS)
- Case/control studies: detection of differences in allele frequencies between case and control individuals
- Genotyping of individuals from both populations
Challenges:
- technological: large increase in the number of markers on chips (100k, 300k, 500k and 1000k!)
- computational
- statistical
6 Genome-wide association studies (GWAS): statistical and computational challenges
Individual | Phenotype (Y) | Marker 1 (X_1) | Marker 2 (X_2) | ... | Marker 500,000 (X_500,000)
Id 1       | healthy       | AA             | AC             | ... | TG
Id 2       | diseased      | AC             | AC             | ... | GG
...
Id 1,000   | diseased      | AC             | CC             | ... | TG
Let $Y$ be a random variable with a Bernoulli distribution (the case where $Y$ is continuous is not treated here).
Let $X_i$, $i = 1, \dots, p$, be $p$ random variables with 3 states corresponding to the genotype at marker $i$: $X_i = 0$ (homozygous for the major allele), $X_i = 1$ (heterozygous) and $X_i = 2$ (homozygous for the minor allele).
How is $Y$ explained by $\{X_i\}_{i = 1, \dots, p}$?
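As an illustration of the 0/1/2 coding of $X_i$, the sketch below converts two-letter genotype calls like those in the table into minor-allele counts. The helper `encode_genotype` is hypothetical, and it assumes the minor allele of each marker is already known.

```python
def encode_genotype(genotype: str, minor_allele: str) -> int:
    """Number of copies of `minor_allele` in a two-letter genotype call.

    0 = homozygous for the major allele, 1 = heterozygous,
    2 = homozygous for the minor allele.
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return sum(allele == minor_allele for allele in genotype)


# Marker 2 column of the table above, assuming (hypothetically) that C is its minor allele.
marker2 = ["AC", "AC", "CC"]
print([encode_genotype(g, "C") for g in marker2])  # [1, 1, 2]
```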
7 A success story? ...yes
Since 2005, many variants have been found to be associated with susceptibility to various complex diseases: prostate cancer, Crohn's disease, etc.
[Figure: Manhattan plot for type 1 diabetes in the WTCCC dataset]
8 A success story? ...yes and no
GWAS typically identify common variants with small effect sizes, in the lower right part of the graph (Bush WS, Moore JH, PLoS Comput Biol, 2012).
9 A success story? ...no
GWAS have generated new challenges, such as the quest for the missing heritability!
10-11 Discrepancy between biology and statistics
In biology, GWAS are limited by complex phenomena such as:
- genome structure
- complexity of diseases
- the potential for a large number of false positive results
The future is to include prior knowledge in the analysis... and potentially make the problem more complex.
From a statistical point of view, GWAS are challenging because of:
- correlation between SNPs
- interaction between variables
- a high-dimensional problem with categorical variables
The future is to investigate the behavior of basic statistical procedures in this specific context.
12 Outline
1. Genome-wide association studies
2. Power in single-locus association: direct single-locus association; application with the WTCCC dataset
3. Two-locus association
4. Conclusion
13 Single-locus association
GWAS are usually performed with a single-locus approach: each SNP is tested independently.
Question: what is the most powerful statistical test to detect the signal?
[Figure: Manhattan plot for type 1 diabetes in the WTCCC dataset (Nature, 2007)]
14 Theoretical context and notations
Let $X$ and $Y$ be two binary variables with values in $\{1, 2\}$: $X$ can be a biallelic biological marker and $Y$ the presence/absence of a disease. Data are usually summarized in a 2x2 contingency table:
      | X = 1 | X = 2 | Total
Y = 1 | n_11  | n_12  | n_1. = N(1 - φ)
Y = 2 | n_21  | n_22  | n_2. = Nφ
Total | n_.1  | n_.2  | N
where $n_{ij}$ is the number of observations with $Y = i$ and $X = j$. The marginal counts for $Y$ are assumed to be fixed (one-margin fixed design), and $\varphi$ denotes the balance of the design.
Detecting an association between $X$ and $Y$ is equivalent to comparing two binomial proportions $\pi_1$ and $\pi_2$, where $\pi_i = P[X = 2 \mid Y = i]$ for $i = 1, 2$.
15 Statistical hypotheses and tests (1)
Our objective is to test:
$$H_0: \pi_1 = \pi_2 \quad \text{vs} \quad H_1: \pi_1 \neq \pi_2 \qquad (1)$$
- Exact tests: Fisher's exact test. The power function of an exact test is hardly tractable.
- Asymptotic tests: Pearson's $\chi^2$ test, likelihood ratio test (LRT).
The hypotheses in Equation (1) can be reformulated as:
$$H_0: \log\left(\frac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}\right) = \log(\mathrm{OR}(\pi_1, \pi_2)) = 0 \quad \text{vs} \quad H_1: \log(\mathrm{OR}(\pi_1, \pi_2)) \neq 0$$
where $\mathrm{OR}(\pi_1, \pi_2)$ is the so-called odds ratio between $\pi_1$ and $\pi_2$, so statistical inference on the odds ratio can also be used.
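The correspondence between the table, $\varphi$, $\pi_1$, $\pi_2$ and the odds ratio can be sketched as follows. `design_summary` is a hypothetical helper, and the counts are made-up numbers used only to exercise the formulas.

```python
def design_summary(table):
    """Summarize a 2x2 table [[n11, n12], [n21, n22]] (rows: Y = 1, 2; columns: X = 1, 2)."""
    (n11, n12), (n21, n22) = table
    n1_, n2_ = n11 + n12, n21 + n22
    n = n1_ + n2_
    phi = n2_ / n                 # balance of the design: n_2. = N * phi
    pi1 = n12 / n1_               # estimate of P[X = 2 | Y = 1]
    pi2 = n22 / n2_               # estimate of P[X = 2 | Y = 2]
    odds_ratio = (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))   # H0 corresponds to OR = 1
    return phi, pi1, pi2, odds_ratio


phi, pi1, pi2, or_ = design_summary([[10, 20], [30, 40]])
print(phi, round(pi1, 4), round(pi2, 4), round(or_, 4))  # 0.7 0.6667 0.5714 1.5
```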
16 Statistical hypotheses and tests (2)
Let $m_{ij} = \frac{n_{i.} n_{.j}}{N}$ be the expected counts under independence between $X$ and $Y$.
- Pearson's $\chi^2$ statistic: $P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$
- Likelihood ratio: $LR = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} \log\left(\frac{n_{ij}}{m_{ij}}\right)$
- Odds-ratio inference: $z^2 = \left(\frac{t}{SE}\right)^2$ with $t = \log\left(\frac{n_{11} n_{22}}{n_{12} n_{21}}\right)$ and $SE = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$
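A minimal sketch of the three statistics defined above, in plain Python (the function names are illustrative, not from any library). It assumes every cell count is positive, and uses the fact that the upper tail of $\chi^2(1)$ at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$. The table is a made-up example.

```python
import math


def single_locus_tests(table):
    """Pearson chi2 (P), likelihood ratio (LR) and odds-ratio z^2 for a 2x2 table.

    Assumes all cells are strictly positive (otherwise the logs and SE are undefined).
    """
    n = [[float(c) for c in row] for row in table]
    row = [sum(r) for r in n]
    col = [n[0][j] + n[1][j] for j in range(2)]
    total = sum(row)
    m = [[row[i] * col[j] / total for j in range(2)] for i in range(2)]  # m_ij = n_i. n_.j / N

    cells = [(i, j) for i in range(2) for j in range(2)]
    p_stat = sum((n[i][j] - m[i][j]) ** 2 / m[i][j] for i, j in cells)
    lr_stat = 2 * sum(n[i][j] * math.log(n[i][j] / m[i][j]) for i, j in cells)
    t = math.log(n[0][0] * n[1][1] / (n[0][1] * n[1][0]))
    se = math.sqrt(sum(1 / n[i][j] for i, j in cells))
    return p_stat, lr_stat, (t / se) ** 2


def chi2_1df_pvalue(x):
    """Upper tail of a central chi2 with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


stats = single_locus_tests([[10, 20], [30, 40]])
for name, value in zip(("P", "LR", "z2"), stats):
    print(f"{name:>2} = {value:.4f}  p = {chi2_1df_pvalue(value):.4f}")
```

On this toy table the three statistics are close (about 0.794, 0.804 and 0.789), yet not identical, which is the starting point of the power comparison.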
17 Statistical hypotheses and tests (3)
Under $H_0$, all three statistics asymptotically follow a central $\chi^2$ distribution with 1 df:
$P \sim \chi^2(1)$, $LR \sim \chi^2(1)$ and $z^2 \sim \chi^2(1)$.
Under $H_1$, each of them follows a non-central $\chi^2$ distribution with 1 df:
$P \sim \chi^2(\lambda_P, 1)$, $LR \sim \chi^2(\lambda_{LR}, 1)$ and $z^2 \sim \chi^2(\lambda_{z^2}, 1)$.
Since the power of a 1-df test is an increasing function of its non-centrality parameter, comparing the power of $P$, $LR$ and $z^2$ is equivalent to comparing $\lambda_P$, $\lambda_{LR}$ and $\lambda_{z^2}$.
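The link between non-centrality and power can be made concrete: with 1 df, $\chi^2(\lambda, 1)$ is the distribution of $(Z + \sqrt{\lambda})^2$ with $Z \sim \mathcal{N}(0, 1)$, so the power at level $\alpha$ is $\Phi(\sqrt{\lambda} - z_{1-\alpha/2}) + \Phi(-\sqrt{\lambda} - z_{1-\alpha/2})$. The sketch below uses only the standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def power_1df(lam, alpha=0.05):
    """Power of a 1-df chi2 test when the statistic follows chi2(lam, 1).

    chi2(lam, 1) is the law of (Z + sqrt(lam))^2 with Z ~ N(0, 1), so
    power = P(|Z + sqrt(lam)| > z_{1 - alpha/2}).
    """
    std = NormalDist()
    crit = std.inv_cdf(1 - alpha / 2)
    root = lam ** 0.5
    return std.cdf(root - crit) + std.cdf(-root - crit)


for lam in (0.0, 2.0, 4.0, 7.85):
    print(lam, round(power_1df(lam), 3))
```

At $\lambda = 0$ the "power" is just the level $\alpha$, and it grows monotonically with $\lambda$, which is why the slides compare $\lambda$'s instead of power curves.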
18 Power study framework
In the context of 2x2 table analysis, power studies have been used to estimate the sample size needed to reach a given power; such studies are performed before the experiment. Here we propose a post hoc power study, which can be carried out after the experiment.
To compare the non-centrality parameters, we assume that $N$ is fixed and use the following scheme:
1. Definition of a general situation for $H_1$
2. Estimation of the three non-centrality parameters ($\lambda_P$, $\lambda_{LR}$ and $\lambda_{z^2}$)
3. Theoretical comparison of the non-centrality parameter estimates
19 Local alternatives for $H_1$
We consider local alternatives given by $\pi_2 = \pi_1 + \frac{h}{\sqrt{N}}$.
Let us introduce the mean contingency table, NE, and the mean expected contingency table, ME:
NE    | X = 1                      | X = 2                | Total
Y = 1 | ne_11 = N(1 - π_1)(1 - φ)  | ne_12 = Nπ_1(1 - φ)  | N(1 - φ)
Y = 2 | ne_21 = N(1 - π_2)φ        | ne_22 = Nπ_2 φ       | Nφ
Total | n_.1 = N(1 - π̄)           | n_.2 = Nπ̄           | N

ME    | X = 1                      | X = 2                | Total
Y = 1 | me_11 = N(1 - π̄)(1 - φ)   | me_12 = Nπ̄(1 - φ)   | N(1 - φ)
Y = 2 | me_21 = N(1 - π̄)φ         | me_22 = Nπ̄φ         | Nφ
Total | n_.1 = N(1 - π̄)           | n_.2 = Nπ̄           | N
where $\bar{\pi} = \pi_1(1 - \varphi) + \pi_2\varphi$.
20 Estimation of the non-centrality parameters
Under local alternatives, the non-centrality parameter $\lambda$ is asymptotically equal to the test statistic computed on NE and ME. The estimates of the non-centrality parameters are thus:
- $\lambda_P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(ne_{ij} - me_{ij})^2}{me_{ij}}$
- $\lambda_{LR} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} ne_{ij} \log\left(\frac{ne_{ij}}{me_{ij}}\right)$
- $\lambda_{z^2} = \left(\frac{t_e}{SE_e}\right)^2$ with $t_e = \log\left(\frac{ne_{11} ne_{22}}{ne_{12} ne_{21}}\right)$ and $SE_e = \sqrt{\frac{1}{ne_{11}} + \frac{1}{ne_{12}} + \frac{1}{ne_{21}} + \frac{1}{ne_{22}}}$
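The local-alternative tables lead to a closed form for $\lambda_P$. The short derivation below is worked out from the NE and ME definitions (it is not copied from the slides); expanding $1/(\bar{\pi}(1 - \bar{\pi}))$ around $\pi_1$ gives a power series in $h/\sqrt{N}$.

```latex
\begin{aligned}
ne_{11} - me_{11} &= N(1-\varphi)(\bar{\pi} - \pi_1) = N\varphi(1-\varphi)(\pi_2 - \pi_1) = \sqrt{N}\,\varphi(1-\varphi)\,h,\\
ne_{12} - me_{12} &= ne_{21} - me_{21} = -\sqrt{N}\,\varphi(1-\varphi)\,h, \qquad ne_{22} - me_{22} = \sqrt{N}\,\varphi(1-\varphi)\,h,\\
\sum_{i,j} \frac{1}{me_{ij}} &= \frac{1}{N}\left(\frac{1}{1-\varphi} + \frac{1}{\varphi}\right)\left(\frac{1}{1-\bar{\pi}} + \frac{1}{\bar{\pi}}\right)
  = \frac{1}{N\,\varphi(1-\varphi)\,\bar{\pi}(1-\bar{\pi})},\\
\lambda_P &= N\varphi^2(1-\varphi)^2h^2 \sum_{i,j} \frac{1}{me_{ij}} = \frac{\varphi(1-\varphi)\,h^2}{\bar{\pi}(1-\bar{\pi})},
  \qquad \bar{\pi} = \pi_1 + \varphi\,\frac{h}{\sqrt{N}}.
\end{aligned}
```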
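21 Taylor approximations
When $h$ is small, we have:
$$\lambda_P = \varphi(1-\varphi)h^2 \sum_{k=2}^{\infty} \left(-\frac{h}{\sqrt{N}}\right)^{k-2} g_k(\pi_1)\,\varphi^{k-2}$$
$$\lambda_{LR} = \varphi(1-\varphi)h^2 \sum_{k=2}^{\infty} \left(-\frac{h}{\sqrt{N}}\right)^{k-2} g_k(\pi_1)\,\frac{2}{k(k-1)} \sum_{i=0}^{k-2} \varphi^i$$
where
$$g_k(\pi_1) = \left(\frac{1}{\pi_1}\right)^{k-1} - \left(-\frac{1}{1-\pi_1}\right)^{k-1} = \frac{(1-\pi_1)^{k-1} - (-\pi_1)^{k-1}}{(\pi_1(1-\pi_1))^{k-1}}$$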
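The two series can be checked numerically against the exact values computed on NE and ME. The sketch below is a hypothetical check (not the author's code): it compares the exact $\lambda_P$ and $\lambda_{LR}$ with the series truncated at $k = 4$ for a large $N$, where the neglected terms are of order $(h/\sqrt{N})^3$. The parameter values are arbitrary.

```python
import math


def g(k, p):
    """g_k(p) = ((1 - p)^(k-1) - (-p)^(k-1)) / (p (1 - p))^(k-1)."""
    return ((1 - p) ** (k - 1) - (-p) ** (k - 1)) / (p * (1 - p)) ** (k - 1)


def exact_lambdas(n, phi, pi1, h):
    """lambda_P and lambda_LR evaluated on NE and ME, with pi2 = pi1 + h / sqrt(N)."""
    pi2 = pi1 + h / math.sqrt(n)
    pibar = pi1 * (1 - phi) + pi2 * phi
    # cells in the order (1,1), (1,2), (2,1), (2,2)
    ne = [n * (1 - pi1) * (1 - phi), n * pi1 * (1 - phi), n * (1 - pi2) * phi, n * pi2 * phi]
    me = [n * (1 - pibar) * (1 - phi), n * pibar * (1 - phi), n * (1 - pibar) * phi, n * pibar * phi]
    lam_p = sum((a - b) ** 2 / b for a, b in zip(ne, me))
    lam_lr = 2 * sum(a * math.log(a / b) for a, b in zip(ne, me))
    return lam_p, lam_lr


def series_lambdas(n, phi, pi1, h, kmax):
    """Taylor series above, truncated after the term k = kmax."""
    step = -h / math.sqrt(n)
    lam_p = lam_lr = 0.0
    for k in range(2, kmax + 1):
        common = step ** (k - 2) * g(k, pi1)
        lam_p += common * phi ** (k - 2)
        lam_lr += common * 2 / (k * (k - 1)) * sum(phi ** i for i in range(k - 1))
    scale = phi * (1 - phi) * h ** 2
    return scale * lam_p, scale * lam_lr


exact = exact_lambdas(1e6, phi=0.3, pi1=0.2, h=1.0)
approx = series_lambdas(1e6, phi=0.3, pi1=0.2, h=1.0, kmax=4)
print(exact, approx)
```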
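22 Taylor approximations (2): 4th order
When $h$ is small, we have:
$$\lambda_P - \lambda_{LR} \approx \frac{h^3}{\sqrt{N}}\varphi(1-\varphi)\left[\frac{1 - 2\varphi}{3} g_3(\pi_1) + \frac{h}{\sqrt{N}} \frac{5\varphi^2 - \varphi - 1}{6} g_4(\pi_1)\right]$$
and:
$$\lambda_P - \lambda_{z^2} \approx \frac{h^4}{N}\varphi(1-\varphi)\left[\frac{3\pi_1^2 - 3\pi_1 + 1}{12(\pi_1(1-\pi_1))^3} + \frac{(\varphi - 1/2)^2}{(\pi_1(1-\pi_1))^2}\right] > 0$$
Parameters of importance: $\varphi$, $\pi_1$ and the sign of $h$.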
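The fourth-order expressions can likewise be compared with the exact differences for a large $N$. This is an illustrative check under arbitrary parameter values ($N = 10^7$, $\varphi = 0.2$, $\pi_1 = 0.1$, $h = 1$) with helper names of our own; agreement improves as $N$ grows because the neglected terms are of higher order in $h/\sqrt{N}$.

```python
import math


def g(k, p):
    return ((1 - p) ** (k - 1) - (-p) ** (k - 1)) / (p * (1 - p)) ** (k - 1)


def exact_lambdas(n, phi, pi1, h):
    """lambda_P, lambda_LR and lambda_z2 on NE and ME (cells (1,1), (1,2), (2,1), (2,2))."""
    pi2 = pi1 + h / math.sqrt(n)
    pibar = pi1 * (1 - phi) + pi2 * phi
    ne = [n * (1 - pi1) * (1 - phi), n * pi1 * (1 - phi), n * (1 - pi2) * phi, n * pi2 * phi]
    me = [n * (1 - pibar) * (1 - phi), n * pibar * (1 - phi), n * (1 - pibar) * phi, n * pibar * phi]
    lam_p = sum((a - b) ** 2 / b for a, b in zip(ne, me))
    lam_lr = 2 * sum(a * math.log(a / b) for a, b in zip(ne, me))
    t_e = math.log(ne[0] * ne[3] / (ne[1] * ne[2]))
    lam_z2 = t_e ** 2 / sum(1 / a for a in ne)          # (t_e / SE_e)^2
    return lam_p, lam_lr, lam_z2


def fourth_order(n, phi, pi1, h):
    """4th-order approximations of lambda_P - lambda_LR and lambda_P - lambda_z2."""
    e = h / math.sqrt(n)
    v = pi1 * (1 - pi1)
    w = phi * (1 - phi)
    d_lr = h ** 2 * e * w * ((1 - 2 * phi) / 3 * g(3, pi1)
                             + e * (5 * phi ** 2 - phi - 1) / 6 * g(4, pi1))
    d_z2 = h ** 2 * e ** 2 * w * ((3 * pi1 ** 2 - 3 * pi1 + 1) / (12 * v ** 3)
                                  + (phi - 0.5) ** 2 / v ** 2)
    return d_lr, d_z2


lam_p, lam_lr, lam_z2 = exact_lambdas(1e7, phi=0.2, pi1=0.1, h=1.0)
d_lr, d_z2 = fourth_order(1e7, phi=0.2, pi1=0.1, h=1.0)
print(lam_p - lam_lr, d_lr)
print(lam_p - lam_z2, d_z2)
```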
23 χ² - LRT
[Figure: difference in power between the χ² test and the LRT]
24 χ² - z²
[Figure: difference in power between the χ² test and the z² test]
25 Power comparison for φ = 0.1
[Figure panels: π_1 = 0.05, π_1 = 0.1, π_1 = 0.4]
If π_1 is small, the χ² test and the LRT have different power.
26 Power comparison for φ = 0.5
[Figure panels: π_1 = 0.05, π_1 = 0.1, π_1 = 0.4]
The three tests have similar power.
27 Power comparison for φ = 0.9
[Figure panels: π_1 = 0.05, π_1 = 0.1, π_1 = 0.4]
If π_1 is small, the χ² test and the LRT have different power.
28 Recommendations
- The χ² test always outperforms the z² test.
- If h > 0 (causal effect):
  - π_1 small and φ small: χ² > LRT
  - π_1 small and φ high: χ² < LRT
- If h < 0 (protective effect):
  - π_1 small and φ small: χ² < LRT
  - π_1 small and φ high: χ² > LRT
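The causal-effect recommendations can be reproduced from the exact non-centrality parameters. In the sketch below (hypothetical helpers; $N = 5{,}000$, $\pi_1 = 0.05$ and $h = 2$ are arbitrary illustrative choices), a small $\varphi$ favors the $\chi^2$ test and a large $\varphi$ favors the LRT, and the $\chi^2$ test beats $z^2$ in both cases.

```python
import math
from statistics import NormalDist


def exact_lambdas(n, phi, pi1, h):
    """lambda_P, lambda_LR and lambda_z2 on NE and ME (cells (1,1), (1,2), (2,1), (2,2))."""
    pi2 = pi1 + h / math.sqrt(n)
    pibar = pi1 * (1 - phi) + pi2 * phi
    ne = [n * (1 - pi1) * (1 - phi), n * pi1 * (1 - phi), n * (1 - pi2) * phi, n * pi2 * phi]
    me = [n * (1 - pibar) * (1 - phi), n * pibar * (1 - phi), n * (1 - pibar) * phi, n * pibar * phi]
    lam_p = sum((a - b) ** 2 / b for a, b in zip(ne, me))
    lam_lr = 2 * sum(a * math.log(a / b) for a, b in zip(ne, me))
    t_e = math.log(ne[0] * ne[3] / (ne[1] * ne[2]))
    return lam_p, lam_lr, t_e ** 2 / sum(1 / a for a in ne)


def power_1df(lam, alpha=0.05):
    """Power of a 1-df test whose statistic follows chi2(lam, 1)."""
    std = NormalDist()
    crit = std.inv_cdf(1 - alpha / 2)
    root = math.sqrt(lam)
    return std.cdf(root - crit) + std.cdf(-root - crit)


# Rare variant (pi1 = 0.05), causal effect (h > 0): small vs large phi.
results = {}
for phi in (0.1, 0.9):
    lams = exact_lambdas(5000, phi, 0.05, 2.0)
    results[phi] = [power_1df(lam) for lam in lams]   # powers of chi2, LRT, z2
    print(phi, [round(p, 3) for p in results[phi]])
```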
29 Benchmark dataset: WTCCC (Nature, 2007)
- 500,000 single nucleotide polymorphisms (SNPs) ($X_i$)
- 3,000 controls
- 7 diseases with 2,000 cases for each disease
Two possible strategies for studying Crohn's disease:
1. 2,000 cases vs 3,000 controls: φ = 0.4
2. 2,000 cases vs 15,000 controls: φ = 2,000/17,000 ≈ 0.12
The following filters are used:
- control of the number of missing data (< 50)
- control of Hardy-Weinberg equilibrium (p-value > 0.05)
- restriction to rare alleles: f ≤ 0.05
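The design balance and the three filters can be sketched as below. The thresholds come from the slide, but the implementation is an assumption: a 1-df $\chi^2$ test for Hardy-Weinberg equilibrium, a minor allele frequency computed from genotype counts, and hypothetical function names. The slide does not say on which samples the filters are computed.

```python
import math


def hwe_pvalue(n_aa, n_ab, n_bb):
    """1-df chi2 test of Hardy-Weinberg equilibrium (assumes a polymorphic SNP)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                       # frequency of allele A
    expected = (n * p ** 2, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))                 # P(chi2_1 > chi2)


def passes_filters(n_aa, n_ab, n_bb, n_missing, max_missing=50, hwe_alpha=0.05, max_maf=0.05):
    """Hypothetical version of the three filters listed above."""
    n = n_aa + n_ab + n_bb
    maf = min(2 * n_aa + n_ab, 2 * n_bb + n_ab) / (2 * n)
    return n_missing < max_missing and hwe_pvalue(n_aa, n_ab, n_bb) > hwe_alpha and maf <= max_maf


print(2000 / (2000 + 3000), round(2000 / (2000 + 15000), 3))   # phi for the two designs
print(passes_filters(1521, 78, 1, n_missing=10))               # rare allele in HWE: kept
print(passes_filters(360, 480, 160, n_missing=10))             # common allele: filtered out
```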
30 Chromosome 20
The ranking of SNPs can change between tests.
[Figure: SNP ranking under the χ², LR and z tests]
31 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association: odds ratio and δ method for counts; statistical interaction; biological interaction
4. Conclusion
32 Gene-gene interaction
The single-locus scan fails to explain biological complexity (protein interaction networks, pathways).
A natural extension of the single-locus approach is the two-locus approach: SNP-SNP interaction, or gene-gene interaction.
Main challenges:
- the number of tests: 125 billion tests ($\binom{500{,}000}{2}$)
- the large class of interaction models
One useful tool: approximation of odds-ratio inference using the δ method.
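The figure of 125 billion tests is the number of unordered SNP pairs among 500,000 markers. Purely as an illustration, the sketch also shows the per-test threshold that a Bonferroni correction at 5% would imply; that correction is our example, not a choice stated on the slide.

```python
import math

n_snps = 500_000
n_pairs = math.comb(n_snps, 2)       # unordered SNP pairs = n (n - 1) / 2
print(f"{n_pairs:,}")                # 124,999,750,000, i.e. about 125 billion
bonferroni = 0.05 / n_pairs          # per-test threshold for a 5% family-wise error rate
print(f"{bonferroni:.1e}")           # 4.0e-13
```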
33 Inference on odds ratios
The aim is to test the association between $Y$ and the $m$ categories of $X_k$, with $\Phi = [\mathrm{OR}(x_1), \dots, \mathrm{OR}(x_m)]$.
The null hypothesis can be written as $H_0: \Phi = [1, \dots, 1]$, or equivalently:
$$H_0: \Psi = [\psi(x_1), \dots, \psi(x_m)] = [\log(\mathrm{OR}(x_1)), \dots, \log(\mathrm{OR}(x_m))] = [0, \dots, 0]$$
34 Classical test in genetic epidemiology
Test: let $\Psi = [\psi(x_1), \dots, \psi(x_m)]$, let $V$ be the variance-covariance matrix of $\Psi$, and let $W = \Psi V^{-1} \Psi^t$.
As $W$ is a Wald statistic, under $H_0$ we have $W \sim \chi^2(m)$.
In practice, $\Psi$ is estimated by maximum likelihood; estimating $V^{-1}$ is more complex.
35 Estimation of Ψ using MLE
Contingency tables are obtained from $Y = (y_1, \dots, y_n)^t$ and $X_l = (x_{1l}, \dots, x_{nl})^t$:
      | x_0         | ... | x_m
Y = 0 | n^0_{x_0}   | ... | n^0_{x_m}
Y = 1 | n^1_{x_0}   | ... | n^1_{x_m}
where $n^s_k$ is the number of individuals $i$ with $y_i = s$ and $x_{il} = k$.
Then
$$\mathrm{OR}(x_l) = \frac{P(Y = 1 \mid X = x_l)}{P(Y = 0 \mid X = x_l)} \Big/ \frac{P(Y = 1 \mid X = x_0)}{P(Y = 0 \mid X = x_0)}$$
can be estimated by:
$$\widehat{\mathrm{OR}}(x_l) = \frac{n^1_{x_l}\, n^0_{x_0}}{n^0_{x_l}\, n^1_{x_0}}, \qquad \hat{\psi}(x_l) = \log(n^1_{x_l}) - \log(n^0_{x_l}) - \log(n^1_{x_0}) + \log(n^0_{x_0})$$
36 Estimation of V (1): δ approximation
Counts are assumed to follow a multinomial distribution: $[N^1_{x_0}; \dots; N^1_{x_m}] \sim \mathrm{Mult}(n^1; p^1_{x_0}; \dots; p^1_{x_m})$.
We can write:
$$N^1_{x_l} \approx n^1 p^1_{x_l}\left(1 + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\,\delta^1_{x_l}\right), \qquad \log(N^1_{x_l}) \approx \log(n^1 p^1_{x_l}) + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\,\delta^1_{x_l}$$
with $\delta^1_{x_l} \sim \mathcal{N}(0, 1)$ and
$$\mathrm{Cov}(\delta^1_{x_l}; \delta^1_{x_n}) = -\sqrt{\frac{p^1_{x_l}\, p^1_{x_n}}{(1 - p^1_{x_l})(1 - p^1_{x_n})}} \quad \text{if } l \neq n$$
37 Estimation of V (2): example
$$\mathrm{Cov}(\psi(x_k), \psi(x_l)) = \mathrm{Cov}\Big(\log(n^1_k) - \log(n^0_k) - \log(n^1_{x_0}) + \log(n^0_{x_0}),\ \log(n^1_l) - \log(n^0_l) - \log(n^1_{x_0}) + \log(n^0_{x_0})\Big)$$
is approximated thanks to
$$\log(N^1_{x_l}) \approx \log(n^1 p^1_{x_l}) + \sqrt{\frac{1 - p^1_{x_l}}{n^1 p^1_{x_l}}}\,\delta^1_{x_l}$$
and the variance-covariance structure of the δ's.
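Putting the last slides together, a minimal sketch of the Wald test for one SNP with genotype categories $x_0, \dots, x_m$ follows. The closed forms for $V$ are what the δ approximation yields for log-counts when cases and controls are independent: $\mathrm{Var}(\psi_l) \approx 1/n^1_{x_l} + 1/n^0_{x_l} + 1/n^1_{x_0} + 1/n^0_{x_0}$, and a covariance of $1/n^1_{x_0} + 1/n^0_{x_0}$ between two log odds ratios sharing the reference category. The counts are invented; the helper names and the small linear solver are our own.

```python
import math


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [[float(x) for x in row] + [float(y)] for row, y in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def genotype_wald(cases, controls):
    """Wald test of H0: Psi = 0 for categories x_1..x_m against the reference x_0.

    cases[l] = n^1_{x_l} and controls[l] = n^0_{x_l}.
    """
    m = len(cases) - 1
    psi = [math.log(cases[l] * controls[0] / (controls[l] * cases[0])) for l in range(1, m + 1)]
    shared = 1 / cases[0] + 1 / controls[0]                # covariance from the reference cells
    v = [[shared + (1 / cases[k] + 1 / controls[k] if k == l else 0.0)
          for l in range(1, m + 1)] for k in range(1, m + 1)]
    w = sum(p * x for p, x in zip(psi, solve(v, psi)))     # W = Psi V^-1 Psi^t
    return psi, w


def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form only covers even degrees of freedom")
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(df // 2))


psi, w = genotype_wald(cases=[100, 80, 20], controls=[120, 60, 10])
print([round(p, 3) for p in psi], round(w, 3), round(chi2_sf_even(w, df=2), 4))
```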
38 Application to statistical interaction: deviation from linearity (1)
Let $X = (X_k, X_l)$ be a pair of SNPs with 9 categories:
$x_0 = AABB$, $x_1 = AABb$, $x_2 = AAbb$, $x_3 = AaBB$, $x_4 = AaBb$, $x_5 = Aabb$, $x_6 = aaBB$, $x_7 = aaBb$, $x_8 = aabb$.
The saturated logistic model is given by:
$$\mathrm{logit}(P(Y = 1 \mid X)) = \beta_0 + \sum_{i \in \{Aa; aa\}} \beta_i \mathbb{I}_{X_k = i} + \sum_{i \in \{Bb; bb\}} \beta_i \mathbb{I}_{X_l = i} + \sum_{i \in \{Aa; aa\}} \sum_{j \in \{Bb; bb\}} \beta_{ij} \mathbb{I}_{X_k = i;\, X_l = j}$$
The test for interaction consists in testing:
$$[\beta_{AaBb}, \beta_{Aabb}, \beta_{aaBb}, \beta_{aabb}] = [0, 0, 0, 0]$$
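39 Application to statistical interaction: deviation from linearity (2)
$H_0$ can be formulated as: for all $i \in \{Aa, aa\}$ and $j \in \{Bb, bb\}$,
$$\mathrm{OR}(X_k = i, X_l = j) = \mathrm{OR}(X_k = AA, X_l = j)\,\mathrm{OR}(X_k = i, X_l = BB)$$
that is,
$$\frac{n^1_{ij}\, n^1_{AABB}}{n^1_{iBB}\, n^1_{AAj}} = \frac{n^0_{ij}\, n^0_{AABB}}{n^0_{iBB}\, n^0_{AAj}}$$
Hence $\Psi = [\psi_{AaBb}; \psi_{Aabb}; \psi_{aaBb}; \psi_{aabb}] = [0; 0; 0; 0]$ with
$$\psi_{ij} = \log\left(\frac{n^a_{ij}\, n^a_{00} / (n^a_{i0}\, n^a_{0j})}{n^u_{ij}\, n^u_{00} / (n^u_{i0}\, n^u_{0j})}\right)$$
where $a$ and $u$ denote affected (cases) and unaffected (controls) individuals, and index 0 stands for the reference genotypes AA and BB.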
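Each $\psi_{ij}$ is a contrast of log-counts whose coefficients sum to zero within cases and within controls, so the δ approximation gives $V = C\,\mathrm{diag}(1/n^a)\,C^t + C\,\mathrm{diag}(1/n^u)\,C^t$, where $C$ holds the contrast coefficients (the multinomial $-1/n$ terms cancel). The sketch below implements the resulting 4-df Wald test under that assumption, with hypothetical names and invented counts; for tables with no interaction on the log-odds scale ($n_{ij} = r_i c_j$), $\Psi = 0$ and $W = 0$.

```python
import math


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [[float(x) for x in row] + [float(y)] for row, y in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Genotype index 0 = AA / BB (reference), 1 = Aa / Bb, 2 = aa / bb.
# Coefficients of log n_ij + log n_00 - log n_i0 - log n_0j for AaBb, Aabb, aaBb, aabb.
CONTRASTS = [{(i, j): 1, (0, 0): 1, (i, 0): -1, (0, j): -1}
             for i, j in [(1, 1), (1, 2), (2, 1), (2, 2)]]


def interaction_wald(cases, controls):
    """Psi, W = Psi V^-1 Psi^t and its chi2(4) p-value for 3x3 genotype tables."""
    psi = [0.0] * 4
    v = [[0.0] * 4 for _ in range(4)]
    for table, sign in ((cases, 1.0), (controls, -1.0)):
        for r, coef in enumerate(CONTRASTS):
            psi[r] += sign * sum(c * math.log(table[a][b]) for (a, b), c in coef.items())
            for s, other in enumerate(CONTRASTS):
                # delta method: Cov(psi_r, psi_s) gains sum over cells of c_r * c_s / n_cell
                v[r][s] += sum(c * other.get(cell, 0) / table[cell[0]][cell[1]]
                               for cell, c in coef.items())
    w = sum(p * x for p, x in zip(psi, solve(v, psi)))
    return psi, w, math.exp(-w / 2) * (1 + w / 2)    # chi2(4) upper tail


cases = [[r * c for c in (10, 20, 30)] for r in (1, 2, 3)]     # no interaction
controls = [[r * c for c in (4, 4, 2)] for r in (5, 3, 2)]     # no interaction
psi_null, w_null, p_null = interaction_wald(cases, controls)

cases_int = [row[:] for row in cases]
cases_int[1][1] *= 2                                            # excess of AaBb cases
psi_int, w_int, p_int = interaction_wald(cases_int, controls)
print([round(x, 3) for x in psi_int], round(w_int, 3), round(p_int, 3))
```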
40 Computational cost
Comparative analysis between a Wald test and a likelihood ratio test (LRT):
[Table: nsim | Time (sec), LRT | Time (sec), Wald]
Execution time is divided by almost 2.
41 WTCCC analysis
After filtering using prior knowledge, 3.5 million tests were performed.
[Figure: overall analysis of the 7 diseases from the WTCCC]
42 Crohn's disease
- Significant hit between two genes: APC and IQGAP1
- p-value: … and … after multiple testing correction
- Biological insights for the interaction (M. Emily et al., European Journal of Human Genetics)
[Figure: QQ-plot for Crohn's disease with (black) and without (blue) the APC-IQGAP1 interaction]
43 Application to biological interaction: non-linearity effect, IndOR
IndOR: Independent Odds Ratio
IndOR is based on a definition of epistasis (Cordell, 2002): the absence of epistasis means that the dependency between the two genes is the same in cases and in controls.
For a pair of SNPs $(X_k, X_l)$, $H_0$ can be formulated as: for all $i \in \{AA, Aa, aa\}$ and $j \in \{BB, Bb, bb\}$,
$$\frac{P((X_k, X_l) = (i, j) \mid Y = 1)}{P(X_k = i \mid Y = 1)\,P(X_l = j \mid Y = 1)} = \frac{P((X_k, X_l) = (i, j) \mid Y = 0)}{P(X_k = i \mid Y = 0)\,P(X_l = j \mid Y = 0)}$$
44 IndOR: Independent Odds Ratio
Thanks to Bayes' formula, $H_0$ leads to a test on odds ratios:
$$\mathrm{IndOR} = \Psi V^{-1} \Psi^t, \quad \text{with } \Psi = [\psi_{AaBb}, \psi_{Aabb}, \psi_{aaBb}, \psi_{aabb}] \text{ and } \psi_{ij} = \log\left(\frac{\mathrm{OR}(x_i, x_j)}{\mathrm{OR}(x_i)\,\mathrm{OR}(x_j)}\right) = 0 \text{ under } H_0$$
$\mathrm{IndOR} \sim \chi^2(4)$ under $H_0$.
M. Emily, Statistics in Medicine.
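A corresponding sketch for IndOR writes $\psi_{ij} = \log(R^a_{ij}/R^a_{00}) - \log(R^u_{ij}/R^u_{00})$ with $R_{ij} = n_{ij}/(n_{i.}\, n_{.j})$, which is algebraically the same as $\log(\mathrm{OR}(x_i, x_j)/(\mathrm{OR}(x_i)\mathrm{OR}(x_j)))$. The variance uses a generic multinomial δ method derived here; it is an assumption and not necessarily the estimator of the cited Statistics in Medicine paper. Names and counts are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [[float(x) for x in row] + [float(y)] for row, y in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


PAIRS = [(1, 1), (1, 2), (2, 1), (2, 2)]   # AaBb, Aabb, aaBb, aabb (index 0 = AA / BB)


def indor_terms(t):
    """psi_ij = log(R_ij / R_00) for one group, and its gradient w.r.t. the 9 cell counts."""
    rs = [sum(row) for row in t]
    cs = [sum(t[a][b] for a in range(3)) for b in range(3)]
    psi, grad = [], []
    for i, j in PAIRS:
        psi.append(math.log(t[i][j] / (rs[i] * cs[j])) - math.log(t[0][0] / (rs[0] * cs[0])))
        grad.append([[(a == i and b == j) / t[i][j] - (a == i) / rs[i] - (b == j) / cs[j]
                      - (a == 0 and b == 0) / t[0][0] + (a == 0) / rs[0] + (b == 0) / cs[0]
                      for b in range(3)] for a in range(3)])
    return psi, grad


def indor_test(cases, controls):
    """IndOR = Psi V^-1 Psi^t with a delta-method V, and its chi2(4) p-value."""
    psi = [0.0] * 4
    v = [[0.0] * 4 for _ in range(4)]
    for t, sign in ((cases, 1.0), (controls, -1.0)):
        p, grad = indor_terms(t)
        for r in range(4):
            psi[r] += sign * p[r]
            for s in range(4):
                # multinomial delta method; the -n p p^t part vanishes since sum(n * grad) = 0
                v[r][s] += sum(grad[r][a][b] * grad[s][a][b] * t[a][b]
                               for a in range(3) for b in range(3))
    w = sum(x * y for x, y in zip(psi, solve(v, psi)))
    return psi, w, math.exp(-w / 2) * (1 + w / 2)


cases = [[r * c for c in (10, 20, 30)] for r in (1, 2, 3)]     # loci independent in cases
controls = [[r * c for c in (4, 4, 2)] for r in (5, 3, 2)]     # loci independent in controls
psi_null, w_null, p_null = indor_test(cases, controls)

cases_int = [row[:] for row in cases]
cases_int[1][1] *= 2                                            # dependency only in cases
psi_int, w_int, p_int = indor_test(cases_int, controls)
print([round(x, 3) for x in psi_int], round(w_int, 3), round(p_int, 3))
```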
45 Historical epistatic disease models
Rows: X_1 = 0, 1, 2; columns: X_2 = 0, 1, 2.
DD: Jointly Dominant-Dominant
  X_1 = 0: γ, γ, γ
  X_1 = 1: γ, γ(1 + θ), γ(1 + θ)
  X_1 = 2: γ, γ(1 + θ), γ(1 + θ)
RR: Jointly Recessive-Recessive
  X_1 = 0: γ, γ, γ
  X_1 = 1: γ, γ, γ
  X_1 = 2: γ, γ, γ(1 + θ)
RD: Jointly Recessive-Dominant
  X_1 = 0: γ, γ, γ
  X_1 = 1: γ, γ, γ
  X_1 = 2: γ, γ(1 + θ), γ(1 + θ)
46 Historical epistatic disease models: power
[Figure: power of PLINK, T_IH, BOOST, IndOR and Case-Only for the RR, DD and RD models; axis labels: Power, Ratio, r²]
47 Biological epistatic disease models
Rows: X_1 ∈ {AA, Aa, aa}; columns: X_2 ∈ {BB, Bb, bb}.
I: Interface
  AA: γ, γ, γ
  Aa: γ, γ(1 + θ), γ
  aa: γ, γ, γ
Mod: Modifying-effect
  AA: γ, γ, γ
  Aa: γ, γ, γ(1 + θ)
  aa: γ(1 + θ), γ(1 + θ), γ(1 + θ)
48 Biological epistatic disease models: power
[Figure: power of PLINK, T_IH, BOOST, IndOR and Case-Only for the I and Mod models; axis labels: Power, Ratio, r²]
49 Crohn's disease hits
[Table: columns Control set | Statistic | SNP1 | Chr1 (Position) | SNP2 | Chr2 (Position) | p-value | corr. p-value; each combination of control set (Shared, Combined) and statistic (PLINK, T_IH, BOOST, IndOR, CaseOnly) appears in two rows]
50 Outline
1. Genome-wide association studies
2. Power in single-locus association
3. Two-locus association
4. Conclusion
51 Conclusion/Discussion
Single-locus statistical tests are not equivalent:
- The χ² test always outperforms the z² test.
- The comparison between the χ² test and the LRT depends jointly on the observed proportion of cases (φ) and the frequency of the variant (π_1):
               | Causal effect      | Protective effect
               | φ small | φ large  | φ small | φ large
Rare variant   | χ²      | LRT      | LRT     | χ²
Common variant | LRT     | χ²       | χ²      | LRT
Future work: effect of tagging (indirect association); test for linear trend (Cochran-Armitage test).
Two-locus interaction:
- δ approximation for counts
- improvement of linear and non-linear tests
Future work: theoretical power study; investigation of the effect of tagging.
Thank you for your attention!
More informationChapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments
Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationLecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011
Lecture 6: Introduction to Quantitative genetics Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011 Quantitative Genetics The analysis of traits whose variation is determined by both a
More informationAdaptive testing of conditional association through Bayesian recursive mixture modeling
Adaptive testing of conditional association through Bayesian recursive mixture modeling Li Ma February 12, 2013 Abstract In many case-control studies, a central goal is to test for association or dependence
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationMeasures of Association and Variance Estimation
Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35
More informationAsymptotic distribution of the largest eigenvalue with application to genetic data
Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationPearson s Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores
Commentary doi: 101111/1469-180900800500x Pearson s Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores Gang Zheng 1, Jungnam Joo 1 and Yaning Yang 1 Office of Biostatistics Research,
More informationAnalyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions
Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions Jason Westra Department of Statistics, Iowa State University Ames, IA 50011, United States
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationLecture 21: October 19
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use
More informationLecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University
Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk
More informationStochastic processes and
Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University
More informationBackward Genotype-Trait Association. in Case-Control Designs
Backward Genotype-Trait Association (BGTA)-Based Dissection of Complex Traits in Case-Control Designs Tian Zheng, Hui Wang and Shaw-Hwa Lo Department of Statistics, Columbia University, New York, New York,
More informationUsing a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics
Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett
More informationLecture WS Evolutionary Genetics Part I 1
Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in
More informationInferring Genetic Architecture of Complex Biological Processes
Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen
More informationHow to analyze many contingency tables simultaneously?
How to analyze many contingency tables simultaneously? Thorsten Dickhaus Humboldt-Universität zu Berlin Beuth Hochschule für Technik Berlin, 31.10.2012 Outline Motivation: Genetic association studies Statistical
More informationI Have the Power in QTL linkage: single and multilocus analysis
I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department
More informationModule 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline
Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes
More informationLecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017
Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More informationRégression en grande dimension et épistasie par blocs pour les études d association
Régression en grande dimension et épistasie par blocs pour les études d association V. Stanislas, C. Dalmasso, C. Ambroise Laboratoire de Mathématiques et Modélisation d Évry "Statistique et Génome" 1
More informationMethods for Cryptic Structure. Methods for Cryptic Structure
Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases
More informationPartitioning Genetic Variance
PSYC 510: Partitioning Genetic Variance (09/17/03) 1 Partitioning Genetic Variance Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive
More informationRobust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis
Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Hongzhe Li hongzhe@upenn.edu, http://statgene.med.upenn.edu University of Pennsylvania Perelman School of
More informationPopulation Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda
1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationStatistical Power of Model Selection Strategies for Genome-Wide Association Studies
Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Zheyang Wu 1, Hongyu Zhao 1,2 * 1 Department of Epidemiology and Public Health, Yale University School of Medicine, New
More informationGenetic Association Studies in the Presence of Population Structure and Admixture
Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College
More informationSolutions for Examination Categorical Data Analysis, March 21, 2013
STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More information1 Preliminary Variance component test in GLM Mediation Analysis... 3
Honglang Wang Depart. of Stat. & Prob. wangho16@msu.edu Omics Data Integration Statistical Genetics/Genomics Journal Club Summary and discussion of Joint Analysis of SNP and Gene Expression Data in Genetic
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationMultivariate analysis of genetic data an introduction
Multivariate analysis of genetic data an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Population genomics in Lausanne 23 Aug 2016 1/25 Outline Multivariate
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationModel Selection for Multiple QTL
Model Selection for Multiple TL 1. reality of multiple TL 3-8. selecting a class of TL models 9-15 3. comparing TL models 16-4 TL model selection criteria issues of detecting epistasis 4. simulations and
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More informationLecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013
Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype
More informationReports of the Institute of Biostatistics
Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More information