Genetic Association Studies in the Presence of Population Structure and Admixture
|
|
- Lesley Matthews
- 5 years ago
- Views:
Transcription
1 Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College of Wisconsin A presentation in the BNP Workshop, INI, Cambridge, August 7, 2007
2 Outline Some genetics background Impact of structure on association studies Remedies A DPM model Clustering success Estimation success A model for admixture Detection of admixed individuals
3 Some Genetics Terminology Chromosome 1 A d GAATTAGTCAAGC Chromosome 2 a D Chromosome Locus / Polymorphism Allele Genotype Two alleles A and a yield three potential genotypes at each locus: AA, aa/aa, or aa Phenotype
4 Propagation, Inheritance and Diversity In a pair of homologous chromosomes, one is inherited from the mother and the other from the father Recombination can result in alleles that differ from those of parents Mechanisms leading to variability in the frequency of genetic variants in human populations Positive Selection Ancestral Admixture Genetic Drift Allele frequency assumptions Hardy-Weinberg Equilibrium - Chromosome independence Linkage Disequilibrium - Locus to locus independence
5 Wright s Fixation Index Wright s F ST - Measures the divergence between populations relative to the overall level of genetic diversity within a species (Wright 1951) Simulation mechanism {Balding & Nichols (1995)} : Let p denote ancestral allele frequency at a locus. This could be drawn from, say, Uniform(0.10,0.90) or fixed Then, π - the allele frequency at this locus is simulated as π Beta (rp, r(1 p)), r = (1 F ST ) F ST E(π) = p, V ar(π) = p(1 p)f ST
6 Common Study Designs Main goal is to determine if a phenotype is associated with a genotype at some locus or a combination of loci Phenotype of interest can be quantitative or binary Sampling design can be cross-sectional or case-control Disease presence/absence, sample of cases and sample of non-cases leads to 2 2 table with column totals fixed Generally, traditional analyses use linear models and contingency tables
7 Population Structure and Impact on Association If sampling from a homogeneous population with stable allele frequencies at loci of interest, traditional methods can appropriately determine association Variation in allele frequencies in human subpopulations can mislead these analyses A disease with higher prevalence in one subpopulation than the other will look spuriously associated with alleles that have higher prevalence in the first Attenuation of a genetic effect is also possible - e.g., high prevalence in a small subpopulation
8 Impact of Structure on Association Genetics Literature Considerable debate in the genetics community about the potential for spurious associations caused by population allele frequency differences (Wacholder et.al. 2000; Cardon and Palmer 2003; Freedmen et.al. 2004) Bidirectional effect - Structure can also mask or reverse genetic effects (Deng 2001) Examples Knowler et.al. (1988) - Type 2 diabetes in American Indians Campbell et.al. (2005) - Lactase (LCT) gene and tall/short status in European Americans
9 Remedies Transmission Disequilibrium Test - Spielman et al. (1993) Genomic Control - Devlin & Roeder (1999) Modeled Classification - Pritchard et al. (2000) Structural Association - Satten et al. (2001)
10 Modeled Classification Two-Stage Approach of Pritchard et.al. Assume we have a sample of N individuals, consisting of n 1 cases and n 0 controls Each subject is genotyped at L null marker loci and the candidate locus Assume each subject originates from one of K distinct subpopulations, where K is fixed. Notation Population pointer i = 1,..., N z i Genotype X (c) il l = 1,..., (L + 1) Genotype frequencies p klj j = 1, 2, 3 Admixture proportions q (i) k Z X P Q
11 Modeled Classification STRUCTURE Intermediate Goal: Inference on P and Z/Q based on the genotypic data X Pritchard et al construct a Markov Chain Monte Carlo (MCMC) algorithm to approximate the posterior distribution P r (P, Z/Q X) P r (Z/Q) P r (P ) P r (X P, Z/Q) Given Z and P, Pritchard et al then test H0 : No association between candidate allele frequencies and phenotype φ H1 : Candidate locus allele frequencies depend on phenotype φ
12 Modeled Classification Likelihood Ratio Statistic: STRAT Define Λ = P r 1 (X; ˆP ) 1, Q ) P r 0 (X; ˆP 0, Q where [ ] P r (i) 0 X (c) i1 = j P 0, φ = k [ ] P r (i) 1 X (c) i1 = j P 1, φ = k q (i) k p k1j q (i) k p[φ(i) ] k1j The maximized likelihoods under each hypothesis are estimated using the EM algorithm Significance levels for Λ are approximated through simulation
13 Modeled Classification Extensions and Other Approaches Falush et.al. (2003) - Extended Pritchard et.al. s original work to allow for linkage disequilibrium and correlated allele frequencies Zhang et.al. (2006) - Parametric Bayesian approach that incorporates the inherent variability in estimating K through Model Averaging Other methods include Frequentist approaches such as Satten et.al. (2001), Epstein et.al. (2007), and Price et.al (2006) and a Bayesian approach for quantitative traits by Hoggart et.al. (2003)
14 Model : Quantitative Trait Part Given a sample of N individuals, genotyped at L null marker loci and C candidates, Y i Normal (µ i, τ P ) µ i = β 0i + G i + γz i C G i = (β il W il + β 2l V il ) l=1 V and W are indicators of AA and aa respectively
15 Model : Genetic Marker Part Let θ il = logit(π il ), L(W i, V i ) = L+C l=1 [ 2 W il ] exp(θil (2V il + W il )) (1 + exp(θ il )) 2 β 0i, θ i1,..., θ i,l+c G G G α, G 0 Dirichlet Process (αg 0 ) α Gamma (α 0, λ 0 ) G 0 = L+C f(β 0 ) f(θ il ) l=1
16 Model Graph l=1,..,c G θ il θ il β 0i (W il, V il ) (W il, V il ) β 1l β 2l l=c+1,..,l τ P Y i i=1,...,n
17 Simulation of Posterior Methods for DPM Conjugate Setting Escobar (1994), Escobar and West (1995), Bush and MacEachern (1996) Split-Merge Samplers: Jain and Neal (2004), Green and Richardson (2001) Non-conjugate Setting No-Gaps algorithm: MacEachern and Muller (1998) Neal (2000), Walker and Damien (1998) Jain and Neal (2007) extended their Split-Merge algorithm to a conditionally conjugate setting
18 Simulation of Posterior Algorithm 5 of Neal (2000) Given K current clusters of distinct values 1. Update configuration indicator for each individual (s i ) by proposing a new cluster membership (s i ) R times P (s i = j s i ) = n i,j /(N + α 1) for j=1,..,k P (s i = K + 1 s i ) = α/(n + α 1) 2. Use Metropolis-Hastings update for proposed move [ L(Yi, W a(s i, s i, V i φ s i i ) = min ) ] L(Y i, W i, V i φ si ), 1 (1) 3. Update other parameters given current configuration. Combination of conjugate updates and Adaptive Rejection Sampling (ARS) (Gilks and Wild 1992)
19 Simulation of Posterior MCMC Scheme Given j=1,2,..,k distinct clusters in the DP, let φ j = (β 0j, θ j1,..., θ jl ), Update configuration using Neal s Algorithm, sampling from [s i (s i, φ)] for i=1,...,n For j=1,2,..,k, Sample [β 0j X j, β 11,.., β 1C, β 21,.., β 2C, τ P ] where X j = {Y i, W i, V i s i = j} (conjugate normal update) For l=1,...,l, update cluster specific allele frequencies, sampling from [θ jl W j, V j ] where (W j, V j ) denote genotype data for subjects in j th cluster (ARS)
20 Simulation of Posterior MCMC Scheme (continued) Update genetic effects at candidate loci, for l=1,2,..,c Sample [ β 1l Y, W, β 01,.., β 0K, β (1 l), β 21,.., β 2C, τ P ] Sample [ β 2l Y, V, β 01,.., β 0K, β (2 l), β 11,.., β 1C ] where β (r l) = β (r1),.., β (r,l 1), β (r,l+1),.., β (rc) (both are conjugate normal updates) Update phenotypic precision Sample [τ P Y, W, V, β 01,.., β 0K, β 11,.., β 1C, β 21,.., β 2C ] (conjugate gamma update)
21 A Two-Population Scenario Sample Size: 400 subjects with 500 markers Pop. 1 (50%) with F ST = 0.01 Pop. 2 (50%) with F ST = Simulated data characteristics The biggest allele frequency difference,δ l = π 1l π 2l, occurs at locus 254, where δ 254 = =.355 The smallest difference occurs at locus 223, δ 223 = =.000
22 Prior for K n=400 and α=1
23 Posterior Distribution for K Two populations scenario: n=400, L=500 and α=1
24 Predictive Density for π 254 π P OP = and π P OP = 0.358
25 Predictive Density for π 223 π P OP = and π P OP = 0.820
26 Predictive Density for π 109 π P OP = and π P OP = 0.598
27 A Four-Population High Heterogeneity Scenario Sample Size: 800 subjects at 1000 markers Population % of sample F (j) ST
28 Prior for K n=800 and α=1
29 Posterior for K 4 Populations - High Heterogeneity
30 Predictive Density for π 43 (20%) π (1) = 0.07 (40%) π (2) = 0.08 (30%) π (3) = 0.37 (10%) π (4) = 0.24
31 Predictive Density for π 47 (20%) π (1) = 0.34 (40%) π (2) = 0.39 (30%) π (3) = 0.42 (10%) π (4) = 0.34
32 A Four-Population Low Heterogeneity Scenario Sample Size: 800 subjects at 1000 markers Population % of sample F (j) ST
33 Posterior for K 4 Populations - Low Heterogeneity
34 Quantitative Trait Model Simulation Three scenarios where individuals originate from two ancestral populations Baseline trait mean differs between populations, with β 01 = and β 02 = Subjects genotyped at 750 markers including 5 candidates for association with the phenotype (generically denoted as loci 1-5) Scenarios I. n=700; Pop. 1 (50%) and Pop. 2 (50%) II. n=1000; Pop. 1 (50%) and Pop. 2 (50%) III. Same as I plus 300 admixed individuals
35 Simulation Setup Locus β 1l β 2l A l D l Y i Normal (µ i, τ P ) where 5 µ i = β 0i + (β 1l W il + β 2l V il ) τ P = 0.01 l=1 A l and D l = Additive / Dominance effects Falconer & MacKay (1996)
36 Generation of Admixed Ancestry Individuals Assume j=1,..,k subpopulations with associated allele frequencies π jl at each locus Let q i denote an admixture vector where q ij denotes the proportion of the i th admixed individual s genome that originates from the j th subpopulation. q i Dirichlet(α q,..., α q ) where α q was set at 1.0 Then the genotype at the l th locus is independently simulated as Binomial(2, K q ij π jl ) j=1
37 Naive Analysis Effect True ˆβ 95% CI* χ 2 2,0.99 = 9.28 β to β to β to β to β to β to 2.24 β to β to β to β to *Simultaneous CI using Bonferroni Correction
38 Scenario I Results Two Distinct Subpopulations (n=700) Parameter True ˆβ 95% Cred. Int.* χ 2 l β to β to β to β to 7.00 β to β to 3.33 β to β to β to β to 1.93 * denotes simultaneous 95% Credible Intervals
39 Scenario II Results Two Distinct Subpopulations (n=1000) Parameter True ˆβ 95% Cred. Int.* χ 2 l β to β to β to β to 7.35 β to β to 4.13 β to β to β to β to 4.64 * denotes simultaneous 95% Credible Intervals
40 Scenario III Results n=1000, Including 300 Admixed Individuals Parameter True ˆβ 95% Cred. Int.* χ 2 l β to β to β to β to 7.06 β to β to 2.31 β to β to β to β to 7.48 * denotes simultaneous 95% Credible Intervals
41 Comparative Results Parameter True N S I S II S III β β β β β β β β β β
42 Model for Admixture Inclusion of highly admixed individuals affects estimation The DPM construction above clusters individuals based on their entire set of genotypes at all L loci. It does not allow for differing ancestries across the genome How to handle admixed individuals more appropriately? Assume, for a moment, that we know which individuals are highly admixed. Divide the sample between N L low admixture subjects and N H high admixture subjects We then consider a separate parametric likelihood for the N H individuals that uses locus-specific information, effectively making an assignment at each locus for each admixed person to the clusters detected by the Dirichlet Process
43 Admixture Model Likelihood For the i th admixed subject, let X (c) il denote the allele at the l th locus on the c th chromosome, c = 1, 2. Let z (c) il denote a subpopulation assignment indicator for that allele Let q ij represent an N L dimensional admixture vector that denotes how much of that person s genome originates from the subpopulations represented by the low admixture sample. L H = N H i=1 L τ 1/2 P 2π exp l=1 c=1j=1 τ P 2 2 N L (q ij e θ jlx (c) il 1 + e θ jl ( ) N 2 L Y i q ij β 0j G i ) I(z (c) il j=1 =j)
44 Admixture Model Continued The model is then the same as before except for the addition of two prior specifications q i Dirichlet (α 1,..., α NH ) for i = 1,..., N H Pr(z (c) il = j)= 1 N L for j = 1,..., N L Note: Although the model is setup to assign alleles to specific low-admixture individuals, because of the inherent Dirichlet Process clustering, the calculations simplify to assigning admixed alleles to the current K distinct clusters
45 Admixture Model Graph l = C + 1,.., L j = 1,.., N L θ jl l = 1,.., C i = 1,.., N H X (c) il G q ij β 0j Y i X (c) θ jl il β 2l β 1l τ P
46 Scenario III Results With Hybrid Model Parameter True ˆβ 95% Cred. Int.* χ 2 l β to β to β to β to 8.17 β to β to 2.73 β to β to β to β to 6.33 * denotes simultaneous 95% Credible Intervals
47 Comparative Results Parameter True N S I S II S III S IIIA β β β β β β β β β β
48 Detecting Admixed Individuals Let s (g) i denote the cluster membership indicator for the i th individual at the g th MCMC iteration, then define Then let ηˆ (g) il = n ɛ (g+1) i = j:s (g) j L l=1 N (2V o jl + W jl ) =s (g) i 2n s (g) i ( ˆη (g+1) il ) 2 ˆη (g) il
49 Results on Detection
50 Results on Detection
51 Modifications for Case-Control Studies Binary Outcomes Simply involves switching to a logistic or probit likelihood. Case-Control Studies The extension to a case-control setting is also relatively straightforward Need to specify a retrospective likelihood such as that proposed by Seaman and Richardson (2001) Müller and Roeder (1997)
52 Extensions 1. Clustering by genome sections 2. Introducing dependence structure amongst neighboring loci, i.e. incorporating linkage disequilibrium patterns 3. Extending to Genomewide Association Studies - Removing candidate loci designation so that all markers are potentially phenotypically associated
53 References In a separate file
Methods for Cryptic Structure. Methods for Cryptic Structure
Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases
More informationExpression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia
Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.
More informationBinomial Mixture Model-based Association Tests under Genetic Heterogeneity
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,
More informationPhasing via the Expectation Maximization (EM) Algorithm
Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline
More informationCSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation
CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism
More informationGENETIC association studies aim to map phenotypeinfluencing
Copyright Ó 2006 by the Genetics Society of America DOI: 10.1534/genetics.105.055335 A General Population-Genetic Model for the Production by Population Structure of Spurious Genotype Phenotype Associations
More informationTutorial Session 2. MCMC for the analysis of genetic data on pedigrees:
MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More information1. Understand the methods for analyzing population structure in genomes
MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population
More informationCase-Control Association Testing. Case-Control Association Testing
Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationDavid B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison
AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University
More informationMultiple QTL mapping
Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power
More informationEstimating Evolutionary Trees. Phylogenetic Methods
Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationPopulation Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda
1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More informationEM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works
EM algorithm The example in the book for doing the EM algorithm is rather difficult, and was not available in software at the time that the authors wrote the book, but they implemented a SAS macro to implement
More informationQTL model selection: key players
Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationBIOINFORMATICS. Gilles Guillot
BIOINFORMATICS Vol. 00 no. 00 2008 Pages 1 19 Supplementary material for: Inference of structure in subdivided populations at low levels of genetic differentiation. The correlated allele frequencies model
More informationA Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles
A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles Jeremy Gaskins Department of Bioinformatics & Biostatistics University of Louisville Joint work with Claudio Fuentes
More informationGenotype Imputation. Biostatistics 666
Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives
More informationLecture 13: Population Structure. October 8, 2012
Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The
More informationLecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015
Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.
More informationNonparametric Bayesian Methods - Lecture I
Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationPopulations in statistical genetics
Populations in statistical genetics What are they, and how can we infer them from whole genome data? Daniel Lawson Heilbronn Institute, University of Bristol www.paintmychromosomes.com Work with: January
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationStationary Distribution of the Linkage Disequilibrium Coefficient r 2
Stationary Distribution of the Linkage Disequilibrium Coefficient r 2 Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman Department of Statistics, The University of Auckland December 1, 2015 Overview
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationA Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models
Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we
More informationAssociation Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5
Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative
More informationLearning ancestral genetic processes using nonparametric Bayesian models
Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew
More informationHomework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:
Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships
More information. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)
Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationLecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017
Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationMarkov Chain Monte Carlo in Practice
Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/35195 holds various files of this Leiden University dissertation Author: Balliu, Brunilda Title: Statistical methods for genetic association studies with
More informationBayesian analysis of the Hardy-Weinberg equilibrium model
Bayesian analysis of the Hardy-Weinberg equilibrium model Eduardo Gutiérrez Peña Department of Probability and Statistics IIMAS, UNAM 6 April, 2010 Outline Statistical Inference 1 Statistical Inference
More informationBayesian Regression (1/31/13)
STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed
More informationProbability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National
More information1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:
.5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the
More informationSimultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models
Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models David B. Dahl Department of Statistics Texas A&M University Marina Vannucci, Michael Newton, & Qianxing Mo
More informationProportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power
Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion
More informationSimilarity Measures and Clustering In Genetics
Similarity Measures and Clustering In Genetics Daniel Lawson Heilbronn Institute for Mathematical Research School of mathematics University of Bristol www.paintmychromosomes.com Talk outline Introduction
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationNonparametric Bayes Modeling
Nonparametric Bayes Modeling Lecture 6: Advanced Applications of DPMs David Dunson Department of Statistical Science, Duke University Tuesday February 2, 2010 Motivation Functional data analysis Variable
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationNeutral Theory of Molecular Evolution
Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationSupporting Information Text S1
Supporting Information Text S1 List of Supplementary Figures S1 The fraction of SNPs s where there is an excess of Neandertal derived alleles n over Denisova derived alleles d as a function of the derived
More informationLinkage and Linkage Disequilibrium
Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies
More informationProblems for 3505 (2011)
Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.
More informationMathematical models in population genetics II
Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population
More informationThe genomes of recombinant inbred lines
The genomes of recombinant inbred lines Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman C57BL/6 2 1 Recombinant inbred lines (by sibling mating)
More informationLecture 16: Mixtures of Generalized Linear Models
Lecture 16: Mixtures of Generalized Linear Models October 26, 2006 Setting Outline Often, a single GLM may be insufficiently flexible to characterize the data Setting Often, a single GLM may be insufficiently
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More information2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling
2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationLecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency
Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Outline Genetics of complex
More informationStatistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014
Overview - 1 Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Elizabeth Thompson University of Washington Seattle, WA, USA MWF 8:30-9:20; THO 211 Web page: www.stat.washington.edu/ thompson/stat550/
More informationSNP Association Studies with Case-Parent Trios
SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health September 3, 2009 Population-based Association Studies Balding (2006). Nature
More informationVCMC: Variational Consensus Monte Carlo
VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object
More informationFriday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo
Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization
More informationThe Quantitative TDT
The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009 The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus
More informationFor 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.
STAT 550 Howework 6 Anton Amirov 1. This question relates to the same study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124).
More informationDown by the Bayes, where the Watermelons Grow
Down by the Bayes, where the Watermelons Grow A Bayesian example using SAS SUAVe: Victoria SAS User Group Meeting November 21, 2017 Peter K. Ott, M.Sc., P.Stat. Strategic Analysis 1 Outline 1. Motivating
More informationAn introduction to PRISM and its applications
An introduction to PRISM and its applications Yoshitaka Kameya Tokyo Institute of Technology 2007/9/17 FJ-2007 1 Contents What is PRISM? Two examples: from population genetics from statistical natural
More informationDNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to
1 1 1 1 1 1 1 1 0 SUPPLEMENTARY MATERIALS, B. BIVARIATE PEDIGREE-BASED ASSOCIATION ANALYSIS Introduction We propose here a statistical method of bivariate genetic analysis, designed to evaluate contribution
More informationLecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013
Lecture 9 Short-Term Selection Response: Breeder s equation Bruce Walsh lecture notes Synbreed course version 3 July 2013 1 Response to Selection Selection can change the distribution of phenotypes, and
More informationNotes on Population Genetics
Notes on Population Genetics Graham Coop 1 1 Department of Evolution and Ecology & Center for Population Biology, University of California, Davis. To whom correspondence should be addressed: gmcoop@ucdavis.edu
More informationMonte Carlo-based statistical methods (MASM11/FMS091)
Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 12 MCMC for Bayesian computation II March 1, 2013 J. Olsson Monte Carlo-based
More informationThe Wright-Fisher Model and Genetic Drift
The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationMajor Genes, Polygenes, and
Major Genes, Polygenes, and QTLs Major genes --- genes that have a significant effect on the phenotype Polygenes --- a general term of the genes of small effect that influence a trait QTL, quantitative
More informationHmms with variable dimension structures and extensions
Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating
More informationI Have the Power in QTL linkage: single and multilocus analysis
I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department
More informationIntroduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies
Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies Confounding in gene+c associa+on studies q What is it? q What is the effect? q How to detect it?
More informationEstimating the location and shape of hybrid zones.
Estimating the location and shape of hybrid zones. Benjamin Guedj and Gilles Guillot May 24, 2011 Abstract We propose a new model to make use of geo-referenced genetic data for inferring the location and
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More informationTheoretical and computational aspects of association tests: application in case-control genome-wide association studies.
Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus
More informationSpatial localization of recent ancestors for admixed individuals
G3: Genes Genomes Genetics Early Online, published on November 3, 2014 as doi:10.1534/g3.114.014274 Spatial localization of recent ancestors for admixed individuals Wen-Yun Yang 1, Alexander Platt 2, Charleston
More informationCausal Graphical Models in Systems Genetics
1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation
More informationA CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW
A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística
More informationEvaluating the performance of a multilocus Bayesian method for the estimation of migration rates
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Publications, Agencies and Staff of the U.S. Department of Commerce U.S. Department of Commerce 2007 Evaluating the performance
More informationModule 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline
Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes
More informationDiscrete & continuous characters: The threshold model
Discrete & continuous characters: The threshold model Discrete & continuous characters: the threshold model So far we have discussed continuous & discrete character models separately for estimating ancestral
More informationNature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs.
Supplementary Figure 1 Number of cases and proxy cases required to detect association at designs. = 5 10 8 for case control and proxy case control The ratio of controls to cases (or proxy cases) is 1.
More informationMCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES
MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES Elizabeth A. Thompson Department of Statistics, University of Washington Box 354322, Seattle, WA 98195-4322, USA Email: thompson@stat.washington.edu This
More information