Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics
1 Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics
Lee H. Dicker, Rutgers University and Amazon, NYC
Based on joint work with Ruijun Ma (Rutgers), Murat A. Erdogdu (Stanford/MSR), and Yakir Reshef (Harvard)
Borchard Colloquium, July 4, 2017
Work conducted prior to joining Amazon
2 Heritability in genetics
Heritability is a fundamental concept in genetics. It represents the extent to which an exhibited trait in an individual is attributable to genetics.
(Slightly) more technical formulation: the proportion of variation in a phenotype that can be explained by the genotype.
Population-level estimates of heritability can be obtained from phenotype data (e.g. human height or milk fat percentage in cows) and genotype data (e.g. pedigree information or GWAS data).
Heritability estimation has a long history in statistics, going back to R.A. Fisher.
Currently, LMM-based methods are probably the most prevalent approach for estimating heritability with GWAS data.
LMM-based methods for heritability estimation are not new (Henderson, 1950, Ann. Math. Stat.).
Modern LMM-based methods for GWAS data took off after Yang et al. (2010, Nat. Genet.), which gave estimates of heritability for human height.
Thousands of papers on LMM-based heritability estimation for GWAS data have appeared since.
3 Heritability in genetics
Hayes et al. (2010) PLoS Genet.
Much of the recent work on LMM-based methods for heritability estimation is built on older research into cattle breeding.
6 This talk
Some new statistical perspectives on heritability estimation.
Main messages:
1. Standardize your predictors.
2. There's a big opportunity for statistics to make an impact in this area with thoughtful modeling and technical expertise.
7 Genetic relatedness and heritability
Let $y = (y_1, \dots, y_n)^\top \in \mathbb{R}^n$ be a vector of centered real-valued outcomes, where $y_i$ is the phenotype value for individual $i$ in some population.
Assume that
$$y = g + e \qquad (1)$$
can be decomposed into an additive genetic effect $g \sim \mathrm{MV}(0, \sigma_g^2 K)$ and an uncorrelated noise vector $e \sim \mathrm{MV}(0, \sigma_e^2 I)$.
$K = (K_{ij})$ is the genetic relationship matrix (GRM), and $K_{ij}$ measures genetic similarity between individuals $i$ and $j$; the GRM is standardized so that $K_{ii} = 1$.
The noise vector $e$ may contain environmental noise, measurement error, and other non-additive genetic effects.
Frequently, the data are transformed before reaching the representation (1), e.g. by projecting out covariates or other fixed effects.
The heritability coefficient is
$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.$$
Since $E(yy^\top) = \sigma_g^2 K + \sigma_e^2 I$, the heritability coefficient is the $R^2$ for regressing $y_i y_j$ on $K_{ij}$, i.e. regressing phenotype similarity on genetic similarity.
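The moment identity $E(yy^\top) = \sigma_g^2 K + \sigma_e^2 I$ suggests a simple least-squares fit of the entries of $yy^\top$ on the entries of $K$ and $I$. The following is a minimal sketch of that idea, not necessarily the estimator used in the talk; the function name `moment_variance_components`, the toy sizes, and the Gaussian toy data are assumptions made for illustration.

```python
import numpy as np

def moment_variance_components(y, K):
    """Least-squares fit of y y^T ~ s_g * K + s_e * I over all n*n entries."""
    n = len(y)
    # Normal equations for the two regressors vec(K) and vec(I).
    A = np.array([[np.sum(K * K), np.trace(K)],
                  [np.trace(K), float(n)]])
    rhs = np.array([y @ K @ y, y @ y])
    s_g, s_e = np.linalg.solve(A, rhs)
    return s_g, s_e, s_g / (s_g + s_e)

# Toy data from model (1) with a linear-kernel K; all sizes are illustrative.
rng = np.random.default_rng(0)
n, m = 300, 1000
X = rng.standard_normal((n, m))
K = X @ X.T / m
g = X @ rng.standard_normal(m) * np.sqrt(0.5 / m)   # Cov(g | X) = 0.5 * K
y = g + rng.standard_normal(n) * np.sqrt(0.5)       # sigma_e^2 = 0.5
s_g, s_e, h2_hat = moment_variance_components(y, K)
print(f"sigma_g^2 ~ {s_g:.3f}, sigma_e^2 ~ {s_e:.3f}, h^2 ~ {h2_hat:.3f}")
```

The fit solves a 2 x 2 linear system, so it needs no iterative optimization; the estimates are not constrained to be non-negative.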
8 Genetic relatedness and heritability
To estimate $h^2$, Henderson (1950) used least squares for regressing $y_i y_j$ on $K_{ij}$.
Maximum likelihood is also widely used. Let
$$\ell(s_g^2, s_e^2) = -\frac{1}{2} \log\det(s_g^2 K + s_e^2 I) - \frac{1}{2}\, y^\top (s_g^2 K + s_e^2 I)^{-1} y$$
be the log-likelihood (up to an additive constant) for $(\sigma_g^2, \sigma_e^2)$ under the assumption that $g, e$ are Gaussian; maximize $\ell$ to get the MLE for $(\sigma_g^2, \sigma_e^2)$ and $h^2$.
Henderson's estimator and the MLE are both reasonable estimators for a given GRM $K$. How should we choose $K$?
Pre-GWAS: Pedigree/familial information determines $K$ and describes how individuals are related.
Post-GWAS: Molecular genetic data gives a fine-grained measure of genetic similarity.
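A minimal sketch of the maximum likelihood step for a given GRM, assuming Gaussian $g$ and $e$. It writes $s_g^2 K + s_e^2 I = s^2\,(h^2 K + (1 - h^2) I)$, profiles out the total variance $s^2$, and searches a grid over $h^2$. The function names, the grid, and the parametrization are illustrative choices, not taken from the talk.

```python
import numpy as np

def loglik(s_g, s_e, y, K):
    """Gaussian log-likelihood for (s_g, s_e), additive constant dropped."""
    V = s_g * K + s_e * np.eye(len(y))
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * logdet - 0.5 * y @ np.linalg.solve(V, y)

def mle_h2(y, K, grid=None):
    """Grid-search MLE of h^2 with the total variance profiled out."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    n = len(y)
    lam, U = np.linalg.eigh(K)          # K = U diag(lam) U^T
    yt2 = (U.T @ y) ** 2                # squared rotated outcomes
    best = (-np.inf, None, None)
    for h2 in grid:
        d = h2 * lam + (1.0 - h2)       # eigenvalues of h2*K + (1-h2)*I
        total = np.mean(yt2 / d)        # profiled total variance s^2
        ll = -0.5 * (np.sum(np.log(d)) + n * np.log(total)) - 0.5 * n
        if ll > best[0]:
            best = (ll, h2, total)
    ll, h2, total = best
    return h2, h2 * total, (1.0 - h2) * total, ll
```

A single eigendecomposition of $K$ makes each grid evaluation cost $O(n)$; a finer grid or a one-dimensional optimizer could replace the grid search.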
9 Genetic relatedness and LMMs for GWAS data
For GWAS data, each individual's genetic data can be encoded in a vector $x_i = (x_{i1}, \dots, x_{im})^\top \in \mathbb{R}^m$, where $x_{ij}$ represents the (frequently standardized) minor allele count for the $j$-th single nucleotide polymorphism (SNP).
$x_{ij}$ is ternary. $m$ can be in the range 100K to 1M; $n$ may be in the range 1K to 10K.
The GRM is determined by a kernel function $K$ with $K_{ij} = K(x_i, x_j)$.
The linear kernel is the most widely used in practice:
$$K(x_i, x_j) = \frac{1}{m}\, x_i^\top x_j.$$
With the linear kernel, the original model $y = g + e$ can be rewritten as a linear random-effects model (LMM)
$$y = Xb + e,$$
where $X = (x_1, \dots, x_n)^\top$, $b = (b_1, \dots, b_m)^\top \in \mathbb{R}^m$, and $b_1, \dots, b_m \sim \mathrm{MV}(0, \sigma_g^2/m)$ are iid.
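As a rough illustration of the linear kernel, the sketch below builds a GRM from a matrix of minor allele counts by standardizing each SNP column and forming $XX^\top/m$. The function name `linear_grm`, the simulated genotypes, and the choice to drop monomorphic SNPs are assumptions for this example. With this column standardization the diagonal of $K$ averages exactly 1, while individual $K_{ii}$ are only approximately 1.

```python
import numpy as np

def linear_grm(G):
    """Linear-kernel GRM from an n x m matrix of minor allele counts (0, 1, 2)."""
    G = np.asarray(G, dtype=float)
    sd = G.std(axis=0)                  # per-SNP standard deviation (ddof=0)
    keep = sd > 0                       # drop monomorphic SNPs
    X = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    return X @ X.T / X.shape[1]

# Simulated, independent SNPs with hypothetical minor allele frequencies.
rng = np.random.default_rng(0)
n, m = 50, 2000
maf = rng.uniform(0.05, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m))
K = linear_grm(G)
```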
10 LMMs for heritability estimation: Questions
$$y = Xb + e \qquad (2)$$
Variance components methods for estimating $h^2$ under (2) have emerged as one of the most popular strategies for heritability estimation in GWAS. However, a number of challenges have emerged.
Causal SNPs and linkage disequilibrium. In most generative models for linking GWAS data and outcomes, there is a fixed collection of causal SNPs $C \subseteq [m]$, and $b$ is assumed to be a sparse vector supported on $C$. If the SNPs in $x_i$ are highly correlated, i.e. they are in linkage disequilibrium (LD), the LMM approach can give badly biased estimates for $h^2$ (Speed et al., 2012, Am. J. Hum. Genet.).
Partitioning heritability. Sometimes it's desirable to estimate the heritability attributable to a subset of SNPs, $S \subseteq [m]$. To date, there's no consensus about how this should be done, and existing solutions have significant drawbacks.
11 Causal SNPs and linkage disequilibrium
Simulations show standard LMM estimators for $h^2$ are biased when causal SNPs are located in regions with low (or high) LD.
Settings:
$n = 500$, $m =$ [value missing], $\sigma_e^2 = 0.5$.
$b = \frac{1}{\sqrt{m}} (z_1, \dots, z_{m/2}, 0, \dots, 0)^\top \in \mathbb{R}^m$ with $z_1, \dots, z_{m/2} \sim N(0, 1)$ iid. So $C = \{1, \dots, m/2\}$.
$x_i \sim N(0, \Sigma)$ with
$$\Sigma = \begin{pmatrix} \mathrm{AR}(0.2) & 0 \\ 0 & \mathrm{AR}(0.8) \end{pmatrix}.$$
Then $\sigma_g^2 = 0.5$ and $h^2 = 0.5$.
Table: Summary of results based on simulating 50 independent datasets.
True $h^2$: 0.5
$\hat{h}^2$: mean [value missing]; 95% CI (0.404, 0.460)
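The simulation design can be sketched as below. Two things are assumptions rather than facts from the slide: the value of $m$ does not survive in the transcript, so `m=1000` is a placeholder, and AR($\rho$) is read as the covariance with entries $\rho^{|j-k|}$. The function names are hypothetical.

```python
import numpy as np

def ar_cov(p, rho):
    """AR(1)-type covariance: entry (j, k) equals rho ** |j - k| (assumed reading)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n=500, m=1000, sigma2_e=0.5, seed=0):
    """One dataset from the design; m=1000 is a placeholder value."""
    rng = np.random.default_rng(seed)
    half = m // 2
    # Block-diagonal Sigma: low-LD block (rho = 0.2), then high-LD block (rho = 0.8).
    L_low = np.linalg.cholesky(ar_cov(half, 0.2))
    L_high = np.linalg.cholesky(ar_cov(m - half, 0.8))
    X = np.hstack([rng.standard_normal((n, half)) @ L_low.T,
                   rng.standard_normal((n, m - half)) @ L_high.T])
    # Causal SNPs are the first m/2 coordinates, all in the low-LD block.
    b = np.zeros(m)
    b[:half] = rng.standard_normal(half) / np.sqrt(m)
    y = X @ b + rng.standard_normal(n) * np.sqrt(sigma2_e)
    return X, y, b

X, y, b = simulate()
```

Feeding `X` and `y` into a linear-kernel variance components estimator over repeated seeds would be one way to explore the bias reported in the table.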
12 Partitioning heritability
One method for partitioning heritability is to assume a LMM with multiple variance components (Finucane et al., 2015, Nat. Genet.):
$$y = X_S b_S + X_{S^c} b_{S^c} + e, \qquad (3)$$
where
$$b_i \sim \begin{cases} \mathrm{MV}\!\left(0, \dfrac{\sigma_S^2}{|S|}\right) & \text{if } i \in S, \\[2ex] \mathrm{MV}\!\left(0, \dfrac{\sigma_{S^c}^2}{m - |S|}\right) & \text{if } i \notin S \end{cases}$$
are all independent.
The $S$-partitioned heritability is
$$h_S^2 = \frac{\sigma_S^2}{\sigma_S^2 + \sigma_{S^c}^2 + \sigma_e^2}.$$
$h_S^2$ can be estimated, for instance, using maximum likelihood for the model (3) under a Gaussian random-effects assumption.
However, estimating the total heritability $h^2 = (\sigma_S^2 + \sigma_{S^c}^2)/(\sigma_S^2 + \sigma_{S^c}^2 + \sigma_e^2)$ under this model has the same issues noted in the previous slide/simulation.
14 LMMs for heritability estimation
Challenges arise when the location of causal SNPs is correlated with LD structure.
Solution: Remove LD structure, i.e. whiten.
15 LMMs for heritability estimation: Mahalanobis kernel
Our proposal: Use the Mahalanobis kernel to measure genetic similarity,
$$K_{ij} = x_i^\top \Sigma^{-1} x_j,$$
where $\mathrm{Cov}(x_i) = \Sigma$.
This resolves the causal SNPs/linkage disequilibrium and partitioning heritability problems, and clears a path for more modeling progress and results.
Argument: the kernel $K_{ij} = x_i^\top \Sigma^{-1} x_j$ corresponds to the model $y = Xb + e$ with $b \sim \mathrm{MV}(0, \sigma_g^2 \Sigma^{-1}/m)$, which in turn corresponds to $y = X_\Sigma b_\Sigma + e$ (fixed-effects model), where $X_\Sigma = X\Sigma^{-1/2}$ and $b_\Sigma = \Sigma^{1/2} b$.
The Mahalanobis kernel has been used extensively elsewhere in genetics (e.g. for association testing), but not, to our knowledge, for heritability estimation.
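A small sketch of the whitening step, assuming $\Sigma$ is known and positive definite: it forms the symmetric inverse square root $\Sigma^{-1/2}$ from an eigendecomposition, computes $X_\Sigma = X\Sigma^{-1/2}$, and recovers the kernel entries $x_i^\top \Sigma^{-1} x_j$ as $X_\Sigma X_\Sigma^\top$. The helper names and the toy $\Sigma$ are illustrative; a $1/m$ scaling, as used for the linear kernel, could be applied afterwards if desired.

```python
import numpy as np

def inv_sqrt(Sigma):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(Sigma)
    return (V / np.sqrt(w)) @ V.T       # V diag(w^{-1/2}) V^T

def whiten(X, Sigma):
    """Return X_Sigma = X Sigma^{-1/2}."""
    return X @ inv_sqrt(Sigma)

# Small illustration with an arbitrary positive-definite Sigma.
rng = np.random.default_rng(0)
m, n = 6, 4
B = rng.standard_normal((m, m))
Sigma = B @ B.T + m * np.eye(m)
X = rng.standard_normal((n, m))
X_w = whiten(X, Sigma)
K_mahal = X_w @ X_w.T                   # entries x_i^T Sigma^{-1} x_j
```

In practice $\Sigma$ would have to be estimated or taken from a reference panel, which this sketch does not address.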
16 Revisiting Causal SNPs/LD and partitioning heritability
To find the Mahalanobis-MLE $\hat{h}_\Sigma^2$, replace $X$ by $X\Sigma^{-1/2}$ and then find the MLE for $h^2$ using the linear kernel, i.e. whiten/decorrelate/standardize the predictors and compute the usual MLE.
True $h^2$: 0.5
$\hat{h}^2$: mean [value missing]; 95% CI (0.404, 0.460)
$\hat{h}_\Sigma^2$: mean [value missing]; 95% CI (0.473, 0.525)
Let
$$\Sigma = \begin{pmatrix} \Sigma_{S,S} & \Sigma_{S,S^c} \\ \Sigma_{S^c,S} & \Sigma_{S^c,S^c} \end{pmatrix}.$$
The partitioned heritability is defined via the decomposition $y = g_S + g_{S^c} + e$, where
$$g_S = X_S \left(b_S + \Sigma_{S,S}^{-1} \Sigma_{S,S^c}\, b_{S^c}\right), \qquad g_{S^c} = \left(X_{S^c} - X_S \Sigma_{S,S}^{-1} \Sigma_{S,S^c}\right) b_{S^c}.$$
Under the LMM $y = Xb + e$ with $b \sim \mathrm{MV}(0, \sigma_g^2 \Sigma^{-1}/m)$, $g_S$ and $g_{S^c}$ are uncorrelated.
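One way to check that $g_S$ and $g_{S^c}$ are uncorrelated is the short derivation below, which conditions on $X$ and uses the standard block-inverse formula for $\Sigma^{-1}$. The symbols $A$ and $M$ are shorthand introduced here and do not appear in the slides.

```latex
% A and M are shorthand introduced for this sketch only.
\begin{align*}
A &= \Sigma_{S,S}^{-1}\Sigma_{S,S^c}, \qquad
M = \Sigma_{S^c,S^c} - \Sigma_{S^c,S}\,\Sigma_{S,S}^{-1}\,\Sigma_{S,S^c}, \\
(\Sigma^{-1})_{S,S^c} &= -A M^{-1}, \qquad (\Sigma^{-1})_{S^c,S^c} = M^{-1}, \\
\operatorname{Cov}(b_S + A\, b_{S^c},\; b_{S^c})
  &= \frac{\sigma_g^2}{m}\left[(\Sigma^{-1})_{S,S^c} + A\,(\Sigma^{-1})_{S^c,S^c}\right]
   = \frac{\sigma_g^2}{m}\left[-A M^{-1} + A M^{-1}\right] = 0, \\
\operatorname{Cov}(g_S,\, g_{S^c} \mid X)
  &= X_S \operatorname{Cov}(b_S + A\, b_{S^c},\; b_{S^c})\,(X_{S^c} - X_S A)^\top = 0 .
\end{align*}
```

The sum $g_S + g_{S^c} = X_S b_S + X_{S^c} b_{S^c} = Xb$ is unchanged, so the decomposition only reallocates the part of $X_{S^c} b_{S^c}$ that is linearly predictable from $X_S$.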
17 Why does this work?
Model misspecification results for variance components estimation problems (D & Erdogdu, 2016, 2017).
Meta-result. Assume:
(i) $y = Xb + e \in \mathbb{R}^n$.
(ii) $\mathrm{Cov}(e) = \sigma_e^2 I$.
(iii) $b \in \mathbb{R}^m$ is any (fixed or random) vector with $\|b\|^2$ of order 1.
If $x_i \sim N(0, \Sigma)$ are iid, then the variance component estimators for the Mahalanobis kernel $(\hat\sigma_g^2, \hat\sigma_e^2)$ are nice:
(i) $(\hat\sigma_g^2, \hat\sigma_e^2) \to (\sigma_g^2, \sigma_e^2)$, where $\sigma_g^2 = \lim \|b\|^2$.
(ii) $(\hat\sigma_g^2, \hat\sigma_e^2)$ are approximately normal.
Implication: If genotypes are random, then the Mahalanobis kernel for estimating heritability works even with fixed genetic effects and, in particular, when the location of causal SNPs is associated with LD structure.
18 Why does this work?
Proof idea: In the fixed-effects model with random genotypes, $b$ inherits randomness from $X$.
Let $\hat\theta = (\hat\sigma_g^2, \hat\sigma_e^2)$ be the MLE and assume WLOG that $x_i \sim N(0, I)$.
Key point: Since $X \overset{D}{=} XU$ for any $m \times m$ orthogonal matrix $U$,
$$\hat\theta(y, X) \overset{D}{=} \hat\theta(\tilde{y}, X), \quad \text{where } \tilde{y} = X\tilde{b} + e, \quad \tilde{b} \sim \mathrm{uniform}\{S^{m-1}(\sigma_g^2)\}.$$
In other words, $\hat\theta = \hat\theta(y, X)$ has the same distribution as $\hat\theta(\tilde{y}, X)$, where the data are drawn from a random-effects model.
Since $\tilde{b}$ is approximately Gaussian for large $m$, we can use tools for variance components estimation in Gaussian random-effects models to approximate the distribution of $\hat\theta$.
Finite sample normal approximation results for quadratic forms.
Other comments: We can get closed form expressions for the asymptotic variance of $\hat\theta$ using results on the Marčenko-Pastur distribution.
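The step "uniform on the sphere is approximately Gaussian for large $m$" can be illustrated numerically. The sketch below assumes that $S^{m-1}(\sigma_g^2)$ denotes the sphere in $\mathbb{R}^m$ with squared radius $\sigma_g^2$, and samples from it by normalizing a Gaussian vector; the function name and the values of `m` and `sigma2_g` are illustrative.

```python
import numpy as np

def sample_on_sphere(m, sigma2_g, rng):
    """Draw b uniformly from the sphere in R^m with squared radius sigma2_g."""
    z = rng.standard_normal(m)
    return np.sqrt(sigma2_g) * z / np.linalg.norm(z)

rng = np.random.default_rng(0)
m, sigma2_g = 5000, 0.5
b_tilde = sample_on_sphere(m, sigma2_g, rng)
# Rescaled coordinates sqrt(m) * b_j should look roughly N(0, sigma2_g) for large m.
scaled = np.sqrt(m) * b_tilde
print(scaled.mean(), scaled.var())
```

Normalizing a standard Gaussian vector gives a uniform direction because the Gaussian distribution is invariant under orthogonal transformations, the same invariance the proof idea relies on for $X$.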
19 Open questions
Results for non-Gaussian $x_i$. SNP genotype data is always non-Gaussian, but simulations suggest results may still hold.
Binary outcomes. Erdogdu, Bayati & D (2016).
Applications to association testing problems.
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationChapter 2: simple regression model
Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.
More informationMultiple random effects. Often there are several vectors of random effects. Covariance structure
Models with multiple random effects: Repeated Measures and Maternal effects Bruce Walsh lecture notes SISG -Mixed Model Course version 8 June 01 Multiple random effects y = X! + Za + Wu + e y is a n x
More informationBinomial Mixture Model-based Association Tests under Genetic Heterogeneity
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,
More informationA Statistical Analysis of Fukunaga Koontz Transform
1 A Statistical Analysis of Fukunaga Koontz Transform Xiaoming Huo Dr. Xiaoming Huo is an assistant professor at the School of Industrial and System Engineering of the Georgia Institute of Technology,
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationRowan University Department of Electrical and Computer Engineering
Rowan University Department of Electrical and Computer Engineering Estimation and Detection Theory Fall 2013 to Practice Exam II This is a closed book exam. There are 8 problems in the exam. The problems
More informationResearch Statement on Statistics Jun Zhang
Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationPrediction of the Confidence Interval of Quantitative Trait Loci Location
Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28
More informationDecision Tree Learning Lecture 2
Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationBusiness Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where
More information