Overview. Background

Similar documents
Locating multiple interacting quantitative trait. loci using rank-based model selection

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Multiple QTL mapping

Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses

QTL Mapping I: Overview and using Inbred Lines

Gene mapping in model organisms

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Statistical issues in QTL mapping in mice

Lecture 11: Multiple trait models for QTL analysis

Introduction to QTL mapping in model organisms

QTL model selection: key players

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Introduction to QTL mapping in model organisms

Mapping QTL to a phylogenetic tree

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Evolution of phenotypic traits

Model Selection for Multiple QTL

Lecture 9. QTL Mapping 2: Outbred Populations

(Genome-wide) association analysis

Use of hidden Markov models for QTL mapping

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Partitioning Genetic Variance

QTL model selection: key players

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

Causal Model Selection Hypothesis Tests in Systems Genetics

Quantitative characters II: heritability

Hierarchical Generalized Linear Models for Multiple QTL Mapping

I Have the Power in QTL linkage: single and multilocus analysis

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning for OR & FE

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Inferring Genetic Architecture of Complex Biological Processes

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Chapter 1 Statistical Inference

Methods for QTL analysis

Association studies and regression

what is Bayes theorem? posterior = likelihood * prior / C

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Lecture WS Evolutionary Genetics Part I 1

Regression, Ridge Regression, Lasso

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

On the mapping of quantitative trait loci at marker and non-marker locations

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Evolutionary Genetics Midterm 2008

The Quantitative TDT

Statistics 262: Intermediate Biostatistics Model selection

A Frequentist Assessment of Bayesian Inclusion Probabilities

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Lesson 4: Understanding Genetics

Case-Control Association Testing. Case-Control Association Testing

MIXED MODELS THE GENERAL MIXED MODEL

Quantitative characters - exercises

Lecture 8 Genomic Selection

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Binary trait mapping in experimental crosses with selective genotyping

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Introduction to Statistical modeling: handout for Math 489/583

ISyE 691 Data mining and analytics

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

CSS 350 Midterm #2, 4/2/01

Quantitative Genetics. February 16, 2010

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Linear Regression In God we trust, all others bring data. William Edwards Deming

Lecture 6. QTL Mapping

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Quantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

Heredity and Genetics WKSH

Variance Components: Phenotypic, Environmental and Genetic

EXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

MOST complex traits are influenced by interacting

Managing segregating populations

Variation and its response to selection

Lecture 3. Introduction on Quantitative Genetics: I. Fisher s Variance Decomposition

Model comparison and selection

Causal Graphical Models in Systems Genetics

A partial regression coefficient analysis framework to infer candidate genes potentially causal to traits in recombinant inbred lines

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

Distinctive aspects of non-parametric fitting

Computational Systems Biology: Biology X

Lecture 6: Linear Regression (continued)

High-dimensional regression modeling

How the mean changes depends on the other variable. Plots can show what s happening...

Linear Regression (1/1/17)

Partitioning the Genetic Variance

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

Data Mining Stat 588

Transcription:

Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems University of Vienna Analysis of QTL data modified BIC Robust methods Implementation and Simulations in R Robust Methods for QTL Mapping in R Andreas Baierl 1 Locating quantitative trait loci (QTL) Robust Methods for QTL Mapping in R Andreas Baierl 2 Background Quantitative trait: evolution occurred in small steps characters, that are influenced by many genes Many relevant traits are quantitative: height, yield,... A gene can obtains different forms (alleles) contribution of genetic effects to total (phenotypic) variation of a trait (heritability) determines rate at which characters respond to selection. (environmental variance reduces efficiency of response) Quantitative trait locus (QTL): gene (functional sequence of bases) that influences a certain quantitative trait Relevant questions: - How many genes influence a trait (How many QTL) - Find exact positions of QTL (- estimate size of genetic effects) Robust Methods for QTL Mapping in R Andreas Baierl 3 trait value = genetic influence + environmental influence partitioning genotypic variance into components with different impact on selection: additive, non-additive gene effects (epistasis) -> dependency on background population evolutionary reason: stabilization of phenotype phenotype: the form taken by some character in a specific individual. genotype: genetic makeup of individual Robust Methods for QTL Mapping in R Andreas Baierl 4

Data from experimental crosses Data matrix for backcross design A A a a BACKCROSS F0 Indiv. QT marker.1 marker.2... marker.m 1 34.3 AA Aa... AA ~ 50-500 markers 2 65.4 Aa AA... * F1 3 23.2 Aa *... Aa 4 45.4 AA AA... Aa INTERCROSS F2.................. ~ 200 1000 individuals Robust Methods for QTL Mapping in R Andreas Baierl 5 Genetic map Robust Methods for QTL Mapping in R Andreas Baierl 6 Analysis of QTL data 0 Genetic map Distance between markers is usually estimated from Find NUMBER, POSITIONS, EFFECT TYPES and SIZES of QTL 20 recombination frequency Challenges: large number of possible models If marker is close to QTL, then (main effects + interactions = m + m(m-1)/2 ~ 100 + 5.000) Location (cm) 40 60 marker genotype will be associated with QTL genotype (There would be a 1-1 -> efficient search strategy -> correct for test multiplicity deviation from normality of conditional distribution of trait given marker correspondence, if there were genotypes (especially when heavy tails or outliers) 80 no recombinations) recover unobserved / wrong / missing genotype information 100 No linkage between confounding of effect types 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 chromosomes selection bias for effect sizes, especially for small effects Chromosome Robust Methods for QTL Mapping in R Andreas Baierl 7 Robust Methods for QTL Mapping in R Andreas Baierl 8

Methods for QTL mapping Multiple regression approach marker based ANOVA on single markers multiple regression X ij : genotype of the i th individual (out of n) at the j th marker (out of m). univariate multiple X ij = ½ if individual has genotype AA (homozygous) X ij = -½ if individual has genotype Aa (heterozygous) - interval mapping - composite interval - conditional interval mapping - multiple interval mapping strict Bayesian approach I: subset of the set N = {1,...,m} marker U: subset of N x N mapping - Bayesian (Sen & Churchill) ε i : random error term with distribution f estimation of QTL location Robust Methods for QTL Mapping in R Andreas Baierl 9 Model selection Robust Methods for QTL Mapping in R Andreas Baierl 10 Behaviour of BIC depending on n & # of predictors aim: identify correct model, not minimise prediction error -> criterion for inclusion and exclusion of variables cross validation / bootstrap AIC: n log (RSS) + 2k/n minimises prediction error BIC: n log (RSS) + k log(n) more conservative than AIC, especially for small n n: sample size k =p + q= number of main effects (p) and interaction effects (q) under consideration RSS: residual sum of squares (assuming normal error distribution!) -> efficient search strategy forward selection + backward elimination step type 1 error und M0 0.00 0.05 0.10 0.15 0.20 0.25 0.30 number of predictors 0 2000 4000 6000 8000 10000 n 1 5 10 20 30 100 BIC Mi : BIC of 1-dimensional Model M i N: Number of 1-dim models n: sample size BIC chooses too many QTL every model has the same probability to be selected -> more likely to select large model. Robust Methods for QTL Mapping in R Andreas Baierl 11 Robust Methods for QTL Mapping in R Andreas Baierl 12

modified BIC Comparison of mbic and BIC Additional penalty term dependent on number of predictors under consideration (Bogdan et al 2004) modified BIC with E(p): expected number of main effects E(q): expected number of epistasis (=interaction) effects E(p) = E(q) = 2.2 controls the Type I error at a level of of 5% (for n = 200) type 1 error under M0 0.00 0.05 0.10 0.15 0.20 5 predictors (+10 two-way interaction terms) BIC mbic 0 1000 2000 3000 4000 5000 Robust Methods for QTL Mapping in R Andreas Baierl 13 Deviations from Normality Typically, non-parametric methods based on ranks are used Here we use robust regression techniques, in particular M-Estimators: minimise other measure of distance instead of residual sum of squares. popular alternatives are: rho.huber, k=0.05 rho.huber, k=1.3 n Robust Methods for QTL Mapping in R Andreas Baierl 14 Robust model selection criterion still consistent under quite general conditions on the error distribution (Martin, 1980) but performance of BIC * ρ depends on ρ and error distribution: Jurečkova and Sen (1996) derived limiting distribution for rho.bisquare rho.hampel Robust Methods for QTL Mapping in R Andreas Baierl 15 Robust Methods for QTL Mapping in R Andreas Baierl 16

Limiting Distribution Values for normalisation constant c e We showed that has the following property: with and error distribution f(x) for L 2 c e = 1 Robust Methods for QTL Mapping in R Andreas Baierl 17 Robust mbic Robust Methods for QTL Mapping in R Andreas Baierl 18 Simulation Setup In practice, c e and therefore the error distribution f(x) have to be estimated. This leads to a robust version of the mbic: 2 chromosomes with 11 marker each (m=22) 200 individuals (n=200) 1 additive effect 1 epistasis effect error distributions: with Normal, Laplace, Cauchy, Tukey, χ 2 estimators: L 2, Huber (k=0.05) ~ L 1, Huber (k=1.3), Bisquare, Hampel Robust Methods for QTL Mapping in R Andreas Baierl 19 Robust Methods for QTL Mapping in R Andreas Baierl 20

Simulation Results Implementation in R Percentage correctly identified effects and false discovery rate Robust regression using procedure rlm of package MASS Percentage 0.0 0.2 0.4 0.6 Huber-0.05 Huber-1.3 Bisquare Hampel L2-mBIC L2-BIC program structure: parameter specification generate realisation of genetic setup estimation of error distribution and c e in each forward step: estimate likelihood for m + m(m-1)/2 models generate output simulations: 1000 replications n=200-500, m=20-120 Normal Laplace Cauchy Tukey Chisq Chisq.med Robust Methods for QTL Mapping in R Andreas Baierl 21 Robust Methods for QTL Mapping in R Andreas Baierl 22 References Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On Locating multiple interacting quantitative trait loci in intercross designs. To appear in Genetics. Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004. Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci. Genetics, 167: 989-999. Broman, K. W. and T. P. Speed, 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J Roy Stat Soc B, 64: 641-656. Jureckova, J., Sen, P.K., 1996. Robust statistical procedures: asymptoticsand interrelations. Wiley, New York. Sen and Churchill (2001), A Statistical framework for quantitative trait mapping, Genetics, 159:371-387. Robust Methods for QTL Mapping in R Andreas Baierl 23