Lecture 8 Genomic Selection

Guilherme J. M. Rosa
University of Wisconsin-Madison
Mixed Models in Quantitative Genetics
SISG, Seattle, 18-20 September 2018

OUTLINE
- Marker Assisted Selection
- Genomic Selection
- Models & Techniques

Marker Assisted Selection

MAS: use of genetic markers to improve the efficiency of genetic selection.

Basic idea behind MAS:
- Most traits of economic importance are controlled by a fairly large number of genes.
- Some of these genes, however, have larger effects.
- Following the pattern of inheritance of such genes might assist in selection.

MAS could help improve:
- Low-heritability traits
- Phenotypes that can be measured on one sex only
- Characteristics that are not measurable before sexual maturity
- Traits that are difficult to measure or require sacrifice

Efficiency of MAS depends on:
- Size (effect) of the QTL
- Frequency of the favorable allele
- Recombination rate between marker(s) and QTL

Modeling Effects at the QTL Genotype

$$y = X\beta + Wq + Za + \varepsilon$$

where $y$ holds the phenotypes, $\beta$ the fixed (environmental) effects, $q$ the QTL effects, $a \sim N(0, A\sigma^2_a)$ the polygenic effects, and $\varepsilon \sim N(0, I\sigma^2_\varepsilon)$ the residuals.

QTL genotype as a fixed effect:
- Regression of phenotypes on QTL genotype probabilities from segregation analysis (Kinghorn et al. 1993; Meuwissen and Goddard 1997)

QTL genotype as a random effect:
- The QTL effect is modeled as the sum of the two gametic effects (Fernando and Grossman 1989):

$$y = X\beta + Wv + Za + \varepsilon, \qquad \mathrm{Var}\begin{pmatrix} v \\ a \\ \varepsilon \end{pmatrix} = \begin{pmatrix} G_v\sigma^2_v & 0 & 0 \\ 0 & A\sigma^2_a & 0 \\ 0 & 0 & I\sigma^2_\varepsilon \end{pmatrix}$$

where $G_v$ is the gametic relationship matrix.
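For the random-QTL model above, a minimal sketch of BLUE/BLUP can be built through the phenotypic covariance $V = WG_vW'\sigma^2_v + ZAZ'\sigma^2_a + I\sigma^2_\varepsilon$. Everything in the toy example is hypothetical: the function name `blup_qtl_polygenic`, the four-animal pedigree, the variance components, and the identity matrix used as a placeholder for $G_v$. A real gametic relationship matrix would be derived from marker and pedigree information.

```python
import numpy as np

def blup_qtl_polygenic(y, X, W, Z, Gv, A, s2v, s2a, s2e):
    """BLUE of beta and BLUP of gametic QTL effects v and polygenic effects a
    for y = X beta + W v + Z a + e, using the V-matrix (GLS) formulation."""
    n = len(y)
    V = s2v * W @ Gv @ W.T + s2a * Z @ A @ Z.T + s2e * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)  # GLS estimate of fixed effects
    Vinv_r = np.linalg.solve(V, y - X @ beta)
    v = s2v * Gv @ W.T @ Vinv_r   # BLUP of gametic QTL effects
    a = s2a * A @ Z.T @ Vinv_r    # BLUP of polygenic effects
    return beta, v, a

# Hypothetical toy data: animals 1 and 2 are unrelated parents, 3 and 4 their full sibs
y = np.array([10.2, 9.1, 11.0, 10.5])
X = np.ones((4, 1))                       # overall mean only
Z = np.eye(4)                             # one record per animal
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
W = np.kron(np.eye(4), np.ones((1, 2)))   # each animal carries two QTL gametes
Gv = np.eye(8)                            # placeholder gametic relationship matrix
s2v, s2a, s2e = 0.3, 0.5, 1.0             # hypothetical variance components
beta, v, a = blup_qtl_polygenic(y, X, W, Z, Gv, A, s2v, s2a, s2e)
print(beta, v, a)
```

The same solutions follow from Henderson's mixed-model equations. The V-form is used here only because it maps directly onto the variance structure written above.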

Genomic Selection (Genome-wide Marker Assisted Selection)

As most quantitative traits are influenced by many genes, tracking a small number of them using molecular markers will explain only a small fraction of the total genetic variance. GWMAS, on the other hand, makes use of a very dense set of markers covering the entire genome, which potentially explain all genetic variance.

Genomic Selection:
1. Reference Population: animals with genotypic and phenotypic information
2. Data Analysis:
   - QC and data processing
   - Prediction model: $y_i = \mu + \sum_j w_{ij} b_j + e_i$
3. Genomic Selection: prediction of genetic merit of young animals (selection candidates) using marker information, $\mathrm{gEBV}_k = \sum_j w_{kj}\hat{b}_j$
4. Selected Animals: superior animals (higher gEBV), selected earlier and with higher accuracy
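Steps 3 and 4 reduce to a matrix-vector product followed by ranking. The sketch below assumes that the marker effects $\hat{b}_j$ have already been estimated in the reference population (step 2). The genotypes, effects, and the helper names `gebv` and `select_top` are hypothetical.

```python
import numpy as np

def gebv(W_candidates, b_hat):
    """gEBV_k = sum_j w_kj * b_hat_j for every selection candidate k."""
    return W_candidates @ b_hat

def select_top(gebvs, n_selected):
    """Indices of the n_selected candidates with the highest gEBV."""
    order = np.argsort(gebvs)[::-1]   # descending order of gEBV
    return order[:n_selected]

# Hypothetical example: 4 young animals genotyped at 3 markers (coded 0/1/2)
W_young = np.array([[0, 1, 2],
                    [2, 2, 0],
                    [1, 0, 1],
                    [2, 1, 2]], dtype=float)
b_hat = np.array([0.5, -0.2, 0.3])   # marker effects estimated in the reference population
g = gebv(W_young, b_hat)
print(g)
print(select_top(g, 2))
```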

Genomic Selection (Meuwissen et al., 2001)

$$y_i = \mu + x_{i1}g_1 + x_{i2}g_2 + \cdots + x_{ip}g_p + e_i$$

where $x_{ij}$ are the marker genotypes and $g_j$ the genetic (marker) effects.

Genomic EBV: $\mathrm{GEBV}_i = x_{i1}\hat{g}_1 + x_{i2}\hat{g}_2 + \cdots + x_{ip}\hat{g}_p = \sum_j x_{ij}\hat{g}_j$

- $p \gg n$: the "big p, small n" paradigm
- Dimension reduction techniques (e.g. SVD and PLS), and stepwise strategies
- Alternatively, ridge regression, random effects models, and hierarchical modeling

Least Squares

Two-step procedure:
1. Test each marker (chromosome segment) for the presence of a QTL and select those with significant effects.
2. Fit the selected markers simultaneously using multiple regression.
3. Predict breeding values using the fitted regression (similar to the LD-MAS approach with multiple markers).

Problems:
- Overestimation of marker effects due to the first (selection) step
- Does not capture all QTL
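A minimal sketch of the two-step least-squares procedure follows. It assumes a simple |t| > 3 cut-off in place of a formal significance test; this cut-off, the function name `two_step_ls`, and the simulated data are all hypothetical. The same data are used for screening and for estimation, which is why the retained effects tend to be overestimated.

```python
import numpy as np

def two_step_ls(X, y, t_threshold=3.0):
    """Two-step least squares: single-marker screening, then joint OLS fit.

    Step 1 keeps markers whose single-marker regression |t| exceeds
    t_threshold (a hypothetical cut-off standing in for a significance test).
    Step 2 fits the kept markers together by multiple regression.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # correlation of y with each marker, converted to the t statistic of the slope
    sx = np.sqrt((Xc ** 2).sum(axis=0))
    sy = np.sqrt(yc @ yc)
    r = (Xc.T @ yc) / (sx * sy)
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    selected = np.flatnonzero(np.abs(t) > t_threshold)
    # Step 2: joint least-squares fit of intercept + selected markers
    Xs = np.column_stack([np.ones(n), X[:, selected]])
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return selected, coef[0], coef[1:]

# Hypothetical simulated data: 300 animals, 1000 markers, 20 QTL
rng = np.random.default_rng(1)
n, p, n_qtl = 300, 1000, 20
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
g_true = np.zeros(p)
qtl = rng.choice(p, n_qtl, replace=False)
g_true[qtl] = rng.normal(0.0, 0.3, n_qtl)
y = 10.0 + X @ g_true + rng.normal(0.0, 1.0, n)

selected, mu_hat, g_hat = two_step_ls(X, y)
gebv_train = mu_hat + X[:, selected] @ g_hat   # GEBV from the selected markers only
print(len(selected), "markers selected;", np.isin(qtl, selected).sum(), "of", n_qtl, "QTL captured")
```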

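BLUP

$$y = 1\mu + \sum_j X_j g_j + e, \qquad g_j \sim N(0, \sigma^2_0)$$

$$\begin{pmatrix} \hat{\mu} \\ \hat{g} \end{pmatrix} = \begin{pmatrix} 1'1 & 1'X \\ X'1 & X'X + I\gamma \end{pmatrix}^{-1} \begin{pmatrix} 1'y \\ X'y \end{pmatrix}, \qquad \gamma = \sigma^2_e / \sigma^2_0$$

How to choose $\sigma^2_0$? Arbitrary, but $\sigma^2_0$ controls the amount of shrinkage.
Alternative: set $\sigma^2_0 = \sigma^2_u / p$, where $\sigma^2_u$ is an estimate (prior) of the total additive genetic variance.

Bayes A

$$y = 1\mu + \sum_j X_j g_j + e, \qquad y \mid \mu, g, \sigma^2_e \sim N\Big(1\mu + \sum_j X_j g_j,\ I\sigma^2_e\Big)$$

Prior distributions:
- $g_j \mid \sigma^2_j \sim N(0, \sigma^2_j)$
- $\sigma^2_j \sim \chi^{-2}(\nu, S)$ (scaled inverted chi-square distribution with scale parameter $S$ and $\nu$ degrees of freedom)
- $\sigma^2_e \sim \chi^{-2}(-2, 0)$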
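The BLUP equations above can be solved directly, even when there are more markers than animals, because $\gamma > 0$ makes the coefficient matrix nonsingular. This is a sketch under hypothetical values. The prior guess $\sigma^2_u = 1$, the residual variance, the simulated data, and the name `marker_blup` are illustrative. The per-marker variance follows the $\sigma^2_0 = \sigma^2_u / p$ rule.

```python
import numpy as np

def marker_blup(X, y, sigma2_e, sigma2_0):
    """Solve the BLUP (ridge) mixed-model equations for mu and marker effects g."""
    n, p = X.shape
    one = np.ones((n, 1))
    gamma = sigma2_e / sigma2_0                      # shrinkage ratio
    lhs = np.block([[one.T @ one, one.T @ X],
                    [X.T @ one,   X.T @ X + gamma * np.eye(p)]])
    rhs = np.concatenate([one.T @ y, X.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]

# Hypothetical data with more markers than animals (p >> n)
rng = np.random.default_rng(42)
n, p = 100, 500
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
y = 20.0 + X[:, :10] @ rng.normal(0.0, 0.5, 10) + rng.normal(0.0, 1.0, n)

sigma2_u = 1.0                 # hypothetical prior guess of total additive variance
sigma2_0 = sigma2_u / p        # per-marker variance
sigma2_e = 1.0                 # hypothetical residual variance
mu_hat, g_hat = marker_blup(X, y, sigma2_e, sigma2_0)
gebv = X @ g_hat               # genomic breeding values of the training animals
```

A larger $\sigma^2_0$ (smaller $\gamma$) shrinks the marker effects less.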

Bayes B

$$y = 1\mu + \sum_j X_j g_j + e, \qquad y \mid \mu, g, \sigma^2_e \sim N\Big(1\mu + \sum_j X_j g_j,\ I\sigma^2_e\Big)$$

Prior distributions:
- $g_j = 0$ with probability $\pi$
- $g_j \mid \sigma^2_j \sim N(0, \sigma^2_j)$, with $\sigma^2_j \sim \chi^{-2}(\nu, S)$, with probability $(1 - \pi)$
- $\sigma^2_e \sim \chi^{-2}(-2, 0)$

Simulation Study
- Genome: 1000 cM, with markers every 1 cM
- Markers surrounding each 1-cM region combined into haplotypes
- LD between markers and QTLs due to finite population size ($N_e = 100$)
- Training sample: single generation with 2,000 animals
- Test sample: prediction of the breeding values of their progeny based on marker genotypes
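The difference between the Bayes A and Bayes B priors shows up clearly when marker effects are drawn from them. The sketch below samples $\sigma^2_j$ from the scaled inverted chi-square as $\nu S / \chi^2_\nu$. It then sets a fraction $\pi$ of effects exactly to zero, which gives the same prior as setting $\sigma^2_j = 0$. The hyperparameters $\nu = 4$, $S = 0.01$, $\pi = 0.95$ and the helper names are hypothetical. This shows only the prior; the MCMC sampler used to fit Bayes B is not reproduced here.

```python
import numpy as np

def draw_scaled_inv_chi2(rng, nu, S, size):
    """sigma^2 ~ scaled-inv-chi^2(nu, S), drawn as nu * S / chi^2_nu."""
    return nu * S / rng.chisquare(nu, size)

def draw_bayesB_effects(rng, p, pi, nu, S):
    """Draw p marker effects from the Bayes B prior (reduces to Bayes A when pi = 0)."""
    sigma2 = draw_scaled_inv_chi2(rng, nu, S, p)
    g = rng.normal(0.0, np.sqrt(sigma2))   # g_j | sigma2_j ~ N(0, sigma2_j)
    is_zero = rng.random(p) < pi           # marker has no effect with probability pi
    g[is_zero] = 0.0
    return g

rng = np.random.default_rng(2001)
nu, S = 4.0, 0.01                          # hypothetical prior hyperparameters
g_B = draw_bayesB_effects(rng, 10_000, pi=0.95, nu=nu, S=S)   # Bayes B
g_A = draw_bayesB_effects(rng, 10_000, pi=0.0, nu=nu, S=S)    # Bayes A
print("share of zero effects, Bayes B:", np.mean(g_B == 0.0))
print("share of zero effects, Bayes A:", np.mean(g_A == 0.0))
```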

[Figures: Simulation Study results]

Application with Real Data (VanRaden et al., 2008)
[Figure: number of animals by year of birth; groups: Predictor, Predictee, Young]

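Model Selection
- Goodness-of-fit vs. model complexity (bias-variance tradeoff): over-reduction vs. over-fit

Model Selection
- Goodness-of-fit: likelihood ratio approach (LRT; nested models)
  $$\mathrm{LRT} = -2\ln\frac{L(\hat\theta_0)}{L(\hat\theta_1)}$$
  which is asymptotically $\chi^2$, with degrees of freedom equal to the difference in the number of free parameters of the two models.
- Model complexity: number of free parameters, $p$ (or the effective number)
  Linear (regularized) fitting: $\hat{y} = Sy$, $p = \mathrm{trace}(S)$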
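For ridge regression the smoother is $S = X(X'X + \lambda I)^{-1}X'$, so the effective number of parameters $\mathrm{trace}(S)$ falls from the number of predictors toward zero as $\lambda$ grows. This is a minimal sketch that assumes centered predictors, so that the intercept is handled by centering. The data and function names are hypothetical.

```python
import numpy as np

def ridge_smoother(X, lam):
    """Hat matrix S with y_hat = S y for ridge regression on centered X."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def effective_df(X, lam):
    """Effective number of parameters p = trace(S)."""
    return np.trace(ridge_smoother(X, lam))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X -= X.mean(axis=0)           # centered predictors
for lam in [0.0, 1.0, 10.0, 100.0]:
    print(lam, round(effective_df(X, lam), 3))
```

Equivalently, with $d_j$ the singular values of $X$, $\mathrm{trace}(S) = \sum_j d_j^2/(d_j^2 + \lambda)$.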

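Model Selection
- Balancing goodness-of-fit and complexity:
  Akaike information criterion (AIC): $\mathrm{AIC} = -2\ln(L) + 2p$
  Bayesian information criterion (BIC, or Schwarz criterion): $\mathrm{BIC} = -2\ln(L) + \ln(n)\,p$
- If $e_i \overset{iid}{\sim} N(0, \sigma^2_e)$, then:
  $$\mathrm{AIC} = 2p + n\ln\left(\frac{\mathrm{RSS}}{n}\right) \qquad \text{and} \qquad \mathrm{BIC} = \frac{1}{\sigma^2_e}\left[\mathrm{RSS} + \ln(n)\,p\,\sigma^2_e\right]$$

Ridge Regression

$$\hat\beta^{\text{ridge}} = \arg\min_\beta \left\{ \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p} \beta_j^2 \right\}, \qquad \lambda \ge 0 \text{ (complexity parameter)}$$

or, equivalently:

$$\hat\beta^{\text{ridge}} = \arg\min_\beta \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s$$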
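The Gaussian forms of AIC and BIC need only the RSS, $n$, and $p$, plus $\sigma^2_e$ for BIC. This sketch compares nested polynomial regressions fitted by least squares. It assumes, as one common choice, that $\sigma^2_e$ is estimated from the largest (lowest-bias) model. The data and helper names are hypothetical.

```python
import numpy as np

def rss_ols(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def aic_gauss(rss, n, p):
    """AIC = 2p + n ln(RSS/n) under iid Gaussian errors."""
    return 2 * p + n * np.log(rss / n)

def bic_gauss(rss, n, p, sigma2_e):
    """BIC = (RSS + ln(n) p sigma2_e) / sigma2_e with sigma2_e treated as known."""
    return (rss + np.log(n) * p * sigma2_e) / sigma2_e

# Hypothetical data from a quadratic trend
rng = np.random.default_rng(3)
n = 80
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0.0, 0.5, n)

rss, n_par = [], []
for degree in range(1, 7):
    Xd = np.vander(x, degree + 1)      # nested polynomial designs, incl. intercept
    rss.append(rss_ols(Xd, y))
    n_par.append(degree + 1)

sigma2_e = rss[-1] / (n - n_par[-1])   # assumption: sigma2_e from the largest model
for p_, r in zip(n_par, rss):
    print(p_, round(aic_gauss(r, n, p_), 2), round(bic_gauss(r, n, p_, sigma2_e), 2))
```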

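Ridge Regression
- $\hat\beta_0 = \bar{y} = \sum_i y_i / N$
- After centering $y$ and the $x$'s (i.e., using $y_i - \bar{y}$ and $x_{ij} - \bar{x}_j$):
  $\mathrm{RSS}(\lambda) = (y - X\beta)'(y - X\beta) + \lambda\beta'\beta$
  $\hat\beta^{\text{ridge}} = (X'X + \lambda I)^{-1}X'y$

LASSO

$$\hat\beta^{\text{lasso}} = \arg\min_\beta \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t$$

Estimation picture for the LASSO (left) and ridge regression (right): the solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ (lasso) and $\beta_1^2 + \beta_2^2 \le t^2$ (ridge regression), while the red ellipses are the contours of the least squares error function.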
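Ridge has the closed form above, but the lasso does not. A common way to fit the lasso, shown here as one possible technique rather than the one used in the lecture, is cyclic coordinate descent with soft-thresholding. The sketch uses the penalized (Lagrangian) form $\|y - X\beta\|^2 + \lambda\sum_j|\beta_j|$, which corresponds to the constrained form for a suitable $t$. It assumes centered $X$ and $y$. The data, $\lambda$ values, and function names are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam I)^(-1) X'y for centered X and y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize ||y - X b||^2 + lam * sum|b_j| by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y - X @ b
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            r += X[:, j] * b[j]            # partial residual excluding predictor j
            zj = X[:, j] @ r
            b[j] = soft_threshold(zj, lam / 2.0) / col_ss[j]
            r -= X[:, j] * b[j]            # restore full residual with updated b_j
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

# Hypothetical data with a sparse true coefficient vector
rng = np.random.default_rng(5)
n, p = 100, 8
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(0.0, 1.0, n); y -= y.mean()

print(np.round(ridge(X, y, 10.0), 3))     # shrinks all coefficients
print(np.round(lasso_cd(X, y, 50.0), 3))  # may set some coefficients exactly to zero
```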

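Predictive Ability (Hastie et al. 2009)
[Figure: behavior of test sample and training sample error as the model complexity is varied]

Cross-validation
- K-fold: the data are split into K parts; each part is used in turn as the testing set, with the remaining parts as the training set
- Leave-one-out ("n-fold")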
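K-fold cross-validation can be used to choose the ridge complexity parameter $\lambda$. This is a sketch, assuming ridge regression with an unpenalized intercept (handled by centering within each training fold). The data, the $\lambda$ grid, and the function names are hypothetical. Setting k equal to n gives leave-one-out.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge with unpenalized intercept: center with training means, then solve."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b0 = y_mean - x_mean @ beta
    return b0, beta

def kfold_cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared prediction error over k folds (k = n gives leave-one-out)."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_err = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        b0, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = b0 + X[test_idx] @ beta
        sq_err += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / n

# Hypothetical data
rng = np.random.default_rng(11)
n, p = 120, 30
X = rng.normal(size=(n, p))
y = 2.0 + X[:, :5] @ np.array([1.0, -1.0, 0.5, 0.5, -0.5]) + rng.normal(0.0, 1.0, n)

lambdas = [0.1, 1.0, 10.0, 100.0]
cv = [kfold_cv_mse(X, y, lam, k=5) for lam in lambdas]
best = lambdas[int(np.argmin(cv))]
loo = kfold_cv_mse(X, y, 10.0, k=n)   # leave-one-out ("n-fold")
print(dict(zip(lambdas, np.round(cv, 3))), best, round(loo, 3))
```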

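Bayesian Alternative

$$y = 1\mu + \sum_j X_j g_j + e, \qquad y \mid \mu, g, \sigma^2_e \sim N\Big(1\mu + \sum_j X_j g_j,\ I\sigma^2_e\Big)$$

- BRR: $g_j \mid \sigma^2_0 \sim N(0, \sigma^2_0)$
- Bayes A: $g_j \mid \sigma^2_j \sim N(0, \sigma^2_j)$, $\sigma^2_j \sim \chi^{-2}(\nu, S)$
- Bayes B, C: $g_j \mid \pi, k, \sigma^2_j \sim \pi\,N(0, k\sigma^2_j) + (1 - \pi)\,N(0, \sigma^2_j)$
- BLasso: $g_j \mid \sigma^2_j \sim N(0, \sigma^2_j)$, $\sigma^2_j \sim \mathrm{Exponential}(\lambda)$
- BX: $g_j \mid \sigma^2_j \sim N(0, \sigma^2_j)$, $\sigma^2_j \sim X$

Normal/Independent distributions:

$$p(g_j) = \int p(g_j \mid \sigma^2_j)\, p(\sigma^2_j)\, d\sigma^2_j$$

- BRR: normal
- Bayes A: Student-t
- Bayes B, C: mixtures
- BLasso: double exponential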
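The normal/independent construction can be checked by Monte Carlo: draw $\sigma^2_j$ from its prior, then $g_j \mid \sigma^2_j$ from a normal, and inspect the marginal of $g_j$. In this sketch the hyperparameters are hypothetical and chosen so that all three marginals have variance 0.5. For BLasso it assumes the Park and Casella parametrization, $\sigma^2_j \sim \mathrm{Exponential}$ with rate $\lambda^2/2$, which yields a double exponential with scale $1/\lambda$ (excess kurtosis 3). Bayes A gives a scaled Student-t with $\nu$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2018)
m = 200_000                                   # Monte Carlo draws of one marker effect

# BRR: common variance, so the marginal is normal
sigma2_0 = 0.5
g_brr = rng.normal(0.0, np.sqrt(sigma2_0), m)

# Bayes A: sigma2_j ~ scaled-inv-chi2(nu, S), marginal is sqrt(S) * t_nu
nu, S = 5.0, 0.3
sigma2_A = nu * S / rng.chisquare(nu, m)
g_A = rng.normal(0.0, np.sqrt(sigma2_A))

# BLasso: sigma2_j ~ Exponential(rate = lam^2 / 2), marginal is Laplace(0, 1/lam)
lam = 2.0
sigma2_L = rng.exponential(scale=2.0 / lam**2, size=m)
g_L = rng.normal(0.0, np.sqrt(sigma2_L))

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

for name, g in [("BRR", g_brr), ("Bayes A", g_A), ("BLasso", g_L)]:
    print(f"{name:8s} var = {g.var():.3f}  excess kurtosis = {excess_kurtosis(g):.2f}")
```

With equal variances, the marginals differ only in their tails. Heavier-tailed priors shrink small effects strongly while leaving large effects relatively unshrunk.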

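GBLUP

Regression on markers, with genetic effects following a normal distribution with common variance:

$$y = 1\mu + \sum_j X_j g_j + e, \qquad g_j \mid \sigma^2_g \sim N(0, \sigma^2_g)$$

Equivalent model:

$$y = 1\mu + a + e, \qquad a \mid \sigma^2_a \sim N(0, G\sigma^2_a)$$

where $G$ is the genomic relationship matrix:

$$G = \frac{(X - M)(X - M)'}{2\sum_j p_j(1 - p_j)}$$

with $X$ coded 0/1/2, $p_j$ the allele frequency of marker $j$, and column $j$ of $M$ equal to $2p_j$.

ssGBLUP

Single-step GBLUP: a single mixed model with all animals (genotyped and non-genotyped) included, with the matrix $A$ replaced by $H$:

$$H^{-1} = A^{-1} + \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{pmatrix}$$

where $A_{22}$ is the pedigree relationship matrix among the genotyped animals.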
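The equivalence between marker regression and GBLUP can be verified numerically. It holds when $\sigma^2_g = \sigma^2_a / (2\sum_j p_j(1-p_j))$, so that $ZZ'\sigma^2_g = G\sigma^2_a$ with $Z = X - M$. The sketch below solves the SNP-BLUP mixed-model equations and the GBLUP equations and compares the predicted genetic values $Z\hat{g}$ and $\hat{a}$. The genotypes, allele frequencies, and variance components are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 400                                   # hypothetical: 60 animals, 400 SNPs
freq_true = rng.uniform(0.1, 0.9, p)
X = rng.binomial(2, freq_true, size=(n, p)).astype(float)   # genotypes coded 0/1/2

# Genomic relationship matrix G = (X - M)(X - M)' / (2 sum_j p_j (1 - p_j))
freq = X.mean(axis=0) / 2.0                      # allele frequencies from the sample
Z = X - 2.0 * freq                               # X - M, column j of M equal to 2 p_j
denom = 2.0 * np.sum(freq * (1.0 - freq))
G = Z @ Z.T / denom

sigma2_a, sigma2_e = 1.0, 2.0                    # hypothetical variance components
sigma2_g = sigma2_a / denom                      # so that Z Z' sigma2_g = G sigma2_a
y = 5.0 + Z @ rng.normal(0.0, np.sqrt(sigma2_g), p) + rng.normal(0.0, np.sqrt(sigma2_e), n)

# (1) SNP-BLUP: mixed-model equations for mu and SNP effects g
one = np.ones((n, 1))
lam = sigma2_e / sigma2_g
lhs = np.block([[one.T @ one, one.T @ Z],
                [Z.T @ one,   Z.T @ Z + lam * np.eye(p)]])
rhs = np.concatenate([one.T @ y, Z.T @ y])
sol = np.linalg.solve(lhs, rhs)
mu_snp, a_snp = sol[0], Z @ sol[1:]              # genetic values a = Z g_hat

# (2) GBLUP: a ~ N(0, G sigma2_a), solved via V = G sigma2_a + I sigma2_e
V = sigma2_a * G + sigma2_e * np.eye(n)
ones = np.ones(n)
mu_gblup = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
a_gblup = sigma2_a * G @ np.linalg.solve(V, y - mu_gblup * ones)

print(np.allclose(a_snp, a_gblup), np.isclose(mu_snp, mu_gblup))
```

GBLUP works with an n x n system instead of a p x p one, which is cheaper when markers far outnumber animals.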

Preventive and Personalized Medicine
[Diagram: training population, prediction model, new patient, personalized treatment]
- 5,13 subjects from the Framingham Heart Study
- Phenotypes measured from 1948 until death
- Genotypes: Affymetrix 500K SNPs

Models
1. No-SNP: standard covariables
2. Covariates + familial relationships
3. Covariates + SNPs (PC or Bayesian LASSO)

[Figures: results (ROC, area under the curve) for probit and B-LASSO models; comparison of models; models with increasing number of SNPs]
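The area under the ROC curve used to compare these models equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). This is a minimal sketch with hypothetical scores and labels; it is not the lecture's own code.

```python
import numpy as np

def auc(scores, labels):
    """AUC = P(score of a random case > score of a random control), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all case-control score differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Hypothetical predicted risks for 4 subjects (1 = case, 0 = control)
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise form costs memory proportional to the product of the numbers of cases and controls. For large cohorts, a rank-based computation gives the same value more cheaply.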