GENOMIC SELECTION ADDITIONAL TOPICS


1 GENOMIC SELECTION ADDITIONAL TOPICS

2 OUTLINE
- INTRODUCTION
  - Some Basics of Regression in High-dimensional Problems
- BAYESIAN ALTERNATIVE
  - A Quick Tour on Bayesian Models Commonly Used in Genomic Selection
- COMPARISON OF GWMAS MODELS
  - Whole Genome Prediction Within and Across Environments: An Application to Wheat Yield
- MODELLING NON-ADDITIVE GENETIC EFFECTS
  - A Quick Tour on Semi-parametric Kernel-based Methods
  - An Example Using Nonparametric Methods
- SELECTIVE GENOTYPING
  - The Effect of Selectively Genotyping Individuals in Genomic Selection

3 INTRODUCTION Some Basics of Regression in High-dimensional Problems

4 GENOME-WIDE MARKER-ASSISTED SELECTION

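5 GENOME-WIDE MARKER-ASSISTED SELECTION (Meuwissen et al., 2001)
$y = \mu + x_1 g_1 + x_2 g_2 + \dots + x_p g_p + e$, where the $x_j$ are marker genotypes and the $g_j$ are genetic effects.
Genomic EBV: $\mathrm{GEBV} = x_1 \hat{g}_1 + x_2 \hat{g}_2 + \dots + x_p \hat{g}_p = \sum_{j=1}^{p} x_j \hat{g}_j$
- "Big p, small n" paradigm
- Dimension reduction techniques (e.g. SVD and PLS), and stepwise strategies.
- Alternatively: penalized regression, shrinkage estimation.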

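6 Linear Regression and Least Squares
$E[Y] = f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$
Training data: $(x_1, y_1), \dots, (x_N, y_N)$; $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})'$
$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 = (y - X\beta)'(y - X\beta)$
$\frac{\partial \mathrm{RSS}}{\partial \beta} = -2X'(y - X\beta)$; $\frac{\partial^2 \mathrm{RSS}}{\partial \beta\, \partial \beta'} = 2X'X$
$X'(y - X\beta) = 0 \Rightarrow \hat{\beta} = (X'X)^{-1}X'y$ and $\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy$
$H$: hat matrix, or projection matrix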
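The normal equations and the hat matrix on this slide can be checked numerically. The sketch below uses simulated data (the sizes, seed and coefficient values are illustrative assumptions, not from the lecture) and solves $X'X\beta = X'y$ rather than forming the inverse explicitly.

```python
import numpy as np

# Simulated training data (illustrative values only)
rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # first column is the intercept
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Normal equations: X'X beta = X'y
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Hat (projection) matrix H = X (X'X)^{-1} X', so that y_hat = H y
H = X @ np.linalg.solve(XtX, X.T)
y_hat = H @ y
rss = float((y - y_hat) @ (y - y_hat))
```

At the solution the residual is orthogonal to the columns of X, and H is idempotent with trace equal to the number of fitted coefficients.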

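7 Linear Regression and Least Squares
Some additional assumptions: $\mathrm{Cov}(y_i, y_{i'}) = 0$ for $i \neq i'$; $\mathrm{Var}(y_i) = \sigma^2$; $x_i$: fixed (non random)
$\mathrm{Var}(\hat{\beta}) = (X'X)^{-1}\sigma^2$
$\hat{\sigma}^2 = \frac{1}{N - p - 1} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ [Note: $E[\hat{\sigma}^2] = \sigma^2$]
Moreover: $Y = E[Y \mid X] + \varepsilon = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon$, with $\varepsilon \overset{iid}{\sim} N(0, \sigma^2)$
$\hat{\beta} \sim N(\beta, (X'X)^{-1}\sigma^2)$
$(N - p - 1)\hat{\sigma}^2 \sim \sigma^2 \chi^2_{N-p-1}$
$H_0: \beta_j = 0$: $z_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{\nu_j}} \sim t_{N-p-1}$, where $\nu_j$ is the $j$-th diagonal element of $(X'X)^{-1}$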

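8 Gauss-Markov Theorem
Linear combination of the parameters: $\theta = k'\beta$
LS: $\hat{\theta} = k'\hat{\beta} = k'(X'X)^{-1}X'y$
$E[k'\hat{\beta}] = k'(X'X)^{-1}X'X\beta = k'\beta$
For any other linear estimator $\tilde{\theta} = c'y$ with $E[c'y] = k'\beta$ (i.e., unbiased): $\mathrm{Var}(k'\hat{\beta}) \le \mathrm{Var}(c'y)$
Mean squared error (MSE): $\mathrm{MSE}(\tilde{\theta}) = E(\tilde{\theta} - \theta)^2 = \mathrm{Var}(\tilde{\theta}) + [E(\tilde{\theta}) - \theta]^2$ (variance + squared bias)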

9 Least Squares Estimation
- DRAWBACKS:
  - Prediction accuracy: unbiased but large variance
  - Interpretation
- SOLUTION:
  - Feature (variable) selection
    - Best subset regression: min RSS
    - Stepwise selection (forward, backward, hybrid)
  - Shrinkage methods
    - Ridge regression and LASSO
  - Other techniques
    - Principal components regression
    - Partial least squares (PLS)

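10 Ridge Regression
$\lambda \ge 0$ (complexity parameter)
$\hat{\beta}^{ridge} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$
or, equivalently:
$\hat{\beta}^{ridge} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2$, subject to: $\sum_{j=1}^{p} \beta_j^2 \le s$
$\hat{\beta}_0 = \bar{y} = \sum_{i=1}^{N} y_i / N$
After centering $y$ and the $x$'s (i.e., $y_i - \bar{y}$ and $x_{ij} - \bar{x}_j$):
$\mathrm{RSS}(\lambda) = (y - X\beta)'(y - X\beta) + \lambda \beta'\beta$
$\hat{\beta}^{ridge} = (X'X + \lambda I)^{-1} X'y$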
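A minimal sketch of the closed-form ridge solution on centered data, in a "big p, small n" setting where ordinary least squares has no unique solution. The helper name `ridge`, the simulated genotype codes and the grid of $\lambda$ values are assumptions made for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y on centered X and y; the intercept is mean(y)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_cols = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_cols), Xc.T @ yc)
    return y.mean(), beta

# Simulated data with more markers than records, so X'X is singular
rng = np.random.default_rng(1)
n, p = 20, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(size=n)

# The norm of the coefficient vector shrinks as lambda grows
lambdas = (0.1, 1.0, 10.0, 100.0)
norms = [float(np.linalg.norm(ridge(X, y, lam)[1])) for lam in lambdas]
```

Adding $\lambda I$ makes the system solvable for any $\lambda > 0$, which is why the same estimator reappears later as marker-effect BLUP.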

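11 LASSO
$\hat{\beta}^{lasso} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2$, subject to: $\sum_{j=1}^{p} |\beta_j| \le t$
[Figure] Estimation picture for the lasso (left) and ridge regression (right). The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ (lasso) and $\beta_1^2 + \beta_2^2 \le t^2$ (ridge regression), while the red ellipses are the contours of the least squares error function.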

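12 Shrinkage Estimators: Generalization
$\tilde{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|^q \right\}$, $q \ge 0$
[Figure] Contours of constant value of $\sum_j |\beta_j|^q$ for given values of $q$.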

13 Model Selection
- GOODNESS-OF-FIT VS. MODEL COMPLEXITY
  Over-reduction vs. over-fit: the bias-variance tradeoff

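14 Model Selection
- Goodness-of-fit: likelihood ratio approach (LRT); nested models
  $\mathrm{LRT} = -2 \ln(L_0 / L_1) \sim \chi^2_{(p_1 - p_0)}$, where $L_0$ and $L_1$ are the maximized likelihoods of the reduced and full models and $p_1 - p_0$ is the difference in their numbers of free parameters.
- Model complexity: number of free parameters, $p$ (effective number)
  Linear (regularized) fitting: $\hat{y} = Sy \Rightarrow p = \mathrm{trace}(S)$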

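15 Effective Number of Parameters
Example with a simple linear regression: $y = X\beta + e$, i.e. $y_i = \beta_0 + \beta_1 x_i + e_i$, with $X = [\mathbf{1}_n \; x]$
$X'X = \begin{bmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix}$, $(X'X)^{-1} = k \begin{bmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{bmatrix}$, where $k = 1 / \det(X'X)$
$\mathrm{trace}\left( X(X'X)^{-1}X' \right) = \mathrm{trace}\left( (X'X)^{-1}X'X \right) = k\left[ n\sum_i x_i^2 - \left(\sum_i x_i\right)^2 \right] + k\left[ n\sum_i x_i^2 - \left(\sum_i x_i\right)^2 \right] = 1 + 1 = 2$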
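The trace calculation on this slide can be verified numerically, and the same idea extends to regularized fits, where the trace of the smoother matrix falls below the number of columns. The covariate values and the helper `ridge_df` are assumptions for illustration; for simplicity the ridge version here also penalizes the intercept.

```python
import numpy as np

# Simple linear regression: intercept plus one covariate (illustrative x values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix of the least squares fit; its trace is the number of parameters
H = X @ np.linalg.solve(X.T @ X, X.T)
df_ols = float(np.trace(H))

def ridge_df(X, lam):
    """Effective number of parameters trace(S) for S = X (X'X + lam*I)^{-1} X'."""
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    return float(np.trace(S))

df_path = [ridge_df(X, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
```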

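16 Model Selection
- Balancing goodness-of-fit and complexity
  Akaike information criterion (AIC): $\mathrm{AIC} = -2\ln(L) + 2p$
  Bayesian information criterion (BIC, or Schwarz criterion): $\mathrm{BIC} = -2\ln(L) + p\ln(n)$
  If $e \overset{iid}{\sim} N(0, \sigma_e^2)$ then, up to an additive constant: $\mathrm{AIC} = n\ln\left(\frac{\mathrm{RSS}}{n}\right) + 2p$ and $\mathrm{BIC} = n\ln\left(\frac{\mathrm{RSS}}{n}\right) + p\ln(n)$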
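A small sketch of the Gaussian forms of AIC and BIC, written as $n\ln(\mathrm{RSS}/n)$ plus the penalty (both criteria are defined up to the same additive constant, so only differences between models matter). The RSS values, sample size and function name are hypothetical.

```python
import math

def aic_bic_gaussian(rss, n, p):
    """AIC and BIC of a Gaussian linear model, up to a shared additive constant."""
    neg2loglik = n * math.log(rss / n)      # -2 ln L with sigma^2 replaced by RSS / n
    return neg2loglik + 2 * p, neg2loglik + p * math.log(n)

# Hypothetical comparison: the larger model lowers RSS only slightly
n = 30
aic_small, bic_small = aic_bic_gaussian(rss=12.0, n=n, p=2)
aic_large, bic_large = aic_bic_gaussian(rss=11.5, n=n, p=3)
```

In this hypothetical case both criteria are lower for the smaller model, and BIC penalizes the extra parameter more heavily than AIC because $\ln(30) > 2$.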

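17 Model Selection
Example: linear vs. quadratic regression
- Linear: $y_i = \beta_0 + \beta_1 x_i + e_i$; $R^2 = 0.53$, $R^2_{adj} = 0.30$, $\hat{\sigma}_e^2 = 0.35$
- Quadratic: $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$; $R^2 = 0.70$, $R^2_{adj} = 0.10$, $\hat{\sigma}_e^2 = 0.45$
[The fitted equations $\hat{y} = \dots$ shown for each model were lost in extraction.]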

18 Predictive Ability
[Figure] Behavior of test sample and training sample error as the model complexity is varied.

19 CROSS-VALIDATION (Predictive Ability)
- K-FOLD: the data are split into a training set and a testing set in turn
- LEAVE-ONE-OUT (n-fold)

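20 LOOCV: Linear vs. Quadratic
[Table: leave-one-out prediction errors per observation (Obs) for the linear and quadratic models, with PRESS totals; values lost in extraction.]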
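Since the slide's table values are not available, the sketch below shows how PRESS (the sum of squared leave-one-out prediction errors) can be computed for a linear and a quadratic model on simulated data. It uses the standard least squares identity $e_{(i)} = e_i / (1 - h_{ii})$ and checks it against explicit refitting. The data, seed and function names are assumptions.

```python
import numpy as np

def press(X, y):
    """PRESS via the hat matrix: leave-one-out error is e_i / (1 - h_ii)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))

def press_refit(X, y):
    """Same quantity, obtained by refitting with each observation left out."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        total += float((y[i] - X[i] @ b) ** 2)
    return total

# Simulated data generated from a truly linear model
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 12)
y = 1.0 + 0.8 * x + rng.normal(scale=0.3, size=x.size)
X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([X_lin, x ** 2])

press_lin, press_quad = press(X_lin, y), press(X_quad, y)
```

The model with the smaller PRESS is the one with better leave-one-out predictive ability; training RSS alone would always favor the quadratic fit.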

21 GWMAS
- MAIN (STATISTICAL/COMPUTATIONAL) CHALLENGES
  - Curse of dimensionality
  - How to deal with non-additive models
- TWO BASIC APPROACHES
  - Explicit regression of Y on M: $u_i = f(x_i, \beta) = x_i'\beta$
  - Implicit regression using RKHS: $u_i = f(x_i)$, with $\mathrm{Cov}(f_i, f_j) \propto K(x_i, x_j)$; e.g., $K(x_i, x_j)$ = [example kernel lost in extraction]

22 BAYESIAN ALTERNATIVE A Quick Tour on Bayesian Models Commonly Used in Genomic Selection

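23 GWMAS (BLUP)
- Model: $y = \mu + \sum_{j=1}^{p} X_j g_j + e$
  Marker effects assumed normally distributed with a common variance, i.e.: $g_j \sim N(0, \sigma_g^2)$
  Estimates: [formula lost in extraction]
- How to choose $\sigma_g^2$? Arbitrary; but controls amount of shrinkage
  Alternative: set $\sigma_g^2$ from $\tilde{\sigma}_a^2$ [expression lost in extraction], where $\tilde{\sigma}_a^2$ is an estimate (prior) of total additive genetic variance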

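24 BAYES A (Meuwissen et al. 2001)
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
$g_j \mid \sigma_j^2 \sim N(0, \sigma_j^2)$
- Prior distributions:
  $\sigma_j^2 \sim \chi^{-2}(\nu, S)$ (scaled inverted chi-square distribution with scale parameter $S$ and $\nu$ degrees of freedom)
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$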

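25 BAYES B (Meuwissen et al. 2001)
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
$g_j \mid \sigma_j^2 \sim N(0, \sigma_j^2)$
- Prior distributions:
  $\sigma_j^2 = 0$ with probability $\pi$
  $\sigma_j^2 \sim \chi^{-2}(\nu, S)$ with probability $(1 - \pi)$
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$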

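26 BAYES B*
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
- Prior distributions:
  $g_j = 0$ with probability $\pi$
  $g_j \mid \sigma_j^2 \sim N(0, \sigma_j^2)$ with probability $(1 - \pi)$
  $\sigma_j^2 \sim \chi^{-2}(\nu, S)$
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$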

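27 BAYES B**
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
- Prior distributions:
  $g_j \mid \sigma_j^2 \sim N(0, c\,\sigma_j^2)$ with probability $\pi$ [the variance of this component involves a constant $c$; its exact form is garbled in the source]
  $g_j \mid \sigma_j^2 \sim N(0, \sigma_j^2)$ with probability $(1 - \pi)$
  $\sigma_j^2 \sim \chi^{-2}(\nu, S)$
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$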

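28 BAYES C
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
- Prior distributions:
  $g_j = 0$ with probability $\pi$
  $g_j \mid \sigma_g^2 \sim N(0, \sigma_g^2)$ with probability $(1 - \pi)$
  $\pi \sim \mathrm{Uniform}(0, 1)$
  $\sigma_g^2 \sim \chi^{-2}(\nu, S)$
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$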
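To make the Bayes C hierarchy concrete, the sketch below draws once from the prior: a mixing proportion $\pi$, a common variance $\sigma_g^2$, and marker effects that are exactly zero with probability $\pi$. The scaled inverted chi-square draw is taken as $\nu S / \chi^2_\nu$, which is one common parameterization and an assumption here; the hyperparameter values and the function name are also illustrative.

```python
import numpy as np

def draw_bayes_c_prior(p, nu, S, rng):
    """One joint draw of (pi, sigma2_g, g) from the Bayes C prior hierarchy."""
    pi = rng.uniform(0.0, 1.0)                    # pi ~ Uniform(0, 1)
    sigma2_g = nu * S / rng.chisquare(nu)         # assumed form of the scaled inverted chi-square
    nonzero = rng.random(p) >= pi                 # each marker is non-null with probability 1 - pi
    effects = rng.normal(0.0, np.sqrt(sigma2_g), size=p)
    g = np.where(nonzero, effects, 0.0)           # spike at zero for the remaining markers
    return pi, sigma2_g, g

rng = np.random.default_rng(4)
pi, sigma2_g, g = draw_bayes_c_prior(p=10_000, nu=4.0, S=0.01, rng=rng)
frac_zero = float(np.mean(g == 0.0))              # should be close to pi
```

Setting a marker-specific variance instead of the common $\sigma_g^2$ would give the Bayes B* form of the previous slides.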

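29 BAYESIAN LASSO
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$
$y \mid \mu, g, \sigma_e^2 \sim N\left( \mu + \sum_{j=1}^{p} X_j g_j, \; I\sigma_e^2 \right)$
$g_j \mid \sigma_j^2 \sim N(0, \sigma_j^2)$
- Prior distributions:
  $\sigma_j^2 \sim \mathrm{Exponential}(\lambda)$
  $\sigma_e^2 \sim \chi^{-2}(-2, 0)$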

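30 GBLUP
Regression on markers, with genetic effects normally distributed with a common variance:
$y = \mu + \sum_{j=1}^{p} X_j g_j + e$, with: $g_j \mid \sigma_g^2 \sim N(0, \sigma_g^2)$
Equivalent model:
$y = \mu + b + e$, with: $b \mid \sigma_b^2 \sim N(0, G\sigma_b^2)$
$G$ is the genomic relationship matrix: $G = \left[ 2\sum_{j=1}^{p} q_j(1 - q_j) \right]^{-1} (X - M)(X - M)'$, where $q_j$ is the allele frequency at marker $j$ and $M$ holds the expected marker genotype values, so that $X - M$ is centered.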
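The equivalence between the marker-effects model and GBLUP can be checked numerically. The sketch below assumes known variance components, genotypes coded 0/1/2, observed allele frequencies in place of base-population frequencies, and $\sigma_b^2 = k\sigma_g^2$ with $k = 2\sum_j q_j(1 - q_j)$; all data are simulated and the variable names are illustrative.

```python
import numpy as np

# Simulated genotypes (0/1/2) and phenotypes
rng = np.random.default_rng(3)
n, p = 30, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
y = X[:, :10] @ rng.normal(size=10) + rng.normal(size=n)

q = X.mean(axis=0) / 2.0                 # observed allele frequencies
Z = X - 2.0 * q                          # X - M: centered marker codes
k = 2.0 * np.sum(q * (1.0 - q))
G = Z @ Z.T / k                          # genomic relationship matrix

sigma2_g, sigma2_e = 0.01, 1.0           # variance components, assumed known for this sketch
sigma2_b = k * sigma2_g                  # so that G*sigma2_b equals Z Z' sigma2_g
yc = y - y.mean()                        # with centered Z, the estimate of mu is mean(y)

# Marker-effects model: solve a p x p system, then sum marker effects per animal
g_hat = np.linalg.solve(Z.T @ Z + (sigma2_e / sigma2_g) * np.eye(p), Z.T @ yc)
gebv_markers = Z @ g_hat

# GBLUP: solve an n x n system for the genomic breeding values directly
b_hat = G @ np.linalg.solve(G + (sigma2_e / sigma2_b) * np.eye(n), yc)
```

Both routes give the same genomic breeding values; GBLUP is cheaper when the number of markers greatly exceeds the number of genotyped animals.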

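31 ssGBLUP
Single mixed model with all animals (genotyped and non-genotyped) included, with matrix $A$ replaced by $H$:
$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$, where $A_{22}$ is the block of $A$ for the genotyped animals.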

32 COMPARISON OF GWMAS MODELS Whole Genome Prediction Within and Across Environments: An Application to Wheat Yield. Duan, H. [year illegible in source] (Master's Thesis)

33 Data Description
- Global Wheat Program of the International Maize and Wheat Improvement Center (CIMMYT)
- 599 wheat lines with 1,279 markers genotyped
- Global environments were grouped into four macroenvironments (ME1, ME2, ME3, and ME4)
- Yields standardized with mean of 0 and standard deviation of 1
Research Goal
- Assess performance of different models (Bayes A, Bayes B, Bayes C, Bayesian LASSO, and Bayesian Ridge) for prediction within and across environments (G × E interaction)

34 [Figure] Boxplot of standardized wheat yields in each macroenvironment.

35 [Figure] Scatterplots of wheat yields for each pair of macroenvironments.

36 [Figure] Genotype × macroenvironment interaction: patterns of yield across environments for the 599 wheat lines.

37 [Figure] Correlations between observed and predicted yield within macroenvironments.

38 [Figure] Correlations between observed and predicted yield across macroenvironments; Models: Bayes A and Bayes C.

39 High Throughput Computing: The Condor at UW-Madison: http://research.cs.wisc.edu/condor/

40 MODELLING NON-ADDITIVE GENETIC EFFECTS A Quick Tour on Semiparametric Kernel-based Methods

41 GWMAS (Including Non-Additive Genetic Effects)
- Many studies that attempt to identify the genetic basis of complex traits ignore the possibility that loci interact, despite the known substantial contribution of such interactions to genetic variation (Carlborg and Haley 2005).
- Dekkers and Hospital (2002) also pointed out that existing statistical methods for marker-assisted selection do not deal well with the complexity posed by quantitative traits; among the reasons they cite is the inadequate handling of non-additivity.
- To address this issue, extensions of the GWMAS model to accommodate dominance and some level of epistasis have been proposed, as discussed next.

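42
- Extensions of the GWMAS model to accommodate dominance and some level of epistasis have been proposed (Yi et al. 2003; Huang et al. 2007; Xu 2007), which can be described as:
  $y = \mu + \sum_{j=1}^{p} X_j g_j + \sum_{j=1}^{p} \sum_{j' > j} X_{jj'} g_{jj'} + e$
  where the $g_{jj'}$ refer to interaction terms relative to epistatic effects involving loci $j$ and $j'$, and the $X_{jj'}$ represent appropriate design matrices.
- In the case of diallelic loci such as SNPs, each row of $X_j g_j$ can be factorized into additive and dominance effects as
  $x_{ij} g_j = x_{ij}\alpha_j + (1 - |x_{ij}|)\delta_j$,
  where $x_{ij} = -1$, 0 or 1 for the three possible genotypes aa, Aa and AA, respectively, and $\alpha_j$ and $\delta_j$ represent the additive and dominance effects relative to locus $j$.
- Similarly, the four degrees of freedom relative to each pairwise interaction between diallelic loci can be described as:
  $x_{ijj'} g_{jj'} = x_{ij}x_{ij'}\,\alpha\alpha_{jj'} + x_{ij}(1 - |x_{ij'}|)\,\alpha\delta_{jj'} + (1 - |x_{ij}|)x_{ij'}\,\delta\alpha_{jj'} + (1 - |x_{ij}|)(1 - |x_{ij'}|)\,\delta\delta_{jj'}$
  where $\alpha\alpha_{jj'}$, $\alpha\delta_{jj'}$, $\delta\alpha_{jj'}$, and $\delta\delta_{jj'}$ represent additive × additive, additive × dominance, dominance × additive, and dominance × dominance epistasis between loci $j$ and $j'$ ($j' > j$).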
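The additive, dominance and pairwise epistatic codes described on this slide follow directly from the genotype code $x \in \{-1, 0, 1\}$. A minimal sketch, with hypothetical function names and a toy pair of loci:

```python
import numpy as np

def additive_dominance_codes(x):
    """Genotypes coded -1, 0, 1 (aa, Aa, AA) -> additive code x and dominance code 1 - |x|."""
    x = np.asarray(x, dtype=float)
    return x, 1.0 - np.abs(x)

def epistasis_codes(x_j, x_k):
    """The four pairwise interaction covariates between two diallelic loci."""
    a_j, d_j = additive_dominance_codes(x_j)
    a_k, d_k = additive_dominance_codes(x_k)
    return {
        "additive x additive": a_j * a_k,
        "additive x dominance": a_j * d_k,
        "dominance x additive": d_j * a_k,
        "dominance x dominance": d_j * d_k,
    }

def n_pairwise_terms(p):
    """Number of pairwise epistatic effects for p loci: 4 per pair of loci."""
    return 4 * p * (p - 1) // 2

# Toy example: two individuals, two loci (AA and Aa at locus j; Aa and Aa at locus k)
codes = epistasis_codes([1, 0], [0, 0])
```

Even with only pairwise terms, the number of effects grows quadratically with the number of markers, which motivates the semiparametric alternatives that follow.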

43 SEMIPARAMETRIC APPROACHES
- The non-additive GWMAS model presented relies on very strong assumptions, such as linearity, multivariate normality, and proportion of segregating loci (Gianola et al. 2006).
- The genome seems to be much more highly interactive than what such models can accommodate. The number of higher-order interactions (i.e., multi-loci epistatic effects) grows extremely quickly as the number of markers increases.
- Partition of genetic variance into orthogonal additive, dominance, additive × additive, additive × dominance, etc. components is possible only under highly idealized, unrealistic conditions (Cockerham 1954; Kempthorne 1954).
- Gianola et al. (2006) and Gianola and van Kaam (2008) proposed semi-parametric approaches using kernel regression and reproducing kernel Hilbert spaces (RKHS) regression procedures embedded into standard mixed-effects linear models.

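44 GWMAS: Implicit Regression using RKHS
- RKHS regression: $y_i = \mu + f(x_i) + \varepsilon_i$
- Non-parametric representation of $f(x_i)$: $f = [f(x_1), f(x_2), \dots, f(x_n)]'$, with
  $\mathrm{Cov}(f) = \sigma_f^2 K$, where $K = \{K(x_i, x_j)\}$ is the $n \times n$ matrix of evaluations of the RKHS kernel
- Bayesian RKHS regression:
  $\begin{bmatrix} f \\ \varepsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} K\sigma_f^2 & 0 \\ 0 & I\sigma_\varepsilon^2 \end{bmatrix} \right)$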

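45 Choosing the Reproducing Kernel
- Steps in creating a kernel:
  - Define a notion of distance in input space
  - Map from distance to covariance structure
- Examples:
  - Input space: pedigree information: $K = A$ or $K = D$, etc.
  - Input space: genotype information, e.g. three individuals with genotypes AA (individual 1), Aa (individual 2) and Aa (individual 3): $K^*$ = [example kernel matrices, with entries such as 0.5, lost in extraction], etc.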

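46 RKHS Regression
[The objective function and its solution were lost in extraction; the annotation labels attached to them follow.]
- Loss function (e.g. logL, RSS)
- Smoothing parameter
- Model complexity measurement; squared norm in Hilbert space
- Kimeldorf and Wahba 1970
- $n \times n$ positive definite matrix whose elements are evaluations of a RK
- $n \times 1$ vector of unknown constants

47 Bayesian RKHS Animal Model: $u = Kc$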
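The labels on this slide describe a penalized objective (loss plus smoothing parameter times a squared RKHS norm) whose minimizer, by the representer theorem of Kimeldorf and Wahba, can be written as $f = Kc$. The sketch below assumes a residual-sum-of-squares loss, for which $c = (K + \lambda I)^{-1}(y - \bar{y})$, and a Gaussian kernel on simulated genotypes; the kernel choice, bandwidth `h`, and function names are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(X1, X2, h):
    """K(x_i, x_j) = exp(-h * ||x_i - x_j||^2), one possible reproducing kernel."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-h * d2)

def rkhs_fit(K, yc, lam):
    """Coefficients c minimising ||yc - K c||^2 + lam * c'K c, i.e. c = (K + lam*I)^{-1} yc."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), yc)

# Simulated genotypes with a non-additive (interaction) signal
rng = np.random.default_rng(6)
n, p = 40, 50
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=n)
yc = y - y.mean()

K = gaussian_kernel(X, X, h=1.0 / p)
lam = 1.0
c_hat = rkhs_fit(K, yc, lam)
f_hat = K @ c_hat                       # fitted genetic values, f = K c

def rss(lam_value):
    """Training residual sum of squares for a given smoothing parameter."""
    return float(np.sum((yc - K @ rkhs_fit(K, yc, lam_value)) ** 2))
```

A larger $\lambda$ gives a smoother fit with larger training RSS; in practice $\lambda$ and the bandwidth would be chosen by cross-validation or treated as unknowns in the Bayesian version.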

48 MODELLING NON-ADDITIVE GENETIC EFFECTS An Example Using Nonparametric Methods
Gonzalez-Recio O, Gianola D, Long N, Weigel KA, Rosa GJM and Avendano S. Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178(4), 2008.

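49 SEMIPARAMETRIC APPROACHES (Gonzalez-Recio et al. 2008)
- Five approaches compared:
  - F∞-metric (linear regression model)
  - Kernel regression
  - RKHS regression
  - Bayesian regression; with [?],000 SNPs
  - Standard E-BLUP genetic evaluation
  [Side annotation on the slide: "with 4 pre-selected SNPs"; leading digits may have been lost in extraction.]
- Response variable: late mortality (4-4 days of age, as extracted; digits lost)
- Phenotypic data: [?],[?]67 progeny of [?]00 sires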

50 RESULTS
- The heritability estimate ($h^2$ = 0.0[?], last digit lost in extraction) from the E-BLUP approach suggests that genetic evaluation could be improved if suitable molecular markers were available.
- Posterior means and standard deviations of the residual variance suggest that RKHS accounted for more variance in the data.

51
- The two nonparametric methods fitted the data better, having lower RSS.

52 RESULTS
- Predictive ability indicated advantages of the RKHS approach relative to other methods.

53 SELECTIVE GENOTYPING The Effect of Selectively Genotyping Individuals in Genomic Selection
Boligon AA, Long N, Albuquerque LG, Weigel KA, Gianola D and Rosa GJM. Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection. Journal of Animal Science, [year illegible in source] (in press).

54 Introduction
- Genomic selection: genome-enabled prediction of breeding values
- Genotyping cost: genotype a subset of animals. Effect on prediction accuracy?
- Objective: to evaluate the quality of GEBV for candidates to selection based on different strategies of selective genotyping of a population undergoing selection, with different selection intensities

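55 Material and Methods: Population/Generations
- 5,000 generations (mutation-drift equilibrium)
  - $t_1$: 100 animals (50 males + 50 females)
  - $t_{5000}$: [?]00 animals
- $G_0$: [?],500 animals
- $G_1$: [?],500 animals
[Diagram labels: random mating; factorial mating; directional selection; selection intensities (percentages and animal counts lost in extraction).]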

56 Material and Methods: Markers and Genetic Effects
- Genome: [?]0 chromosomes with [?]00 cM each
- Loci: 302 biallelic loci (202 markers + 100 QTL) in each chromosome, ordered as M1 M2 Q1 M3 M4 ... M199 M200 Q100 M201 M202
- Mutation rates: QTL [?].5 × 10^-5; markers [value lost in extraction]
- QTL effects: normally distributed
- Heritability: $h^2$ = 0.[?]0, 0.[?]5 and 0.50

57 Material and Methods: Analysis
- Training population: 500 genotyped animals in $G_0$
- Selective genotyping strategies: Random, Top, Bottom, Extreme, Less related
- Testing population: Generation $G_1$
- Model: Bayesian LASSO
- Performance: correlations between GEBV and TBV (accuracy), and predictive mean square error
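Four of the selective genotyping strategies listed on this slide can be sketched as index selections on a ranking criterion. The sketch assumes that animals are ranked on a single numeric value per animal (for example a phenotype), which the slide does not specify, and it omits the "less related" strategy because that needs relationship information. The function name and toy values are hypothetical.

```python
import numpy as np

def select_for_genotyping(values, n_select, strategy, rng=None):
    """Return indices of animals to genotype under a given strategy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)                       # ascending order of the ranking criterion
    if strategy == "random":
        rng = rng if rng is not None else np.random.default_rng()
        return rng.choice(values.size, size=n_select, replace=False)
    if strategy == "top":
        return order[-n_select:]
    if strategy == "bottom":
        return order[:n_select]
    if strategy == "extreme":                        # half from each tail
        half = n_select // 2
        return np.concatenate([order[:half], order[values.size - (n_select - half):]])
    raise ValueError(f"unknown strategy: {strategy}")

# Toy ranking values for six animals
values = np.array([0.3, -1.2, 2.5, 0.0, 1.1, -0.4])
top = select_for_genotyping(values, 2, "top")
bottom = select_for_genotyping(values, 2, "bottom")
extreme = select_for_genotyping(values, 2, "extreme")
random_pick = select_for_genotyping(values, 3, "random", rng=np.random.default_rng(7))
```

The selected animals would then form the training population for the prediction model, with accuracy evaluated on the next generation.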

58 Material and Methods
[Diagram: $G_0$, with breeding selection and genotyping selection, leading to $G_1$. Testing population: [?],500 selection candidates.]

59 Prediction accuracies (correlations between GEBV and TBV)
[Figure: two panels; y-axis: Correlations; x-axis: Selection of Animals in $G_0$.]

60 Predictive mean squared error (PMSE)
[Figure: two panels; y-axis: PMSE; x-axis: Selection of Animals in $G_0$.]

61 Number and percentage of coincident animals
