GBLUP and G matrices


GBLUP from SNP-BLUP

We have defined breeding values as the sum of SNP effects:

$$\mathbf{u} = \mathbf{Z}\mathbf{a}$$

To refer breeding values to an average value of 0, we adopt the centered coding for genotypes described before.

Genotype   101 coding   012 coding   Centered coding
aa         $-a_i$       $0$          $-2p_i a_i$
Aa         $0$          $a_i$        $(1-2p_i) a_i$
AA         $a_i$        $2a_i$       $(2-2p_i) a_i$

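GBLUP from SNP-BLUP

We have defined breeding values as the sum of SNP effects: $\mathbf{u} = \mathbf{Z}\mathbf{a}$.

Because $\mathrm{Var}(\mathbf{a}) = \mathbf{I}\sigma_a^2$, then

$$\mathrm{Var}(\mathbf{u}) = \mathbf{Z}\mathbf{I}\sigma_a^2\mathbf{Z}' = \mathbf{Z}\mathbf{Z}'\sigma_a^2$$

But before, we found out that $\sigma_a^2 = \sigma_u^2 / (2\sum_i p_i q_i)$. Substituting:

$$\mathrm{Var}(\mathbf{u}) = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_i p_i q_i}\,\sigma_u^2$$

Finally, we factorize:

$$\mathrm{Var}(\mathbf{u}) = \mathbf{G}\sigma_u^2$$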

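VanRaden's first G

$$\mathbf{G} = \frac{(\mathbf{M}-2\mathbf{P})(\mathbf{M}-2\mathbf{P})'}{2\sum_i p_i q_i} = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_i p_i q_i}$$

Genotypes in $\mathbf{M}$ are coded {0,1,2}.
Shifted (by $2\mathbf{P}$) to refer to the average of a population with allele frequencies $p$.
Scaled (by $2\sum_i p_i q_i$) to refer to the genetic variance of a population with allele frequencies $p$.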
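
As a minimal sketch of the construction described on this slide, the function below builds the first G from a 0/1/2 genotype matrix with NumPy. The function name `vanraden_g`, the simulated genotypes and the choice of sample allele frequencies when none are supplied are assumptions made for illustration only.

```python
import numpy as np

def vanraden_g(M, p=None):
    """First VanRaden G from an (individuals x SNPs) matrix coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    if p is None:
        p = M.mean(axis=0) / 2.0      # assumption: use sample allele frequencies
    Z = M - 2.0 * p                   # shift: centre every SNP column by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))   # scale by 2 * sum(p q)

# Hypothetical data: 5 individuals, 200 SNPs
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(5, 200))
G = vanraden_g(M)
print(np.round(G, 2))
```

Passing base-population frequencies through `p` instead of the sample frequencies changes both the shift and the scale, which is the point the slide makes.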

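GBLUP

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{u} + \mathbf{e}$$

$$\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{G}^{-1}\sigma_u^{-2} \end{pmatrix} \begin{pmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}$$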

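GBLUP

$$\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{G}^{-1}\sigma_u^{-2} \end{pmatrix} \begin{pmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}$$

We obtain animal, not SNP, solutions.
Immediate application to maternal effects models, random regression, competition effect models, multiple trait, etc.
All genotyped individuals can be included, either with phenotype or not.
Regular software (blupf90, asreml, wombat...) works.
Therefore, GREML and G-Gibbs are simple extensions.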
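
The mixed model equations of GBLUP can be sketched numerically as follows. This is only an illustration under stated assumptions: the genotypes are simulated, the phenotypes, the variance components (`var_u`, `var_e`) and the overall-mean-only fixed effect are hypothetical, and animals 4 and 5 are genotyped but have no record.

```python
import numpy as np

rng = np.random.default_rng(7)
n_animals, n_snps = 6, 300
p = np.full(n_snps, 0.4)                       # assumed base allele frequencies
M = rng.binomial(2, p, size=(n_animals, n_snps)).astype(float)
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))

# Hypothetical records: animals 0..3 have one phenotype each, 4 and 5 have none
y = np.array([10.2, 9.1, 11.5, 10.0])
X = np.ones((4, 1))                            # overall mean only
W = np.eye(4, n_animals)                       # links records to animals
var_u, var_e = 0.3, 0.7                        # assumed variance components

Rinv = np.eye(4) / var_e
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ W],
                [W.T @ Rinv @ X, W.T @ Rinv @ W + np.linalg.inv(G) / var_u]])
rhs = np.concatenate([X.T @ Rinv @ y, W.T @ Rinv @ y])
sol = np.linalg.solve(lhs, rhs)
b_hat, u_hat = sol[:1], sol[1:]                # one solution per animal
print(u_hat)
```

The solution vector has one breeding value per genotyped animal, including the two without phenotype, which get their prediction through G.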

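Multiple trait GBLUP

$$\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{G}^{-1} \otimes \mathbf{G}_0^{-1} \end{pmatrix} \begin{pmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}$$

$\mathbf{G}_0$ is the matrix of genetic covariances across traits.
Usually $\mathbf{R} = \mathbf{I} \otimes \mathbf{R}_0$, where $\mathbf{R}_0$ is the matrix of residual covariances.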

Reliabilities

Nominal reliabilities (NOT cross-validation reliabilities) can be obtained from the mixed model equations as

$$Rel_i = 1 - \frac{C^{ii}}{G_{ii}\sigma_u^2}$$

where $C^{ii}$ is the $i,i$ element of the inverse of the mixed model equations.
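
A hedged sketch of the reliability computation: the coefficient matrix of the mixed model equations is inverted and the diagonal of its animal block is compared with $G_{ii}\sigma_u^2$. The data layout (six genotyped animals, four with a record, overall mean as the only fixed effect) and the variance components are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_animals, n_snps = 6, 300
p = np.full(n_snps, 0.4)                       # assumed base allele frequencies
M = rng.binomial(2, p, size=(n_animals, n_snps)).astype(float)
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))

X = np.ones((4, 1))                            # overall mean only
W = np.eye(4, n_animals)                       # animals 4 and 5 have no record
var_u, var_e = 0.3, 0.7                        # assumed variance components

lhs = np.block([[X.T @ X / var_e, X.T @ W / var_e],
                [W.T @ X / var_e, W.T @ W / var_e + np.linalg.inv(G) / var_u]])
C = np.linalg.inv(lhs)                         # inverse of the MME
C_uu = C[1:, 1:]                               # block for the animal effects
reliability = 1 - np.diag(C_uu) / (np.diag(G) * var_u)
print(np.round(reliability, 3))
```

Inverting the full coefficient matrix is only feasible for small examples like this one.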

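GBLUP == SNPBLUP

Both give the same solutions.
We can jump from GBLUP to SNP-BLUP and back:

$$\hat{\mathbf{u}} = \mathbf{Z}\hat{\mathbf{a}}$$

$$\hat{\mathbf{a}} = \frac{1}{2\sum_i p_i q_i}\,\mathbf{Z}'\mathbf{G}^{-1}\hat{\mathbf{u}}$$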
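
The equivalence can be checked numerically. The sketch below solves both sets of mixed model equations for the same simulated data and then backsolves SNP effects from the GBLUP solutions. The phenotypes, allele frequencies and variance components are hypothetical, and the model (overall mean, one record per animal) is an assumption chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 50
p = np.full(m, 0.35)                           # assumed base allele frequencies
M = rng.binomial(2, p, size=(n, m)).astype(float)
Z = M - 2 * p
k = 2 * np.sum(p * (1 - p))                    # 2 * sum(p_i q_i)
G = Z @ Z.T / k
y = rng.normal(10.0, 1.0, size=n)              # hypothetical phenotypes
var_u, var_e = 0.4, 0.6                        # assumed variance components
var_a = var_u / k                              # variance of SNP effects
ones = np.ones((n, 1))

# GBLUP: one equation per animal
lhs_g = np.block([[ones.T @ ones, ones.T],
                  [ones, np.eye(n) + np.linalg.inv(G) * var_e / var_u]])
u_hat = np.linalg.solve(lhs_g, np.concatenate([ones.T @ y, y]))[1:]

# SNP-BLUP: one equation per marker
lhs_s = np.block([[ones.T @ ones, ones.T @ Z],
                  [Z.T @ ones, Z.T @ Z + np.eye(m) * var_e / var_a]])
a_hat = np.linalg.solve(lhs_s, np.concatenate([ones.T @ y, Z.T @ y]))[1:]

# Backsolve SNP effects from the GBLUP solutions
a_from_u = Z.T @ np.linalg.solve(G, u_hat) / k
print(np.allclose(u_hat, Z @ a_hat), np.allclose(a_hat, a_from_u))
```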

GREML, G-Gibbs

Use of G to estimate variance components.
It can be done with remlf90, gibbs*f90, AsReml, TM.
The result will refer to an ideal population with whatever allelic frequencies we introduced in the denominator of $\mathbf{G} = \mathbf{Z}\mathbf{Z}' / (2\sum_i p_i q_i)$.

But what are genomic (additive) relationships?

Kinship

It obviously comes from Latin "parentes".

So what is kinship?

Socially it has a pedigree interpretation, e.g. all royal families are related.
However, pedigrees go back forever.
We need a more rigorous definition.

True relationships

Two individuals are genetically identical (for a trait) if they carry the same genotype at the causal QTLs or genes.
This is a biological fact.
The genetics of one locus for two diploid individuals can be described using Gillois' identity coefficients.

Relationships

Relationships were conceived as standardized covariances (Fisher, Wright):

$$\mathrm{Cov}(u_i, u_j) = R_{ij}\sigma_u^2$$

where $R_{ij}$ is some relationship and $\sigma_u^2$ is the genetic variance.
Genetic relationships are due to shared (Identical By State) alleles at causal genes: if I share the blood group OO with somebody, I am like his twin.
These genes are unknown (and many will likely remain so).
Use proxies: pedigree relationships, marker relationships.

[Figure 2. The pedigree of the Jicaque Indians Julio and Mencha. Individuals shown in the diagram: X, Y, Juan, León, Pedro, Petrona, Francisco, Caciana, Juana, Paiperon, Mariana, Beltrán, Carmen, Constantino, Leonor, Andrés, Teresa, Catoya, Julio, Mencha. Source: Garcia-Cortes, Gen Sel Evol 2015]

No.   Julio-Mencha   Progeny-progeny   Julio-progeny   Mencha-progeny
1     0.01025        0.06580           0.03467         0.03467
2     0.02393        0.04333           0.08252         0.00000
3     0.02490        0.04333           0.00000         0.08252
4     0.02393        0.04333           0.06665         0.06665
5     0.02490        0.04333           0.08228         0.08228
6     0.00708        0.00729           0.00000         0.00000
7     0.05103        0.02383           0.00000         0.00000
8     0.05103        0.02383           0.00000         0.00000
9     0.05127        0.24713           0.08228         0.06665
10    0.10937        0.15900           0.29248         0.00000
11    0.16992        0.15900           0.00000         0.29248
12    0.00708        0.00729           0.06665         0.08228
13    0.05103        0.02383           0.00000         0.29248
14    0.05103        0.02383           0.29248         0.00000
15    0.34326        0.08582           0.00000         0.00000

Pedigree relationships: A

Systematic tabular rules to compute any $A_{ij}$ (Emik & Terrill 1947).
The whole array of $A_{ij}$ is arranged in a matrix $\mathbf{A}$.
$\mathbf{A}^{-1}$ is very sparse and easy to create and manipulate (Henderson 1976).
Extraordinary development of whole-pedigree methods in livestock genetics.
E.g. computing inbreeding for 15 generations including $10^6$ sheep takes minutes.
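
The tabular rules can be sketched in a few lines. This is a dense, small-scale illustration, not the sparse whole-pedigree algorithms used in practice; the function name `tabular_a`, the coding of unknown parents as -1 and the five-animal pedigree are assumptions.

```python
import numpy as np

def tabular_a(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    Animals are numbered 0..n-1 with parents listed before offspring;
    an unknown parent is coded as -1.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            # relationship with an older animal: average over the two parents
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # diagonal: 1 + inbreeding = 1 + half the relationship between parents
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

# Hypothetical pedigree: 0 and 1 are founders, 2 and 3 are full sibs, 4 = 2 x 3
sires = [-1, -1, 0, 0, 2]
dams = [-1, -1, 1, 1, 3]
A = tabular_a(sires, dams)
print(A)
```

In this example the full sibs 2 and 3 have a relationship of 0.5, and their offspring 4 has a diagonal of 1.25, that is, an inbreeding coefficient of 0.25.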

Early use of markers: inferring A

In conservation genetics, molecular markers have often been used to estimate pedigree relationships.
Gather markers, then reconstruct pedigrees, then construct A.
Either estimates of $A_{xy}$, or estimates of «the most likely relation» (son-daughter, cousins, whatever).
Li and Horvitz 1953, Cockerham 1969, Ritland 1996, Caballero & Toro 2002, and many others.
With abundant marker data we can do better than this.

Realized relationships

Identical By Descent.
Relationships based on pedigree are average relationships which assume infinite loci.
«Real» IBD relationships are a bit different due to finite genome size (Hill and Weir, 2010).
Therefore A is the expectation of the realized relationships.
SNPs are more informative than A.
Two full sibs might have a correlation of 0.4 or 0.6.
You need many markers to get these «fine relationships».

Traditional Pedigree

[Figure: pedigree tree. Animal; Sire, Dam; Sire of Sire, Dam of Sire, Sire of Dam, Dam of Dam]
Source: VanRaden, Interbull annual meeting 2007

Genomic Pedigree

[Figure not recoverable] (VanRaden 2007)

Haplotype Pedigree

[Figure: the pedigree tree with each individual represented by its two haplotypes] (VanRaden 2007)
atagatcgatcg ctgtagcgatcg
agatctagatcg ctgtagcttagg
agggcgcgcagt ctgtctagatcg
atgtcgcgcagt cgatctagatcg
cggtagatcagt agagatcgcagt
atgtcgctcacg agagatcgatct
atggcgcgaacg ctatcgctcagg

Genotype Pedigree

Count the number of copies of the second allele (VanRaden 2007):
122221121111
111211120200
011111012011
121101011110
101121101111
101101111102
121120011010
0 = homozygous for first allele (alphabetically)
1 = heterozygous
2 = homozygous for second allele (alphabetically)
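
The 0/1/2 coding rule can be sketched as follows. The function name `code_genotypes` and the toy haplotypes are hypothetical, and the sketch assumes biallelic loci (a locus with a single observed allele is coded 0 for everybody).

```python
def code_genotypes(hap_pairs):
    """Code each individual as the count of the alphabetically second allele.

    hap_pairs: list of (haplotype1, haplotype2) strings, one pair per individual.
    """
    n_loci = len(hap_pairs[0][0])
    second = []
    for k in range(n_loci):
        # alleles observed at locus k across all haplotypes, in alphabetical order
        alleles = sorted({h[k] for pair in hap_pairs for h in pair})
        second.append(alleles[1] if len(alleles) == 2 else None)
    return [[(h1[k] == second[k]) + (h2[k] == second[k]) for k in range(n_loci)]
            for h1, h2 in hap_pairs]

# Toy example: three individuals, two loci (alleles a/c at the first, g/t at the second)
pairs = [("at", "ct"), ("ct", "cg"), ("ag", "ag")]
print(code_genotypes(pairs))   # [[1, 2], [2, 1], [0, 0]]
```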

Comparison of expected and observed variances of relationship/sharing

4401 full sib pairs, 400-800 markers.

           Expected   Observed
Mean       0.5        0.498
SD         0.039      0.036
Range                 0.37-0.63

Source: Visscher et al. Slide from WG Hill.

Covariance of gene content

Consider gene content coding {AA, Aa, aa} as $m = \{0, 1, 2\}$.
Cockerham, 1969: for two individuals, the covariance of their gene contents is

$$\mathrm{Cov}(m_i, m_j) = A_{ij}\,2pq$$

In other words, two related individuals will show similar genotypes at the markers.
Backsolve: $\hat{A}_{ij} = \mathrm{Cov}(m_i, m_j) / 2pq$.
If we have centered $z = m - 2p$, then $\hat{A}_{ij} = z_i z_j / 2pq$.
Extended to many loci:

$$\hat{A}_{ij} = \frac{\sum_k z_{ik} z_{jk}}{2\sum_k p_k q_k} = \frac{\mathbf{z}_i \mathbf{z}_j'}{2\sum_k p_k q_k}$$

VanRaden's first G

$$\mathbf{G} = \frac{(\mathbf{M}-2\mathbf{P})(\mathbf{M}-2\mathbf{P})'}{2\sum_i p_i q_i} = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_i p_i q_i}$$

Genotypes in $\mathbf{M}$ are coded {0,1,2}.
Shifted to refer to the average of a population with allele frequencies $p$.
Scaled to refer to the genetic variance of a population with allele frequencies $p$.
If base allelic frequencies are used, G is an unbiased and efficient estimator of IBD realized relationships.

Some properties of G

$$\mathbf{G} = \frac{(\mathbf{M}-2\mathbf{P})(\mathbf{M}-2\mathbf{P})'}{2\sum_i p_i q_i}$$

If $p$ are computed from the sample:
In HWE and Linkage Equilibrium, Average of Diag(G) = 1 and Average(G) = 0.
With average inbreeding $F$, Average of Diag(G) = $1 + F$.

Genotype    AA           Aa            aa
Frequency   $q^2 + pqF$  $2pq(1-F)$    $p^2 + pqF$
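
These properties can be checked by simulation. The sketch below draws independent loci (linkage equilibrium) with genotype frequencies $q^2 + pqF$, $2pq(1-F)$ and $p^2 + pqF$ for codes 0, 1 and 2, and computes G with frequencies taken from the sample. The sample sizes, the range of allele frequencies and $F = 0.2$ are arbitrary assumptions.

```python
import numpy as np

def simulate_genotypes(n, p, F, rng):
    """Draw 0/1/2 genotypes with inbreeding F (F = 0 gives HWE); loci independent."""
    q = 1 - p
    prob0 = q**2 + p * q * F                  # code 0
    prob1 = 2 * p * q * (1 - F)               # code 1 (heterozygote)
    u = rng.random((n, p.size))
    return (u > prob0).astype(float) + (u > prob0 + prob1)

def first_g(M):
    p = M.mean(axis=0) / 2                    # frequencies computed from the sample
    Z = M - 2 * p
    return Z @ Z.T / (2 * np.sum(p * (1 - p)))

rng = np.random.default_rng(11)
p = rng.uniform(0.1, 0.9, size=500)
G_hwe = first_g(simulate_genotypes(400, p, 0.0, rng))
G_inb = first_g(simulate_genotypes(400, p, 0.2, rng))
print(np.mean(np.diag(G_hwe)), G_hwe.mean(), np.mean(np.diag(G_inb)))
```

The average of all elements of G is zero up to rounding whenever the frequencies come from the same sample, because every column of Z then sums to zero; the diagonal averages are close to 1 and to 1 + F only approximately, since they depend on sampling.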

Some intriguing properties of G

If $p$ are computed from the data, this implies that E(Breeding Values) = 0.
Positive and negative inbreeding: some individuals are more heterozygous than the average of the population (OK, no biological problem).
Positive and negative genomic relationships: a negative value implies that individuals i and j are more distinct than an average pair of individuals in the data.
Fixing negative estimates of relationships to 0 is wrong practice.

Not positive definite

Strandén & Christensen (2011) showed that if the $p$'s are averages across the sample, then G is not positive definite (it has no inverse).
We could use BLUP equations with non-inverted G (Henderson, 1984): see exercises.
Instead, we use

$$\mathbf{G} = 0.99\,\frac{(\mathbf{M}-2\mathbf{P})(\mathbf{M}-2\mathbf{P})'}{2\sum_i p_i q_i} + 0.01\,\mathbf{I}$$

or something similar.
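
A small numerical illustration of the problem and of the usual fix, under assumed simulated genotypes: with sample allele frequencies the rows of G sum to zero, so its smallest eigenvalue is zero, and blending with 0.01 I lifts every eigenvalue to at least 0.01.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.binomial(2, 0.3, size=(8, 1000)).astype(float)   # hypothetical genotypes
p = M.mean(axis=0) / 2                                   # sample allele frequencies
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))

G_blend = 0.99 * G + 0.01 * np.eye(8)

eig_G = np.linalg.eigvalsh(G)              # eigenvalues in ascending order
eig_blend = np.linalg.eigvalsh(G_blend)
print(eig_G[0], eig_blend[0])              # approximately 0 and 0.01
```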

Real results (AMASGEN)

9 real French bulls among 1827 genotyped, ~50000 SNPs.
Very complex pedigree.
[Figure: simplified pedigree graph of the nine bulls]

Pedigree-based relationship

Little inbreeding.
Cousin relationships ~0.125.

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]  1.00 0.51 0.57 0.51 0.26 0.15 0.15 0.14 0.14
[2,]  0.51 1.01 0.30 0.33 0.17 0.17 0.12 0.11 0.11
[3,]  0.57 0.30 1.07 0.30 0.20 0.12 0.18 0.11 0.12
[4,]  0.51 0.33 0.30 1.01 0.17 0.18 0.11 0.11 0.11
[5,]  0.26 0.17 0.20 0.17 1.00 0.56 0.51 0.52 0.53
[6,]  0.15 0.17 0.12 0.18 0.56 1.06 0.31 0.32 0.32
[7,]  0.15 0.12 0.18 0.11 0.51 0.31 1.01 0.30 0.29
[8,]  0.14 0.11 0.11 0.11 0.52 0.32 0.30 1.02 0.30
[9,]  0.14 0.11 0.12 0.11 0.53 0.32 0.29 0.30 1.03

First G genomic relationship

$$\mathbf{G}(p) = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_{\text{all SNPs } i} p_i (1 - p_i)}$$

Less than 1 in the diagonal.
Negative coefficients.
Relationships among cousins are ~0.

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
[1,]  0.82  0.40  0.43  0.38  0.12  0.04  0.04  0.01  0.10
[2,]  0.40  0.91  0.18  0.24  0.02  0.05 -0.04 -0.04  0.04
[3,]  0.43  0.18  0.88  0.19  0.07  0.00  0.07 -0.02  0.05
[4,]  0.38  0.24  0.19  0.86  0.02 -0.01 -0.02  0.01  0.03
[5,]  0.12  0.02  0.07  0.02  0.73  0.34  0.30  0.31  0.35
[6,]  0.04  0.05  0.00 -0.01  0.34  0.85  0.15  0.14  0.18
[7,]  0.04 -0.04  0.07 -0.02  0.30  0.15  0.80  0.14  0.17
[8,]  0.01 -0.04 -0.02  0.01  0.31  0.14  0.14  0.80  0.17
[9,]  0.10  0.04  0.05  0.03  0.35  0.18  0.17  0.17  0.85

GBLUP == GBLUP

For all matrices of the kind

$$\mathbf{G} = \frac{(\mathbf{M}-2\mathbf{P})(\mathbf{M}-2\mathbf{P})'}{2\sum_i p_i q_i}$$

we don't need to put the same $p$'s in the upper and in the lower part.
Changing allele frequencies in $\mathbf{P}$ shifts EBVs by a constant.
Changing allele frequencies in $2\sum_i p_i q_i$ scales G, but we can compensate through a change in the genetic variance. E.g.

$$\begin{pmatrix} 1.1 & 0.55 \\ 0.55 & 1.1 \end{pmatrix} \times 10 = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \times 11$$

So, if variances are estimated by REML or «corrected» according to $2\sum_i p_i q_i$, results should be identical: see exercises.
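
The scaling argument can be verified with the 2 x 2 example from this slide. In the sketch below the phenotypes and the residual variance are hypothetical, and the model (overall mean, one record per animal) is an assumption; only the product of G and the genetic variance enters the equations, so both parameterizations give the same EBVs.

```python
import numpy as np

def gblup_ebv(G, var_u, var_e, y):
    """EBVs from the mixed model equations; mean-only model, one record per animal."""
    n = len(y)
    ones = np.ones((n, 1))
    lhs = np.block([[ones.T @ ones, ones.T],
                    [ones, np.eye(n) + np.linalg.inv(G) * var_e / var_u]])
    rhs = np.concatenate([ones.T @ y, y])
    return np.linalg.solve(lhs, rhs)[1:]

y = np.array([3.0, 5.0])                       # hypothetical phenotypes
var_e = 20.0                                   # assumed residual variance
G_a = np.array([[1.1, 0.55], [0.55, 1.1]])     # paired with genetic variance 10
G_b = np.array([[1.0, 0.5], [0.5, 1.0]])       # paired with genetic variance 11
ebv_a = gblup_ebv(G_a, 10.0, var_e, y)
ebv_b = gblup_ebv(G_b, 11.0, var_e, y)
print(ebv_a, ebv_b)
```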

IBS relationships at the markers

$\mathbf{G}_{IBS}$ is a genomic relationship matrix based on Identity By State at the markers.
The terms in $\mathbf{G}_{IBS}$ are usually described in terms of identities or countings:

$$G_{IBS,ij} = \frac{1}{n}\sum_{k=1}^{n} 2\,\frac{\sum_{l=1}^{2}\sum_{m=1}^{2} I_{lm}}{4}$$

where $I_{lm}$ measures the identity across all 4 combinations of alleles of individuals $i$ and $j$ at marker $k$.

GBLUP == GBLUP with IBS

$$\mathbf{G}_{IBS} = \frac{1}{2}\,\mathbf{G}_{0.5} + \mathbf{1}\mathbf{1}'$$

where $\mathbf{G}_{0.5}$ is built pretending that $p = 0.5$.
Again, using the right variance components we get the same EBVs.
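
The identity between the IBS matrix and G built with p = 0.5 can be checked by brute-force counting. The sketch below compares, at every marker, the two alleles of one individual with the two alleles of the other; the function name `g_ibs_by_counting` and the simulated genotypes are assumptions for illustration.

```python
import numpy as np

def g_ibs_by_counting(M):
    """IBS relationships by comparing the 2 x 2 allele pairs at every marker."""
    n_ind, n_snp = M.shape
    G = np.zeros((n_ind, n_ind))
    for i in range(n_ind):
        for j in range(n_ind):
            total = 0.0
            for k in range(n_snp):
                mi, mj = int(M[i, k]), int(M[j, k])
                alleles_i = [1] * mi + [0] * (2 - mi)   # the two alleles of i
                alleles_j = [1] * mj + [0] * (2 - mj)   # the two alleles of j
                identities = sum(a == b for a in alleles_i for b in alleles_j)
                total += 2 * identities / 4
            G[i, j] = total / n_snp
    return G

rng = np.random.default_rng(2)
M = rng.binomial(2, 0.3, size=(4, 30))          # hypothetical 0/1/2 genotypes
Z = M - 1.0                                     # centered pretending p = 0.5
G_05 = Z @ Z.T / (2 * M.shape[1] * 0.25)        # 2 * sum(p q) with p = q = 0.5
G_ibs = g_ibs_by_counting(M)
print(np.allclose(G_ibs, 0.5 * G_05 + 1.0))
```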

OK, so what should I use?

For GBLUP it does not matter much: all models are equivalent.
If using REML or Bayesian methods, you get the variance components right.
If you use pre-estimated variance components, you want to use comparable variances.
This is a bit tricky, but in most cases the default G works just fine.
Comparing variance components and $h^2$ also gets tricky.
See Legarra 2016, TPB.

OK, so what should I use?

For SSGBLUP it is essential to have compatible genomic and pedigree relationships.
Populations evolve with time, but genotypes came years after the pedigree started.
Genomic predictions are shifted from pedigree predictions.
Compatibility is achieved if both relationships refer to the same genetic base: same average BV at the base, same genetic variance at the base.
This will be presented with SSGBLUP.