Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013

Estimation of Var(A) and Breeding Values in General Pedigrees
The classic designs (ANOVA, parent-offspring regression) for variance components are simple, involving only a single type of relative comparison. Further, they assume balanced designs, with the same number of offspring in each family. In the real world, we often have a pedigree of relatives with a very unbalanced design. Fortunately, the general mixed model (so called because it includes both fixed and random effects) offers an ideal platform for both estimating genetic variances and predicting the breeding values of individuals. Almost all animal breeding is based on such models, with REML (restricted maximum likelihood) used to estimate variances and BLUP (best linear unbiased predictors) used to predict breeding values (BV).

BLUP in plant breeding
BLUP has migrated from animal breeding into plant breeding. Advantages:
- Handles unbalanced designs
- Uses information from all measured relatives to improve estimates
- Can be used to estimate a variety of genetic values: GCA, SCA, line values (i.e., genotypic values of pure lines)
- The BLUP machinery can also be used to estimate environmental effects

The general mixed model
$$ y = X\beta + Zu + e $$
- $y$: vector of observations (phenotypes)
- $X$: incidence matrix for fixed effects
- $\beta$: vector of fixed effects (to be estimated), e.g., year, location and treatment effects
- $Z$: incidence matrix for random effects
- $u$: vector of random effects, such as individual genetic values (to be estimated)
- $e$: vector of residual errors (random effects)

In the general mixed model $y = X\beta + Zu + e$, we observe $y$, $X$, and $Z$. We estimate the fixed effects $\beta$ and the random effects $u$ and $e$.

Example
Suppose we wish to estimate the breeding values of three sires (fathers), each of which is mated to a random female (dam), producing two offspring, some reared in environment one, others in environment two. The data are

Observation   Value   Sire   Environment
Y_{1,1,1}     9       1      1
Y_{1,2,1}     12      1      2
Y_{2,1,1}     11      2      1
Y_{2,1,2}     6       2      1
Y_{3,1,1}     7       3      1
Y_{3,2,1}     14      3      2

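Here the basic model is
$$ Y_{ijk} = \beta_j + u_i + e_{ijk} $$
where $\beta_j$ is the effect of environment $j$ and $u_i$ is the breeding value of sire $i$. The mixed-model vectors and matrices become
$$ y = \begin{pmatrix} y_{1,1,1} \\ y_{1,2,1} \\ y_{2,1,1} \\ y_{2,1,2} \\ y_{3,1,1} \\ y_{3,2,1} \end{pmatrix} = \begin{pmatrix} 9 \\ 12 \\ 11 \\ 6 \\ 7 \\ 14 \end{pmatrix}, \quad X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} $$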

Means & Variances for $y = X\beta + Zu + e$
Means: $E(u) = E(e) = 0$, so $E(y) = X\beta$.
Variances: Let $R$ be the covariance matrix for the residuals. We typically assume $R = \sigma^2_e I$. Let $G$ be the covariance matrix for the breeding values (the vector $u$). The covariance matrix for $y$ becomes
$$ V = ZGZ^T + R $$

Estimating Fixed Effects & Predicting Random Effects
For a mixed model, we observe $y$, $X$, and $Z$; $\beta$, $u$, $R$, and $G$ are generally unknown. There are two complementary estimation issues.
(i) Estimation of $\beta$ and $u$.
Estimation of fixed effects (BLUE = Best Linear Unbiased Estimator):
$$ \hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y $$
Prediction of random effects (BLUP = Best Linear Unbiased Predictor):
$$ \hat u = G Z^T V^{-1} (y - X\hat\beta) $$
Recall $V = ZGZ^T + R$.

Let's return to our example
Assume the residuals are uncorrelated and homoscedastic, $R = \sigma^2_e I$. Hence we need $\sigma^2_e$ to solve the BLUE/BLUP equations. Suppose $\sigma^2_e = 6$, giving $R = 6I$.
Now consider $G$, the covariance matrix for $u$ (the vector of the three sire breeding values). Assume the sires are unrelated, so $G$ is diagonal with element $\sigma^2_G$ = sire variance, where $\sigma^2_G = \sigma^2_A/4$. Suppose $\sigma^2_A = 8$, giving $G = (8/4) I = 2I$.

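Recalling that $V = ZGZ^T + R$,
$$ V = \frac{8}{4} Z Z^T + 6 I = \begin{pmatrix} 8 & 2 & 0 & 0 & 0 & 0 \\ 2 & 8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 8 & 2 & 0 & 0 \\ 0 & 0 & 2 & 8 & 0 & 0 \\ 0 & 0 & 0 & 0 & 8 & 2 \\ 0 & 0 & 0 & 0 & 2 & 8 \end{pmatrix} $$
giving
$$ V^{-1} = \frac{1}{30} \begin{pmatrix} 4 & -1 & 0 & 0 & 0 & 0 \\ -1 & 4 & 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & -1 & 0 & 0 \\ 0 & 0 & -1 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 4 & -1 \\ 0 & 0 & 0 & 0 & -1 & 4 \end{pmatrix} $$
Solving,
$$ \hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X^T V^{-1} X)^{-1} X^T V^{-1} y = \frac{1}{18} \begin{pmatrix} 148 \\ 235 \end{pmatrix}, \qquad \hat u = \begin{pmatrix} \hat u_1 \\ \hat u_2 \\ \hat u_3 \end{pmatrix} = G Z^T V^{-1} (y - X\hat\beta) = \frac{1}{18} \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix} $$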
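The BLUE/BLUP calculation for the three-sire example can be checked numerically. The following is a minimal sketch using only the Python standard library and exact fractions; the helper names (`transpose`, `matmul`, `solve`) are illustrative and not part of the lecture.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

# Data from the sire example
y = [9, 12, 11, 6, 7, 14]
X = [[1, 0], [0, 1], [1, 0], [1, 0], [1, 0], [0, 1]]
Z = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
sigma2_e, sigma2_A = 6, 8
g = F(sigma2_A, 4)                      # sire variance, so G = g * I
n, p, q = len(y), 2, 3

# V = Z G Z^T + R with R = sigma2_e * I
ZZt = matmul(Z, transpose(Z))
V = [[g * ZZt[i][j] + (sigma2_e if i == j else 0) for j in range(n)] for i in range(n)]

# BLUE: beta = (X^T V^-1 X)^-1 X^T V^-1 y, using solves instead of an explicit inverse
Vinv_X = transpose([solve(V, col) for col in transpose(X)])   # n x p
Vinv_y = solve(V, y)
XtVinvX = matmul(transpose(X), Vinv_X)
XtVinvy = [sum(X[i][k] * Vinv_y[i] for i in range(n)) for k in range(p)]
beta = solve(XtVinvX, XtVinvy)

# BLUP: u = G Z^T V^-1 (y - X beta)
resid = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
Vinv_r = solve(V, resid)
u = [g * sum(Z[i][j] * Vinv_r[i] for i in range(n)) for j in range(q)]

print(beta)  # environment effects
print(u)     # sire breeding values
```

Exact fractions are used here so the result can be compared directly with the values of 1/18 times (148, 235) and 1/18 times (-1, 2, -1) quoted in the example; Python prints the fractions in lowest terms.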

BLUP estimates of line values
Bernardo example (11.3.1): yield in four (related) inbred lines of barley raised over two sets of environments. In the model, environment type is a fixed effect and line value is a random effect.

Here $b_i = \mu + t_i$. For the line values, the relationship values come from pedigree data on how the lines are related. Observations differ in their residual error due to sample-size differences.

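Henderson's Mixed Model Equations
For $y = X\beta + Zu + e$ with $u \sim (0, G)$, $e \sim (0, R)$, and $\mathrm{cov}(u, e) = 0$, where $X$ is n x p and $Z$ is n x q:
$$ \begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z \\ Z^T R^{-1} X & Z^T R^{-1} Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix} = \begin{pmatrix} X^T R^{-1} y \\ Z^T R^{-1} y \end{pmatrix} $$
The blocks of the coefficient matrix are p x p (upper left), p x q (upper right), q x p (lower left), and q x q (lower right), so the whole matrix is (p+q) x (p+q).
These equations are easier to work with numerically than the BLUE/BLUP equations
$$ \hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y, \qquad \hat u = G Z^T V^{-1} (y - X\hat\beta), \qquad V = ZGZ^T + R $$
which require inversion of the n x n matrix $V$.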

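Let's redo our sire example using Henderson's equations
$$ X^T R^{-1} X = \frac{1}{6} \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}, \qquad X^T R^{-1} Z = (Z^T R^{-1} X)^T = \frac{1}{6} \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix} $$
$$ G^{-1} + Z^T R^{-1} Z = \frac{5}{6} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad X^T R^{-1} y = \frac{1}{6} \begin{pmatrix} 33 \\ 26 \end{pmatrix}, \qquad Z^T R^{-1} y = \frac{1}{6} \begin{pmatrix} 21 \\ 17 \\ 21 \end{pmatrix} $$
so the mixed-model equations are
$$ \frac{1}{6} \begin{pmatrix} 4 & 0 & 1 & 2 & 1 \\ 0 & 2 & 1 & 0 & 1 \\ 1 & 1 & 5 & 0 & 0 \\ 2 & 0 & 0 & 5 & 0 \\ 1 & 1 & 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat u_1 \\ \hat u_2 \\ \hat u_3 \end{pmatrix} = \frac{1}{6} \begin{pmatrix} 33 \\ 26 \\ 21 \\ 17 \\ 21 \end{pmatrix} $$
Taking the inverse gives
$$ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat u_1 \\ \hat u_2 \\ \hat u_3 \end{pmatrix} = \frac{1}{18} \begin{pmatrix} 148 \\ 235 \\ -1 \\ 2 \\ -1 \end{pmatrix} $$
as found above.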
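The same answer can be reproduced by assembling Henderson's coefficient matrix directly, without ever forming $V$. This is a sketch under the example's assumptions ($R = 6I$, $G = 2I$); the stacked matrix `W = [X Z]` and the helper `solve` are illustrative names, not notation from the lecture.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

y = [9, 12, 11, 6, 7, 14]
X = [[1, 0], [0, 1], [1, 0], [1, 0], [1, 0], [0, 1]]
Z = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
p, q = 2, 3
r_inv = F(1, 6)   # R^-1 = (1/6) I
g_inv = F(1, 2)   # G^-1 = (1/2) I

# Stack the incidence matrices side by side: W = [X Z], n x (p+q)
W = [xr + zr for xr, zr in zip(X, Z)]

# Coefficient matrix W^T R^-1 W, with G^-1 added to the random-effect block
C = [[r_inv * sum(row[a] * row[b] for row in W)
      + (g_inv if a == b and a >= p else 0)
      for b in range(p + q)] for a in range(p + q)]
rhs = [r_inv * sum(row[a] * yi for row, yi in zip(W, y)) for a in range(p + q)]

sol = solve(C, rhs)
beta, u = sol[:p], sol[p:]
print(beta, u)
```

Only a (p+q) x (p+q) system is solved here (5 x 5), whereas the BLUE/BLUP form requires the n x n matrix $V$; with many records per effect that difference dominates the cost.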

The Animal Model, $y_i = \mu + a_i + e_i$
Here the individual is the unit of analysis, with $y_i$ the phenotypic value of individual $i$ and $a_i$ its BV.
$$ X = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad \beta = \mu, \qquad u = \begin{pmatrix} a_1 \\ \vdots \\ a_k \end{pmatrix}, \qquad G = \sigma^2_A A $$
where the additive genetic relationship matrix $A$ is given by $A_{ij} = 2\theta_{ij}$, namely twice the coefficient of coancestry.
Assume $R = \sigma^2_e I$, so that $R^{-1} = (1/\sigma^2_e) I$. Likewise, $G = \sigma^2_A A$, so that $G^{-1} = (1/\sigma^2_A) A^{-1}$.
The animal model estimates the breeding value for each individual, even for a plant or tree! The same approach also works to estimate line (genotypic) values for inbreds.

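Returning to the animal model
Henderson's mixed model equations become
$$ \begin{pmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z + \lambda A^{-1} \end{pmatrix} \begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix} = \begin{pmatrix} X^T y \\ Z^T y \end{pmatrix} $$
where $\lambda = \sigma^2_e / \sigma^2_A = (1 - h^2)/h^2$. With one record per individual, this reduces to
$$ \begin{pmatrix} n & 1^T \\ 1 & I + \lambda A^{-1} \end{pmatrix} \begin{pmatrix} \hat\mu \\ \hat u \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ y \end{pmatrix} $$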
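The step from the general mixed model equations to this form is short enough to write out. The following is a sketch assuming $R = \sigma^2_e I$, $G = \sigma^2_A A$, and a single record on each of $n$ individuals (so $X = 1$ and $Z = I$):

```latex
% Substitute R^{-1} = I/\sigma^2_e and G^{-1} = A^{-1}/\sigma^2_A into Henderson's equations:
\[
\begin{pmatrix}
X^T X/\sigma^2_e & X^T Z/\sigma^2_e \\
Z^T X/\sigma^2_e & Z^T Z/\sigma^2_e + A^{-1}/\sigma^2_A
\end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}
=
\begin{pmatrix} X^T y/\sigma^2_e \\ Z^T y/\sigma^2_e \end{pmatrix}
\]
% Multiply both sides by \sigma^2_e and set \lambda = \sigma^2_e/\sigma^2_A:
\[
\begin{pmatrix}
X^T X & X^T Z \\
Z^T X & Z^T Z + \lambda A^{-1}
\end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}
=
\begin{pmatrix} X^T y \\ Z^T y \end{pmatrix}
\]
% With X = 1 (a column of n ones) and Z = I:
\[
X^T X = n, \qquad X^T Z = 1^T, \qquad Z^T Z = I, \qquad
X^T y = \sum_{i=1}^{n} y_i, \qquad Z^T y = y .
\]
% Since h^2 = \sigma^2_A/(\sigma^2_A + \sigma^2_e), it follows that \lambda = (1 - h^2)/h^2.
```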

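Example
Suppose our pedigree of five individuals gives
$$ A = \begin{pmatrix} 1 & 0 & 0 & 1/2 & 0 \\ 0 & 1 & 0 & 1/2 & 1/2 \\ 0 & 0 & 1 & 0 & 1/2 \\ 1/2 & 1/2 & 0 & 1 & 1/4 \\ 0 & 1/2 & 1/2 & 1/4 & 1 \end{pmatrix} $$
Suppose $\lambda = 1$ (corresponding to $h^2 = 0.5$). In this case,
$$ I + \lambda A^{-1} = \begin{pmatrix} 5/2 & 1/2 & 0 & -1 & 0 \\ 1/2 & 3 & 1/2 & -1 & -1 \\ 0 & 1/2 & 5/2 & 0 & -1 \\ -1 & -1 & 0 & 3 & 0 \\ 0 & -1 & -1 & 0 & 3 \end{pmatrix} $$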

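Suppose the vector of observations is
$$ y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix} = \begin{pmatrix} 7 \\ 9 \\ 10 \\ 6 \\ 9 \end{pmatrix} $$
Here $n = 5$, $\sum y_i = 41$, and Henderson's equation becomes
$$ \begin{pmatrix} 5 & 1 & 1 & 1 & 1 & 1 \\ 1 & 5/2 & 1/2 & 0 & -1 & 0 \\ 1 & 1/2 & 3 & 1/2 & -1 & -1 \\ 1 & 0 & 1/2 & 5/2 & 0 & -1 \\ 1 & -1 & -1 & 0 & 3 & 0 \\ 1 & 0 & -1 & -1 & 0 & 3 \end{pmatrix} \begin{pmatrix} \hat\mu \\ \hat a_1 \\ \hat a_2 \\ \hat a_3 \\ \hat a_4 \\ \hat a_5 \end{pmatrix} = \begin{pmatrix} 41 \\ 7 \\ 9 \\ 10 \\ 6 \\ 9 \end{pmatrix} $$
Solving gives
$$ \hat\mu = \frac{440}{53} \approx 8.30, \qquad \begin{pmatrix} \hat a_1 \\ \hat a_2 \\ \hat a_3 \\ \hat a_4 \\ \hat a_5 \end{pmatrix} = \begin{pmatrix} -662/689 \\ 4/53 \\ 610/689 \\ -732/689 \\ 381/689 \end{pmatrix} \approx \begin{pmatrix} -0.961 \\ 0.075 \\ 0.885 \\ -1.062 \\ 0.553 \end{pmatrix} $$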
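The animal-model system for this five-individual example is small enough to solve exactly. The sketch below types in $I + \lambda A^{-1}$ from the example (with $\lambda = 1$), borders it with the row and column for the mean, and solves; `solve` is an illustrative helper, not something defined in the lecture.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

# I + lambda * A^-1 for the example pedigree, lambda = 1 (h^2 = 0.5)
I_plus_lam_Ainv = [
    [F(5, 2), F(1, 2), 0,       -1,  0],
    [F(1, 2), 3,       F(1, 2), -1, -1],
    [0,       F(1, 2), F(5, 2),  0, -1],
    [-1,      -1,      0,        3,  0],
    [0,       -1,      -1,       0,  3],
]
y = [7, 9, 10, 6, 9]
n = len(y)

# Border the matrix: first row (n, 1^T), first column of ones
C = [[n] + [1] * n] + [[1] + row for row in I_plus_lam_Ainv]
rhs = [sum(y)] + y

sol = solve(C, rhs)
mu_hat, a_hat = sol[0], sol[1:]
print(mu_hat, float(mu_hat))
print([round(float(a), 3) for a in a_hat])
```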

More on the animal model
Under the animal model, $y = X\beta + Za + e$ with $a \sim (0, \sigma^2_A A)$ and $e \sim (0, \sigma^2_e I)$, so that
$$ \mathrm{BLUP}(a) = \sigma^2_A A Z^T V^{-1} (y - X\beta), \qquad V = ZGZ^T + R = \sigma^2_A Z A Z^T + \sigma^2_e I $$
Consider the simplest case of a single observation on one individual, where the only fixed effect is the mean $\mu$, which is assumed known. Here $Z = A = I = (1)$ and $V = \sigma^2_A + \sigma^2_e$, so
$$ \sigma^2_A A Z^T V^{-1} = \frac{\sigma^2_A}{\sigma^2_A + \sigma^2_e} = h^2, \qquad \mathrm{BLUP}(a) = h^2 (y - \mu) $$

More generally, with single observations on $n$ unrelated individuals, $A = Z = I_{n \times n}$ and
$$ V = \sigma^2_A Z A Z^T + \sigma^2_e I = (\sigma^2_A + \sigma^2_e) I, \qquad \sigma^2_A A Z^T V^{-1} = h^2 I $$
$$ \mathrm{BLUP}(a) = \sigma^2_A A Z^T V^{-1} (y - X\beta) = h^2 (y - \mu) $$
Hence, the predicted breeding value of individual $i$ is just $\mathrm{BLUP}(a_i) = h^2 (y_i - \mu)$.
When at least some individuals are related and/or inbred (so that $A \neq I$), or there are missing or multiple records (so that $Z \neq I$), the estimates of the BVs differ from this simple form, but BLUP fully accounts for this.

BLUP is a shrinkage estimator
For a single observation on one individual, $\mathrm{BLUP}(a) = h^2 (y - \mu)$. The difference between the observed value ($y$) and the mean ($\mu$) is shrunk by the factor $h^2$, which shrinks the estimate back towards the mean (zero in the case of BVs).
More generally, $\mathrm{BLUP}(a) = G Z^T V^{-1} (y - X\beta)$. This first adjusts the observations ($y$) for fixed effects ($X\beta$) and then regresses this difference back towards zero (the mean BV), as $\mathrm{Cov} \cdot \mathrm{Var}^{-1}$ is a generalized regression coefficient.
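A small numerical illustration of the shrinkage, for the single-record case only. The function name `blup_single` and the numbers (a record of 12 against a known mean of 10) are made up for illustration.

```python
def blup_single(y, mu, h2):
    """Predicted breeding value from one record, assuming the mean mu is known."""
    return h2 * (y - mu)

# The same phenotypic deviation (+2) is shrunk more when heritability is low
for h2 in (0.1, 0.5, 0.9):
    print(h2, blup_single(12.0, 10.0, h2))
```

The lower the heritability, the less of the phenotypic deviation is attributed to the breeding value, and the closer the prediction stays to zero.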

The Relationship Matrix A
$A$ is typically obtained from a pedigree, but is increasingly being estimated from marker data. The diagonal elements indicate the amount of inbreeding: $A_{ii} = 1 + F_i$, where $F_i$ is the inbreeding coefficient for individual $i$. For a fully inbred individual, $A_{ii} = 2$.
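One common way to obtain $A$ from a pedigree is the recursive "tabular" method, sketched below. This is a swapped-in illustration rather than a procedure given in the lecture. The example pedigree (individuals 1 to 3 as unrelated founders, 4 with parents 1 and 2, 5 with parents 2 and 3) is an assumption chosen because it reproduces the five-individual $A$ matrix used in the animal-model example; indices in the code are zero-based.

```python
from fractions import Fraction as F

def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (sire, dam) index pairs, None for an unknown parent,
    ordered so that parents always appear before their offspring.
    """
    n = len(pedigree)
    A = [[F(0)] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Off-diagonals: average of the relationships with i's two parents
        for j in range(i):
            a = F(0)
            if s is not None:
                a += A[j][s] / 2
            if d is not None:
                a += A[j][d] / 2
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents
        both_known = s is not None and d is not None
        A[i][i] = 1 + (A[s][d] / 2 if both_known else 0)
    return A

# Assumed pedigree (zero-based): 0, 1, 2 founders; 3 = (0, 1); 4 = (1, 2)
ped = [(None, None)] * 3 + [(0, 1), (1, 2)]
A = relationship_matrix(ped)
for row in A:
    print([str(v) for v in row])

# Mating the two half-sibs (3 and 4) gives an inbred offspring
A_inbred = relationship_matrix(ped + [(3, 4)])
print(A_inbred[5][5])  # 1 + F for the inbred offspring
```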

Marker-based relationship matrices
There are two reasons for using a marker-estimated relationship matrix:
- The pedigree is either unknown or poorly known
- With very dense markers, it provides a better estimate than a known pedigree
Why? Consider two (non-inbred) full sibs. The expectation under a pedigree is that they share exactly half their genes. However, there is a sampling variance about this expected value, so that some pairs of sibs may share more than 50%, while others may share less. Using markers to detect such pairs improves the estimated values. This is called G-BLUP (in animal breeding) and is a form of genomic selection.

Marker-based relationship matrix
The simplest case is to consider a very large number ($L$) of SNPs, treat alike in state as IBD, and then compute, for each SNP marker, the probability $f_{xy}$ that a randomly drawn allele from $x$ matches a randomly drawn allele from $y$. Twice the average over all markers is the entry for $x$ and $y$ in the relationship matrix (as $A_{xy} = 2 f_{xy}$).

Values of $f_{xy}$ given the SNP genotypes:

                      SNP genotype for y
SNP genotype for x    00     01     11
00                    1      0.5    0
01                    0.5    0.5    0.5
11                    0      0.5    1
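The table can be generated from a single formula, which makes the computation over many SNPs straightforward. This sketch assumes genotypes are coded as the number of copies of allele "1" (00 = 0, 01 = 1, 11 = 2), a coding convention introduced here for illustration; the function names and the two toy four-SNP genotype vectors are likewise hypothetical.

```python
from fractions import Fraction as F

def f_xy(gx, gy):
    """Chance that an allele drawn at random from x matches one drawn from y.

    gx, gy: genotype codes 0, 1, 2 = copies of allele "1" (00, 01, 11).
    """
    px, py = F(gx, 2), F(gy, 2)          # frequency of allele "1" within each genotype
    return px * py + (1 - px) * (1 - py)  # both draw "1", or both draw "0"

def marker_relationship(x, y):
    """Entry A_xy: twice the average of f_xy over all markers."""
    return 2 * sum(f_xy(a, b) for a, b in zip(x, y)) / len(x)

# Reproduce the 3 x 3 table of f_xy values
for gx in (0, 1, 2):
    print([str(f_xy(gx, gy)) for gy in (0, 1, 2)])

# Toy genotypes at four SNPs for two individuals
x = [0, 1, 2, 1]
y = [2, 1, 2, 0]
print(marker_relationship(x, y))
```

In practice dense-marker relationship matrices are usually built with allele-frequency corrections; the version above follows only the simple alike-in-state rule described in the text.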

Estimation of R and G
A second estimation issue concerns the covariance matrices for the residuals ($R$) and for the breeding values ($G$). As we have seen, both matrices have the form $\sigma^2 B$, where the variance $\sigma^2$ is unknown but $B$ is known. For example, for residuals, $R = \sigma^2_e I$. For breeding values, $G = \sigma^2_A A$, where $A$ is given from the pedigree.

REML Variance Component Estimation
REML = Restricted Maximum Likelihood. Standard ML variance estimation assumes fixed factors are known without error, which results in a downward bias in variance estimates. REML maximizes that portion of the likelihood that does not depend on the fixed effects. Basic idea: use a transformation to remove the fixed effects, then perform ML on the transformed vector.

Simple variance estimate under ML vs. REML
$$ \mathrm{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \mathrm{REML} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
REML adjusts for the estimated fixed effect, in this case the mean. With a balanced design, ANOVA variance estimates are equivalent to REML variance estimates.
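A quick numerical comparison of the two estimators, as a sketch. The data vector is borrowed from the five observations in the animal-model example purely for illustration, and the function names are hypothetical.

```python
from fractions import Fraction as F

def var_ml(x):
    """ML estimate: divide the corrected sum of squares by n."""
    n = len(x)
    mean = sum(x) / F(n)
    return sum((v - mean) ** 2 for v in x) / n

def var_reml(x):
    """REML estimate: divide by n - 1, accounting for the estimated mean."""
    n = len(x)
    mean = sum(x) / F(n)
    return sum((v - mean) ** 2 for v in x) / (n - 1)

x = [7, 9, 10, 6, 9]
print(var_ml(x), float(var_ml(x)))      # smaller: biased downward
print(var_reml(x), float(var_reml(x)))  # larger by the factor n / (n - 1)
```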

Multiple random effects
$$ y = X\beta + Za + Wu + e $$
- $y$ is an n x 1 vector of observations
- $\beta$ is a q x 1 vector of fixed effects
- $a$ is a p x 1 vector of random effects
- $u$ is an m x 1 vector of random effects
- $X$ is n x q, $Z$ is n x p, $W$ is n x m
$y$, $X$, $Z$, and $W$ are observed; $\beta$, $a$, $u$, and $e$ are to be estimated.

Covariance structure
$$ y = X\beta + Za + Wu + e $$
Defining the covariance structure is key in any mixed model. Suppose $e \sim (0, \sigma^2_e I)$, $u \sim (0, \sigma^2_u I)$, and $a \sim (0, \sigma^2_A A)$, as with breeding values. These covariance matrices are still not sufficient, as we have yet to describe the relationship between $e$, $a$, and $u$. If they are independent:

Covariance matrix for the vector of observations $y$, where $y = X\beta + Za + Wu + e$
Note that if we ignored the second vector $u$ of random effects and assumed $y = X\beta + Za + e^*$, then $e^* = Wu + e$, with $\mathrm{Var}(e^*) = \sigma^2_e I + \sigma^2_u W W^T$. The consequence of ignoring random effects is that they are incorporated into the residuals, potentially compromising the residual covariance structure.
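For reference, the covariance matrix of $y$ follows in one line once independence is assumed. This is a sketch under the stated assumptions ($a$, $u$, and $e$ mutually independent, with the variances given in the covariance-structure slide):

```latex
% Variance of a sum of independent terms is the sum of the variances:
\[
V = \mathrm{Var}(y) = \mathrm{Var}(Za) + \mathrm{Var}(Wu) + \mathrm{Var}(e)
  = \sigma^2_A \, Z A Z^T + \sigma^2_u \, W W^T + \sigma^2_e \, I .
\]
% Dropping u from the model moves its term into the residual:
\[
\mathrm{Var}(e^*) = \mathrm{Var}(Wu + e) = \sigma^2_u \, W W^T + \sigma^2_e \, I ,
\]
% which is diagonal only if W W^T is diagonal, i.e. no two records share a level of u.
```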

Mixed-model Equations
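The equations themselves did not survive in this transcript. For the model $y = X\beta + Za + Wu + e$ they take the standard three-block form below, given here as a sketch that assumes $R = \sigma^2_e I$, $a \sim (0, \sigma^2_A A)$, $u \sim (0, \sigma^2_u I)$, and independence of $a$, $u$, and $e$; the symbols $\lambda_A$ and $\lambda_u$ are introduced here and may differ from the original slide's notation.

```latex
\[
\begin{pmatrix}
X^T X & X^T Z & X^T W \\
Z^T X & Z^T Z + \lambda_A A^{-1} & Z^T W \\
W^T X & W^T Z & W^T W + \lambda_u I
\end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat a \\ \hat u \end{pmatrix}
=
\begin{pmatrix} X^T y \\ Z^T y \\ W^T y \end{pmatrix},
\qquad
\lambda_A = \frac{\sigma^2_e}{\sigma^2_A}, \quad
\lambda_u = \frac{\sigma^2_e}{\sigma^2_u}.
\]
```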

The repeatability model
Often, multiple measurements (aka "records") are collected on the same individual. Such a record for individual $k$ has three components:
- Breeding value $a_k$
- Common (permanent) environmental value $p_k$
- Residual value for the $i$th observation, $e_{ki}$
The resulting observation is thus
$$ z_{ki} = \mu + a_k + p_k + e_{ki} $$
The repeatability of a trait is $r = (\sigma^2_A + \sigma^2_p)/\sigma^2_z$. The resulting variance of the residuals is $\sigma^2_e = (1 - r)\sigma^2_z$.

Resulting mixed model
$$ y = X\beta + Za + Zp + e $$
Notice that we could also write this model as
$$ y = X\beta + Z(a + p) + e = X\beta + Zv + e, \qquad v = a + p $$
In-class question: Why can we obtain separate estimates of $a$ and $p$?

The incidence matrix Z
Suppose we have a total of 7 observations/records, with 3 measures from individual 1, 2 from individual 2, and 2 from individual 3. Then $Z$ is the 7 x 3 matrix whose rows assign each record to its individual. Why? Matrix multiplication. Consider $y_{21}$:
$$ y_{21} = \mu + a_2 + p_2 + e_{21} $$
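The incidence matrix for this layout can be built mechanically from a list saying which individual each record belongs to. The sketch below assumes the seven records are ordered by individual (three for individual 1, then two each for individuals 2 and 3) and uses zero-based indices; the function name and the example values of $a + p$ are hypothetical.

```python
def incidence_matrix(record_owner, n_individuals):
    """One row per record, with a single 1 in the column of the owning individual."""
    return [[1 if owner == k else 0 for k in range(n_individuals)]
            for owner in record_owner]

# Records 1-3 belong to individual 1, records 4-5 to individual 2, records 6-7 to individual 3
owners = [0, 0, 0, 1, 1, 2, 2]
Z = incidence_matrix(owners, 3)
for row in Z:
    print(row)

# Multiplying Z by a vector of individual effects copies each effect onto its records
v = [0.5, -1.0, 2.0]   # made-up values of a_k + p_k
Zv = [sum(z * vk for z, vk in zip(row, v)) for row in Z]
print(Zv)

# Z Z^T has a 1 wherever two records come from the same individual
ZZt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Z] for r1 in Z]
```

The block pattern of `ZZt` is the reason repeated records on one individual are correlated when permanent environmental effects are present.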

Consequences of ignoring p
Suppose we ignored the permanent environment effects and assumed the model $y = X\beta + Za + e^*$. Then $e^* = Zp + e$ and $\mathrm{Var}(e^*) = \sigma^2_e I + \sigma^2_p Z Z^T$. Assuming that $\mathrm{Var}(e^*) = \sigma^2_e I$ gives an incorrect model.
We could either use $y = X\beta + Za + e^*$ with the correct error structure, $\mathrm{Var}(e^*) = \sigma^2_e I + \sigma^2_p Z Z^T$, or use $y = X\beta + Za + Zp + e$, where $\mathrm{Var}(e) = \sigma^2_e I$.

Generalizing BLUP
Thus far, we have framed BLUP in the standard animal breeding context, which estimates a vector of breeding values from the genetic relationship matrix. More generally, we can estimate any number of vectors $g$ of genetic parameters (such as GCA, SCA, line values) given some matrix of genetic relatedness. Historically the relatedness matrix was obtained from a pedigree, but now with dense markers it can be estimated directly.

BLUP for GCA, SCA
Again, an example from Bernardo (11.5.1). B73, B84, and H13 are in one maize heterotic group (Stiff Stalk); Mo17 and N197 are in another (Lancaster). Note the highly unbalanced design: not all crosses appear in both environments.

Covariance matrix based on pedigree information (see Bernardo for details)