Mixed-Model Estimation of genetic variances Bruce Walsh lecture notes Uppsala EQG 01 course version 8 Jan 01
Estimation of Var(A) and Breeding Values in General Pedigrees
The designs above (ANOVA, P-O regression) are simple, involving only a single type of relative comparison. Further, we assumed balanced designs, with the same number of offspring in each family. In the real world, we often have a pedigree of relatives with a very unbalanced design. Fortunately, the general mixed model (so called because it includes both fixed and random effects) offers an ideal platform both for estimating genetic variances and for predicting the breeding values of individuals. Almost all animal breeding is based on such models, with REML (restricted maximum likelihood) used to estimate variances and BLUP (best linear unbiased prediction) used to predict breeding values (BVs).
The general mixed model
$$y = X\beta + Zu + e$$
Here $y$ is the vector of observations (phenotypes); $\beta$ is the vector of fixed effects (to be estimated), e.g., year, sex, and age effects; $X$ is the incidence matrix for the fixed effects; $u$ is the vector of random effects, such as individual breeding values (to be estimated); $Z$ is the incidence matrix for the random effects; and $e$ is the vector of residual errors (also random effects).
The general mixed model, $y = X\beta + Zu + e$: we observe $y$, $X$, and $Z$; we estimate the fixed effects $\beta$ and the random effects $u$ and $e$.
Example
Suppose we wish to estimate the breeding values of three sires, each of which is mated to a random dam, producing two offspring, some reared in environment one and others in environment two. The data are

Observation   Value   Sire   Environment
$y_{111}$       9       1        1
$y_{121}$      12       1        2
$y_{211}$      11       2        1
$y_{212}$       6       2        1
$y_{311}$       7       3        1
$y_{321}$      14       3        2
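Here the basic model is $y_{ijk} = \beta_j + u_i + e_{ijk}$, where $\beta_j$ is the effect of environment $j$ and $u_i$ is the breeding value of sire $i$. The mixed-model vectors and matrices become
$$y = \begin{pmatrix} y_{111}\\ y_{121}\\ y_{211}\\ y_{212}\\ y_{311}\\ y_{321} \end{pmatrix} = \begin{pmatrix} 9\\ 12\\ 11\\ 6\\ 7\\ 14 \end{pmatrix}, \quad X = \begin{pmatrix} 1&0\\ 0&1\\ 1&0\\ 1&0\\ 1&0\\ 0&1 \end{pmatrix}, \quad Z = \begin{pmatrix} 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&0&1\\ 0&0&1 \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix}, \quad u = \begin{pmatrix} u_1\\ u_2\\ u_3 \end{pmatrix}$$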
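The incidence matrices can be built mechanically from the data table. A minimal sketch in Python/NumPy, assuming the records are stored as (value, sire, environment) tuples in the order listed; the helper name `incidence` is hypothetical, introduced only for illustration:

```python
import numpy as np

# Records from the sire example: (value, sire, environment), in the order
# y_111, y_121, y_211, y_212, y_311, y_321.
records = [(9, 1, 1), (12, 1, 2), (11, 2, 1), (6, 2, 1), (7, 3, 1), (14, 3, 2)]

def incidence(levels, n_levels):
    """Return an n x n_levels 0/1 matrix with a single 1 per row (levels are 1-based)."""
    M = np.zeros((len(levels), n_levels))
    for row, level in enumerate(levels):
        M[row, level - 1] = 1.0
    return M

y = np.array([r[0] for r in records], dtype=float)
X = incidence([r[2] for r in records], 2)   # environments -> fixed effects beta
Z = incidence([r[1] for r in records], 3)   # sires -> random effects u

print(X)
print(Z)
```

Each row of X (or Z) has exactly one 1, in the column of the level that observation belongs to; this is what "incidence matrix" means here.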
Means and variances for $y = X\beta + Zu + e$
Means: $E(u) = E(e) = 0$, so $E(y) = X\beta$.
Variances: let $R$ be the covariance matrix for the residuals; we typically assume $R = \sigma^2_e I$. Let $G$ be the covariance matrix for the breeding values (the vector $u$). Assuming $u$ and $e$ are uncorrelated, the covariance matrix for $y$ becomes $V = ZGZ^T + R$.
Effects of model misspecification
Suppose we simply used a general linear model (only fixed effects) for this example. Here $y = X\beta + e^*$, where $e^* \sim \text{MVN}(0, V)$, implying $y \sim \text{MVN}(X\beta, V)$. The effect of using a mixed model is that it partitions the residual $e^*$ as $e^* = Zu + e$.
Estimating fixed effects and predicting random effects
For a mixed model, we observe $y$, $X$, and $Z$; $\beta$, $u$, $R$, and $G$ are generally unknown. There are two complementary estimation issues: (i) estimation of $\beta$ and $u$, and (ii) estimation of $R$ and $G$. For (i):
$$\hat\beta = \left(X^T V^{-1} X\right)^{-1} X^T V^{-1} y$$
is the estimate of the fixed effects, the BLUE (best linear unbiased estimator), and
$$\hat u = G Z^T V^{-1} \left(y - X\hat\beta\right)$$
is the prediction of the random effects, the BLUP (best linear unbiased predictor). Recall that $V = ZGZ^T + R$.
Let's return to our example
Assume the residuals are uncorrelated and homoscedastic, $R = \sigma^2_e I$. Hence, we need $\sigma^2_e$ to solve the BLUE/BLUP equations. Suppose $\sigma^2_e = 6$, giving $R = 6I$. Now consider $G$, the covariance matrix for $u$ (the vector of the three sire breeding values). Assume the sires are unrelated, so $G$ is diagonal with elements $\sigma^2_G$, the sire variance, where $\sigma^2_G = \sigma^2_A/4$. Suppose $\sigma^2_A = 8$, giving $G = (8/4)\,I = 2I$.
Recalling that $V = ZGZ^T + R$, we have
$$V = 2\,ZZ^T + 6I = \begin{pmatrix} 8&2&0&0&0&0\\ 2&8&0&0&0&0\\ 0&0&8&2&0&0\\ 0&0&2&8&0&0\\ 0&0&0&0&8&2\\ 0&0&0&0&2&8 \end{pmatrix}, \quad V^{-1} = \frac{1}{30}\begin{pmatrix} 4&-1&0&0&0&0\\ -1&4&0&0&0&0\\ 0&0&4&-1&0&0\\ 0&0&-1&4&0&0\\ 0&0&0&0&4&-1\\ 0&0&0&0&-1&4 \end{pmatrix}$$
Solving gives
$$\hat\beta = \begin{pmatrix} \hat\beta_1\\ \hat\beta_2 \end{pmatrix} = \left(X^T V^{-1} X\right)^{-1} X^T V^{-1} y = \frac{1}{18}\begin{pmatrix} 148\\ 235 \end{pmatrix}, \qquad \hat u = \begin{pmatrix} \hat u_1\\ \hat u_2\\ \hat u_3 \end{pmatrix} = G Z^T V^{-1}\left(y - X\hat\beta\right) = \frac{1}{18}\begin{pmatrix} -1\\ 2\\ -1 \end{pmatrix}$$
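These numbers can be checked directly. The sketch below (Python with NumPy; the function name `blue_blup` is hypothetical) forms $V = ZGZ^T + R$ explicitly and applies the BLUE and BLUP formulas to the sire example with $\sigma^2_e = 6$ and $G = 2I$. It should reproduce $\hat\beta = (148, 235)/18$ and $\hat u = (-1, 2, -1)/18$ up to floating-point rounding.

```python
import numpy as np

def blue_blup(y, X, Z, G, R):
    """BLUE of beta and BLUP of u, computed directly from V = Z G Z^T + R."""
    V = Z @ G @ Z.T + R
    Vinv = np.linalg.inv(V)                                   # n x n inverse
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)    # GLS / BLUE
    u = G @ Z.T @ Vinv @ (y - X @ beta)                       # BLUP
    return beta, u, V

# Sire example: sigma_e^2 = 6 and sigma_A^2 = 8, so G = (8/4) I = 2 I.
y = np.array([9, 12, 11, 6, 7, 14], dtype=float)
X = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
G = (8 / 4) * np.eye(3)
R = 6 * np.eye(6)

beta, u, V = blue_blup(y, X, Z, G, R)
print(18 * beta)                          # approximately [148, 235]
print(18 * u)                             # approximately [-1, 2, -1]
print(30 * np.linalg.inv(V)[:2, :2])      # approximately [[4, -1], [-1, 4]]
```

Inverting V is fine for six records, but it scales as an n × n problem, which is the motivation for Henderson's equations.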
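Henderson's Mixed Model Equations
$y = X\beta + Zu + e$, $u \sim (0, G)$, $e \sim (0, R)$, $\text{cov}(u, e) = 0$. If $X$ is $n \times p$ and $Z$ is $n \times q$, then
$$\begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z\\ Z^T R^{-1} X & Z^T R^{-1} Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat\beta\\ \hat u \end{pmatrix} = \begin{pmatrix} X^T R^{-1} y\\ Z^T R^{-1} y \end{pmatrix}$$
The blocks of the coefficient matrix are $p \times p$, $p \times q$, $q \times p$, and $q \times q$, so the whole matrix is $(p+q) \times (p+q)$. By contrast, the direct solutions $\hat\beta = \left(X^T V^{-1} X\right)^{-1} X^T V^{-1} y$ and $\hat u = G Z^T V^{-1}\left(y - X\hat\beta\right)$, with $V = ZGZ^T + R$, require the inversion of an $n \times n$ matrix.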
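Let's redo our previous example using Henderson's equations
$$X^T R^{-1} X = \frac{1}{6}\begin{pmatrix} 4&0\\ 0&2 \end{pmatrix}, \quad X^T R^{-1} Z = \left(Z^T R^{-1} X\right)^T = \frac{1}{6}\begin{pmatrix} 1&2&1\\ 1&0&1 \end{pmatrix},$$
$$Z^T R^{-1} Z + G^{-1} = \frac{5}{6}\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}, \quad X^T R^{-1} y = \frac{1}{6}\begin{pmatrix} 33\\ 26 \end{pmatrix}, \quad Z^T R^{-1} y = \frac{1}{6}\begin{pmatrix} 21\\ 17\\ 21 \end{pmatrix}$$
so that
$$\frac{1}{6}\begin{pmatrix} 4&0&1&2&1\\ 0&2&1&0&1\\ 1&1&5&0&0\\ 2&0&0&5&0\\ 1&1&0&0&5 \end{pmatrix} \begin{pmatrix} \hat\beta_1\\ \hat\beta_2\\ \hat u_1\\ \hat u_2\\ \hat u_3 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 33\\ 26\\ 21\\ 17\\ 21 \end{pmatrix}$$
Taking the inverse gives
$$\begin{pmatrix} \hat\beta_1\\ \hat\beta_2\\ \hat u_1\\ \hat u_2\\ \hat u_3 \end{pmatrix} = \frac{1}{18}\begin{pmatrix} 148\\ 235\\ -1\\ 2\\ -1 \end{pmatrix}$$
which agrees with the direct BLUE/BLUP solution.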
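A minimal sketch of Henderson's equations in NumPy (the function name `henderson_mme` is hypothetical). It assembles the $(p+q) \times (p+q)$ coefficient matrix with `np.block` and solves it, so only $R$ (diagonal, hence trivial here) and $G$ are inverted, never $V$. On the sire example it should give the same $\hat\beta$ and $\hat u$ as the direct formulas.

```python
import numpy as np

def henderson_mme(y, X, Z, G, R):
    """Solve Henderson's mixed model equations; returns (beta_hat, u_hat, C)."""
    Rinv = np.linalg.inv(R)        # trivial when R = sigma_e^2 * I
    Ginv = np.linalg.inv(G)        # q x q
    p = X.shape[1]
    C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                  [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])   # (p+q) x (p+q)
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(C, rhs)
    return sol[:p], sol[p:], C

# Sire example: sigma_e^2 = 6, G = 2 I.
y = np.array([9, 12, 11, 6, 7, 14], dtype=float)
X = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
G = 2 * np.eye(3)
R = 6 * np.eye(6)

beta_h, u_h, C = henderson_mme(y, X, Z, G, R)
print(6 * C)          # the integer matrix shown above
print(18 * beta_h)    # approximately [148, 235]
print(18 * u_h)       # approximately [-1, 2, -1]
```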
The Animal Model, $y_i = \mu + a_i + e_i$
Here the individual is the unit of analysis, with $y_i$ the phenotypic value of individual $i$ and $a_i$ its BV:
$$X = \begin{pmatrix} 1\\ 1\\ \vdots\\ 1 \end{pmatrix}, \quad \beta = \mu, \quad u = \begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_k \end{pmatrix}, \quad G = \sigma^2_A A$$
where the additive genetic relationship matrix $A$ is given by $A_{ij} = 2\theta_{ij}$, namely twice the coefficient of coancestry. Assume $R = \sigma^2_e I$, so that $R^{-1} = (1/\sigma^2_e)\,I$. Likewise, $G = \sigma^2_A A$, so that $G^{-1} = (1/\sigma^2_A)\,A^{-1}$.
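The relationship matrix $A$ can be computed from a pedigree by the classical tabular method: order individuals so parents precede offspring, set each off-diagonal $A_{ij}$ to the average of $j$'s relationships with $i$'s two parents, and set $A_{ii} = 1 + F_i$, where $F_i = A_{sd}/2$ for parents $s$ and $d$. The sketch below assumes a pedigree given as (sire, dam) pairs with 0 for an unknown parent; `relationship_matrix` is a hypothetical name. The usage lines apply it to a small illustrative pedigree: founders 1, 2, 3, individual 4 from 1 × 2, and individual 5 from 2 × 3.

```python
import numpy as np

def relationship_matrix(pedigree):
    """Additive relationship matrix A (A_ij = 2 * theta_ij) by the tabular method.

    pedigree: list of (sire, dam) pairs, one per individual, with 1-based IDs
    and 0 for an unknown parent; parents must appear before their offspring.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        s, d = s - 1, d - 1              # 0-based indices; -1 means unknown
        # Off-diagonals: average of j's relationships with i's parents.
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, with F_i = A[s, d] / 2 when both parents are known.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

pedigree = [(0, 0), (0, 0), (0, 0), (1, 2), (2, 3)]   # illustrative pedigree
A = relationship_matrix(pedigree)
print(A)
```

Because every entry involving earlier individuals is already filled when individual $i$ is processed, one pass in pedigree order suffices.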
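Henderson's mixed model equations here become (after multiplying through by $\sigma^2_e$)
$$\begin{pmatrix} X^T X & X^T Z\\ Z^T X & Z^T Z + \lambda A^{-1} \end{pmatrix} \begin{pmatrix} \hat\beta\\ \hat u \end{pmatrix} = \begin{pmatrix} X^T y\\ Z^T y \end{pmatrix}$$
where $\lambda = \sigma^2_e/\sigma^2_A = (1 - h^2)/h^2$. With a single record on each of $n$ individuals ($Z = I$), this reduces to
$$\begin{pmatrix} n & \mathbf{1}^T\\ \mathbf{1} & I + \lambda A^{-1} \end{pmatrix} \begin{pmatrix} \hat\mu\\ \hat a \end{pmatrix} = \begin{pmatrix} \sum_i y_i\\ y \end{pmatrix}$$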
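Example
Suppose our pedigree has three unrelated, non-inbred founders (1, 2, and 3), with individual 4 an offspring of 1 × 2 and individual 5 an offspring of 2 × 3, so that
$$A = \begin{pmatrix} 1&0&0&1/2&0\\ 0&1&0&1/2&1/2\\ 0&0&1&0&1/2\\ 1/2&1/2&0&1&1/4\\ 0&1/2&1/2&1/4&1 \end{pmatrix}$$
Suppose $\lambda = 1$ (corresponding to $h^2 = 0.5$). In this case,
$$I + \lambda A^{-1} = \begin{pmatrix} 5/2&1/2&0&-1&0\\ 1/2&3&1/2&-1&-1\\ 0&1/2&5/2&0&-1\\ -1&-1&0&3&0\\ 0&-1&-1&0&3 \end{pmatrix}$$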
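For large pedigrees $A^{-1}$ is usually built directly from the list of parents (Henderson's rules) rather than by inverting $A$. The sketch below implements the simple form of those rules, which ASSUMES no individual in the pedigree is inbred (true for this example); `a_inverse_noninbred` is a hypothetical name. It reproduces $I + \lambda A^{-1}$ for $\lambda = 1$ and checks the result against a direct inverse of $A$.

```python
import numpy as np

def a_inverse_noninbred(pedigree):
    """A^{-1} by Henderson's direct rules, ASSUMING no inbreeding in the pedigree.

    pedigree: list of (sire, dam) pairs with 1-based IDs and 0 for unknown.
    """
    n = len(pedigree)
    Ainv = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        parents = [p - 1 for p in (s, d) if p > 0]
        # Mendelian-sampling variance (relative to sigma_A^2) without inbreeding:
        # 1 with no parents known, 3/4 with one known, 1/2 with both known.
        b = {0: 1.0, 1: 0.75, 2: 0.5}[len(parents)]
        k = 1.0 / b
        Ainv[i, i] += k
        for p in parents:
            Ainv[i, p] -= 0.5 * k
            Ainv[p, i] -= 0.5 * k
            for q in parents:
                Ainv[p, q] += 0.25 * k
    return Ainv

pedigree = [(0, 0), (0, 0), (0, 0), (1, 2), (2, 3)]
A = np.array([[1, 0, 0, 0.5, 0],
              [0, 1, 0, 0.5, 0.5],
              [0, 0, 1, 0, 0.5],
              [0.5, 0.5, 0, 1, 0.25],
              [0, 0.5, 0.5, 0.25, 1]])
lam = 1.0                                  # lambda = 1  <=>  h^2 = 0.5
Ainv = a_inverse_noninbred(pedigree)
M = np.eye(5) + lam * Ainv
print(M)
print(np.allclose(Ainv, np.linalg.inv(A)))
```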
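Suppose the vector of observations is
$$y = \begin{pmatrix} y_1\\ y_2\\ y_3\\ y_4\\ y_5 \end{pmatrix} = \begin{pmatrix} 7\\ 9\\ 10\\ 6\\ 9 \end{pmatrix}$$
Here $n = 5$, $\sum y_i = 41$, and Henderson's equations become
$$\begin{pmatrix} 5&1&1&1&1&1\\ 1&5/2&1/2&0&-1&0\\ 1&1/2&3&1/2&-1&-1\\ 1&0&1/2&5/2&0&-1\\ 1&-1&-1&0&3&0\\ 1&0&-1&-1&0&3 \end{pmatrix} \begin{pmatrix} \hat\mu\\ \hat a_1\\ \hat a_2\\ \hat a_3\\ \hat a_4\\ \hat a_5 \end{pmatrix} = \begin{pmatrix} 41\\ 7\\ 9\\ 10\\ 6\\ 9 \end{pmatrix}$$
Solving gives $\hat\mu = 440/53 \approx 8.30$ and
$$\begin{pmatrix} \hat a_1\\ \hat a_2\\ \hat a_3\\ \hat a_4\\ \hat a_5 \end{pmatrix} = \begin{pmatrix} -662/689\\ 4/53\\ 610/689\\ -732/689\\ 381/689 \end{pmatrix} \approx \begin{pmatrix} -0.961\\ 0.076\\ 0.885\\ -1.062\\ 0.553 \end{pmatrix}$$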
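The reduced animal-model equations can be solved numerically as a check. A sketch assuming NumPy, using the $A$ matrix and records of this example with $\lambda = 1$; `animal_model_blup` is a hypothetical helper name, and $A^{-1}$ is taken from `np.linalg.inv(A)` for brevity, which is fine for five animals.

```python
import numpy as np

def animal_model_blup(y, A, lam):
    """Solve [[n, 1^T], [1, I + lam*A^{-1}]] [mu; a] = [sum(y); y] (one record per animal)."""
    n = len(y)
    ones = np.ones(n)
    C = np.block([[np.array([[float(n)]]), ones[None, :]],
                  [ones[:, None], np.eye(n) + lam * np.linalg.inv(A)]])
    rhs = np.concatenate([[y.sum()], y])
    sol = np.linalg.solve(C, rhs)
    return sol[0], sol[1:]

A = np.array([[1, 0, 0, 0.5, 0],
              [0, 1, 0, 0.5, 0.5],
              [0, 0, 1, 0, 0.5],
              [0.5, 0.5, 0, 1, 0.25],
              [0, 0.5, 0.5, 0.25, 1]])
y = np.array([7, 9, 10, 6, 9], dtype=float)

mu_hat, a_hat = animal_model_blup(y, A, lam=1.0)   # lam = 1  <=>  h^2 = 0.5
print(round(mu_hat, 3), np.round(a_hat, 3))
```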
More on the animal model
Under the animal model, $y = X\beta + Za + e$ with $a \sim (0, \sigma^2_A A)$ and $e \sim (0, \sigma^2_e I)$,
$$\text{BLUP}(a) = \sigma^2_A A Z^T V^{-1}\left(y - X\beta\right), \qquad V = ZGZ^T + R = \sigma^2_A ZAZ^T + \sigma^2_e I$$
Consider the simplest case of a single observation on one individual, where the only fixed effect is the mean $\mu$, which is assumed known. Here $Z = A = I = (1)$ and $V = \sigma^2_A + \sigma^2_e$, so
$$\sigma^2_A A Z^T V^{-1} = \frac{\sigma^2_A}{\sigma^2_A + \sigma^2_e} = h^2, \qquad \text{BLUP}(a) = h^2\left(y - \mu\right)$$
More generally, with single observations on $n$ unrelated individuals, $A = Z = I_{n \times n}$, so
$$V = \sigma^2_A ZAZ^T + \sigma^2_e I = \left(\sigma^2_A + \sigma^2_e\right) I, \qquad \sigma^2_A A Z^T V^{-1} = h^2 I, \qquad \text{BLUP}(a) = \sigma^2_A A Z^T V^{-1}\left(y - X\beta\right) = h^2\left(y - \mu\mathbf{1}\right)$$
Hence, the predicted breeding value of individual $i$ is just $\text{BLUP}(a_i) = h^2(y_i - \mu)$. When at least some individuals are related and/or inbred (so that $A \neq I$), and/or records are missing or repeated (so that $Z \neq I$), the predicted BVs differ from this simple form, but BLUP fully accounts for these complications.
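To see the $h^2(y - \mu)$ shortcut and where it stops holding, the sketch below evaluates $\text{BLUP}(a) = \sigma^2_A A Z^T V^{-1}(y - \mu\mathbf{1})$ with $Z = I$ and a known mean. The values $\sigma^2_A = \sigma^2_e = 8$ (so $h^2 = 0.5$), $\mu = 8$, and the records are illustrative assumptions, and `blup_known_mean` is a hypothetical name. With $A = I$ the result equals $h^2(y - \mu)$; with the relationship matrix of the five-animal pedigree it does not.

```python
import numpy as np

def blup_known_mean(y, mu, A, var_a, var_e):
    """BLUP(a) = var_a * A Z^T V^{-1} (y - mu*1), assuming Z = I and mu known."""
    n = len(y)
    V = var_a * A + var_e * np.eye(n)      # var_a * Z A Z^T + var_e * I with Z = I
    return var_a * A @ np.linalg.solve(V, y - mu)

y = np.array([7, 9, 10, 6, 9], dtype=float)   # illustrative records
mu, var_a, var_e = 8.0, 8.0, 8.0              # assumed known values
h2 = var_a / (var_a + var_e)                  # = 0.5

unrelated = blup_known_mean(y, mu, np.eye(5), var_a, var_e)

A = np.array([[1, 0, 0, 0.5, 0],
              [0, 1, 0, 0.5, 0.5],
              [0, 0, 1, 0, 0.5],
              [0.5, 0.5, 0, 1, 0.25],
              [0, 0.5, 0.5, 0.25, 1]])
related = blup_known_mean(y, mu, A, var_a, var_e)

print(unrelated)   # equals h2 * (y - mu)
print(related)     # differs: relatives' records also contribute
```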
Estimation of R and G
The second estimation issue is estimating the covariance matrix for the residuals, $R$, and for the breeding values, $G$. As we have seen, both matrices have the form $\sigma^2 B$, where the variance $\sigma^2$ is unknown but $B$ is known. For example, for residuals $R = \sigma^2_e I$, and for breeding values $G = \sigma^2_A A$, where $A$ is obtained from the pedigree.
REML Variance Component Estimation
REML = restricted maximum likelihood. Standard ML variance estimation treats the estimated fixed effects as if they were known without error, ignoring the degrees of freedom used to estimate them. This results in a downward bias in the variance estimates. REML maximizes the portion of the likelihood that does not depend on the fixed effects. Basic idea: use a transformation to remove the fixed effects, then perform ML on this transformed vector.
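As a sketch of what REML maximizes, the code below evaluates the standard closed-form restricted log-likelihood, $-\tfrac12\left[\log|V| + \log|X^T V^{-1} X| + (y - X\hat\beta)^T V^{-1} (y - X\hat\beta)\right]$ up to a constant (equivalent to the likelihood of error contrasts, i.e., linear combinations of $y$ free of $\beta$), next to the ordinary ML log-likelihood. It uses a crude grid search over $\sigma^2$ for the simplest model $y_i = \mu + e_i$ with $V = \sigma^2 I$; the data and the function name `loglik` are illustrative assumptions, and a real analysis would use a proper optimizer over several variance components.

```python
import numpy as np

def loglik(y, X, V, restricted=True):
    """Gaussian log-likelihood (up to a constant) evaluated at the GLS estimate of beta.

    restricted=True adds the -1/2 log|X^T V^-1 X| term, turning the profile
    ML log-likelihood into the REML (restricted) log-likelihood.
    """
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
    r = y - X @ beta
    ll = -0.5 * (np.linalg.slogdet(V)[1] + r @ Vinv @ r)
    if restricted:
        ll -= 0.5 * np.linalg.slogdet(XtVinvX)[1]
    return ll

y = np.array([7, 9, 10, 6, 9], dtype=float)     # illustrative data
X = np.ones((len(y), 1))                         # the only fixed effect is the mean
grid = np.linspace(0.5, 10.0, 9501)              # sigma^2 values, step 0.001

reml_ll = [loglik(y, X, s2 * np.eye(len(y)), restricted=True) for s2 in grid]
ml_ll = [loglik(y, X, s2 * np.eye(len(y)), restricted=False) for s2 in grid]
s2_reml = grid[int(np.argmax(reml_ll))]
s2_ml = grid[int(np.argmax(ml_ll))]
print(s2_reml, s2_ml)   # near SS/(n-1) and SS/n, respectively
```

For this model the REML criterion reduces to $-\tfrac12[(n-1)\log\sigma^2 + SS/\sigma^2]$, so its maximum sits at $SS/(n-1)$, whereas ML's sits at $SS/n$.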
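Simple variance estimate under ML vs. REML
$$\hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar x\right)^2, \qquad \hat\sigma^2_{REML} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar x\right)^2$$
REML adjusts for the estimated fixed effect, in this case the mean. With a balanced design, ANOVA variance estimates are equivalent to REML variance estimates (provided the ANOVA estimates are non-negative).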
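The downward bias of the ML estimate can be seen by simulation. A minimal sketch, assuming normally distributed samples of size $n = 5$ with true variance $\sigma^2 = 4$ (illustrative values): averaged over many replicates, the ML estimate should sit near $(n-1)\sigma^2/n = 3.2$ and the REML estimate near $4$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 5, 4.0, 200_000

x = rng.normal(loc=10.0, scale=np.sqrt(sigma2), size=(reps, n))
ml = x.var(axis=1, ddof=0)      # divide by n: ignores estimating the mean
reml = x.var(axis=1, ddof=1)    # divide by n - 1: adjusts for the estimated mean

print(ml.mean(), reml.mean())   # ML near (n - 1)/n * sigma2 = 3.2; REML near 4.0
```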