Analysis of litter size and average litter weight in pigs using a recursive model



Genetics: Published Articles Ahead of Print, published on August 4, 2007 as /genetics

Luis Varona (1) and Daniel Sorensen (2)

July 18

(1) Area de Producció Animal, Centre UdL-IRTA, 25198 Lleida, Spain
(2) Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, PB 50, DK-8830 Tjele, Denmark

Abstract

An analysis of litter size and average piglet weight at birth in Landrace and Yorkshire pigs is presented, using a standard two-trait mixed model (SMM) and a recursive mixed model (RMM). The RMM establishes a one-way link from litter size to average piglet weight. It is shown that there is a one-to-one correspondence between the parameters of the SMM and the RMM and that they generate equivalent likelihoods. As parameterized in this work, the RMM tests for recursive effects of the additive genetic values, permanent environmental effects and specific environmental effects of litter size on average piglet weight. The equivalent standard mixed model tests whether or not the covariance matrices of the random effects have a diagonal structure. In Landrace, posterior predictive model checking supports a model without any form of recursion or, equivalently, an SMM with diagonal covariance matrices for the three random effects. In Yorkshire, the same criterion favours a model with recursion at the level of specific environmental effects only or, in terms of the SMM, the association between traits is shown to be due exclusively to a (negative) environmental correlation. It is argued that the choice between an SMM and an RMM should be guided by the availability of software, by ease of interpretation, or by the need to test a particular theory or hypothesis that may best be formulated under one parameterization and not the other.

Contents
1 INTRODUCTION
2 MATERIAL and METHODS
2.1 Data
2.2 Models and likelihoods
2.3 Likelihood identification under the SMM and the RMM
2.4 Generating an identifiable likelihood model to address the nature of the relationship between traits
2.5 Prior and Posterior distributions
2.6 Implementation
2.7 Model testing
3 RESULTS
4 DISCUSSION

1 INTRODUCTION

Mixed linear models (Henderson, 1984) are broadly used to predict breeding values and to estimate variance components for traits of interest in livestock and plant breeding, and they play an important role in evolutionary and theoretical quantitative genetics (Cheverud, 1984; Lande, 1979; Walsh, 2003). In genetic improvement programs, the selection objective typically includes several correlated traits. The classical approach to a multiple-trait analysis uses models positing that the correlation between response variables (phenotypes) is due to linear associations between unobservables, such as additive genetic values, or non-genetic sources such as permanent or temporary environmental effects. Structural equation models extend the standard linear model to account for links (feedback and/or recursiveness) involving either the phenotypes directly or latent variables; they are well established in econometrics and sociology (Goldberger, 1972; Jöreskog, 1973; Duncan, 1975). These models were discussed in the early genetics literature by Wright (1921), but this work has not received much attention in quantitative genetics. Recently, Xiong et al. (2004) proposed the use of structural equation models for modeling and identifying genetic networks. In a quantitative genetics context, Gianola and Sorensen (2004) studied the consequences of simultaneous and recursive relationships between phenotypes for genetic parameters and presented statistical methods for inference. A recent application, to the relationship between somatic cell score and milk yield in dairy goats, is de los Campos et al. (2006).
Here we illustrate the implementation of structural equation models for the analysis of litter size and average litter weight in two breeds of Danish pigs. Litter size is an important trait in pig genetic improvement programmes (Rothschild and Bidanel, 1998), and there is now convincing evidence that it has responded to selection (e.g. Sorensen et al., 2000; Noguera et al., 2002). Several studies have also reported negative associations between litter size and individual birth weight (Kerr and Cameron, 1995; Roehe, 1999; Sorensen et al., 2000). Further, Sorensen et al. (2000) report an increase in the proportion of piglets born dead at higher litter sizes. Litter size is basically determined by ovulation rate and embryo mortality (Blasco et al., 1995), processes that take place mainly in the early stages of gestation. Piglet weight at birth, in contrast, is mostly determined by growth in late gestation. One could then postulate a one-way causal path establishing an effect of litter size on piglet weight at birth. This specification defines a recursive two-trait system. Simultaneity, on the other hand, occurs when trait 1 affects trait 2 and vice versa.

The objective of this study is, first, to show that recursive models can be interpreted as alternative parameterizations of standard linear models. We discuss identifiability of dispersion parameters, a topic intimately connected with the possibility of drawing inferences from the various parametric forms of a given model. Second, we address the statistical problem of deciding whether the association between traits is mediated by additive genetic and/or environmental covariances, or by recursion only. The results are illustrated using data on litter size and average litter weight in pigs.

2 MATERIAL and METHODS

2.1 Data

Data from two breeds were analysed: Landrace and Yorkshire. The traits analysed were total number born per litter and average litter weight at birth (hereinafter referred to as litter size and average piglet weight). The Landrace dataset included 5,178 litter size records and a pedigree file of 8,800 individuals. The raw means for litter size and average piglet weight were 14.3 piglets and 1.36 kg, respectively, with standard deviations of 3.6 piglets and 0.35 kg. The Yorkshire dataset consisted of 3,938 litter size records and a pedigree file of 7,143 individuals. The raw means for litter size and average piglet weight were … piglets and 1.30 kg, respectively, with standard deviations of 3.40 piglets and 0.… kg. The raw correlations between traits were 0.01 in Landrace and 0.43 in Yorkshire. Piglet weight at birth is strongly determined by maternal genetic effects (Grandinson et al., 2002) and, as a consequence, average piglet weight (as well as litter size) was considered a trait of the sow.
2.2 Models and likelihoods

We describe a standard mixed model (SMM) and a recursive mixed model (RMM). The SMM postulates the following linear structures for $y_{Lij}$ (subscript $L$ denotes litter size) and $y_{Wij}$ (subscript $W$ denotes average piglet weight), the $j$th pair of records from female $i$:

$$y_{Lij} = \mathbf{x}'_{Lij}\mathbf{b}_L + u_{Li} + p_{Li} + e_{Lij}, \tag{1a}$$

$$y_{Wij} = \mathbf{x}'_{Wij}\mathbf{b}_W + u_{Wi} + p_{Wi} + e_{Wij}, \tag{1b}$$

where $\mathbf{x}'_{kij}$ ($k = L, W$) is the appropriate row of a known incidence matrix, $\mathbf{b}_k$ is a vector containing effects of herd-years, seasons and parity number, $u_{ki}$ is an additive genetic effect of individual $i$, $p_{ki}$ is a permanent environmental effect of individual $i$ and $e_{kij}$ is a residual effect. (The vectors of additive genetic effects and of data have different lengths but, to simplify notation, it is assumed throughout that after an appropriate relabeling a common subindex $i$ can be used for $y$, $u$ and $p$.) The following distributions were assigned to the location parameters:

$$(\mathbf{b}'_L, \mathbf{b}'_W)' \sim N(\mathbf{0}, \mathbf{I}\,10^{5}), \quad (u_{Li}, u_{Wi})' \mid \mathbf{G} \sim N(\mathbf{0}, \mathbf{G}), \quad (p_{Li}, p_{Wi})' \mid \mathbf{P} \sim N(\mathbf{0}, \mathbf{P}), \quad (e_{Lij}, e_{Wij})' \mid \mathbf{R}_{ij} \sim N(\mathbf{0}, \mathbf{R}_{ij}). \tag{2}$$

Above, $\mathbf{I}$ is the identity matrix (of appropriate order),

$$\mathbf{G} = \begin{bmatrix} \sigma^2_{u_L} & \sigma_{u_L u_W} \\ \sigma_{u_L u_W} & \sigma^2_{u_W} \end{bmatrix} \tag{3}$$

and

$$\mathbf{P} = \begin{bmatrix} \sigma^2_{p_L} & \sigma_{p_L p_W} \\ \sigma_{p_L p_W} & \sigma^2_{p_W} \end{bmatrix}. \tag{4}$$

A possible approach to modelling the residual covariance matrix $\mathbf{R}_{ij}$ is as follows. Assume that the residuals of the individual piglet birth weights that contribute to a given average piglet weight are, given litter size, normally and independently distributed with residual variance $\sigma^2_{e_W}(1 - \rho^2_{e_L e_W})$, where $\rho_{e_L e_W}$ is the residual correlation between litter size and individual piglet weight at birth. Also assume that the residuals for litter size are normally distributed with variance $\sigma^2_{e_L}$. Then the marginal (with respect to litter size) residual covariance between two individual piglet birth-weight records is $\rho^2_{e_L e_W}\sigma^2_{e_W}$. Each individual piglet residual has marginal variance $\sigma^2_{e_W}$ and covariance $\rho_{e_L e_W}\sigma_{e_L}\sigma_{e_W}$ with the litter-size residual, so averaging $n_{ij}$ of them gives the residual covariance matrix

$$\mathbf{R}_{ij} = \begin{bmatrix} \sigma^2_{e_L} & \rho_{e_L e_W}\sigma_{e_L}\sigma_{e_W} \\ \rho_{e_L e_W}\sigma_{e_L}\sigma_{e_W} & \dfrac{\sigma^2_{e_W}}{n_{ij}}\left(1 + (n_{ij}-1)\rho^2_{e_L e_W}\right) \end{bmatrix}. \tag{5}$$

The terms $\sigma^2_{x_m}$ and $\sigma_{x_L x_W}$ ($x = u, p, e$; $m = L, W$) in (3), (4) and (5) are the variance and covariance components of the distributions of the additive genetic effects ($x = u$), permanent environmental effects ($x = p$) and residual effects ($x = e$) for litter size and average piglet weight. In (5), the off-diagonal term is $\rho_{e_L e_W}\sigma_{e_L}\sigma_{e_W} = \sigma_{e_L e_W}$, and $n_{ij}$ is the known number of records contributing to the average piglet weight of female $i$ in parity $j$. The likelihood based on (5) has three identifiable residual dispersion parameters.
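The lower diagonal element of (5) follows from averaging $n_{ij}$ exchangeable piglet residuals. The following sketch is not part of the original analysis: it simulates the conditional-independence story just described and compares the Monte Carlo covariance of (litter-size residual, average piglet-weight residual) with (5). The values $\sigma^2_{e_L} = 10$, $\sigma^2_{e_W} = 0.05$, $\rho_{e_L e_W} = -0.4$ and $n_{ij} = 12$ are made up for illustration, and only the Python standard library is used.

```python
import random
import statistics


def residual_cov_eq5(var_eL, var_eW, rho, n):
    """R_ij of eq. (5) for (litter size, average piglet weight) residuals."""
    cov = rho * (var_eL * var_eW) ** 0.5
    var_avg = var_eW / n * (1.0 + (n - 1) * rho ** 2)
    return [[var_eL, cov], [cov, var_avg]]


def simulate_residual_cov(var_eL, var_eW, rho, n, reps=50_000, seed=1):
    """Monte Carlo estimate of the same matrix under the assumed generative model:
    each piglet residual = beta * e_L + independent noise (independent given e_L)."""
    rng = random.Random(seed)
    sd_eL = var_eL ** 0.5
    beta = rho * (var_eW / var_eL) ** 0.5          # regression of one piglet residual on e_L
    sd_cond = (var_eW * (1.0 - rho ** 2)) ** 0.5   # conditional SD given e_L
    e_L, e_avg = [], []
    for _ in range(reps):
        eL = rng.gauss(0.0, sd_eL)
        piglets = [beta * eL + rng.gauss(0.0, sd_cond) for _ in range(n)]
        e_L.append(eL)
        e_avg.append(sum(piglets) / n)
    m_L, m_W = statistics.fmean(e_L), statistics.fmean(e_avg)
    var_L = sum((a - m_L) ** 2 for a in e_L) / reps
    var_W = sum((b - m_W) ** 2 for b in e_avg) / reps
    cov = sum((a - m_L) * (b - m_W) for a, b in zip(e_L, e_avg)) / reps
    return [[var_L, cov], [cov, var_W]]


# Hypothetical residual parameters, for illustration only
exact = residual_cov_eq5(10.0, 0.05, -0.4, 12)
approx = simulate_residual_cov(10.0, 0.05, -0.4, 12)
print(exact)
print(approx)
```

The two matrices should agree to within a few percent. Increasing `n` shows that the residual variance of the average shrinks towards $\rho^2_{e_L e_W}\sigma^2_{e_W}$ rather than to zero, because the litter-size residual is shared by all piglets in the litter.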
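(Rather than assuming conditional independence of the individual piglet weight residuals given litter size, a more general model would include an extra term for a residual correlation between individual piglet weight residuals in their conditional distribution. However, this would lead to four parameters in (5) and to identifiability problems in the likelihood.) The residual dispersion matrix can also be written as

$$\mathbf{R}_{ij} = \begin{bmatrix} \sigma^2_{e_L} & \beta_{e_L e_W}\sigma^2_{e_L} \\ \beta_{e_L e_W}\sigma^2_{e_L} & \dfrac{\sigma^2_{e_W}}{n_{ij}} + \dfrac{n_{ij}-1}{n_{ij}}\beta^2_{e_L e_W}\sigma^2_{e_L} \end{bmatrix}, \tag{6}$$

where $\beta_{e_L e_W} = \sigma_{e_L e_W}/\sigma^2_{e_L}$ is the residual regression of individual piglet weight at birth on litter size. Matrix $\mathbf{R}_{ij}$ is positive definite since

$$\sigma^2_{e_L}\,\frac{\sigma^2_{e_W}}{n_{ij}}\left(1 + (n_{ij}-1)\rho^2_{e_L e_W}\right) > \rho^2_{e_L e_W}\,\sigma^2_{e_L}\sigma^2_{e_W},$$

an inequality that reduces to $\rho^2_{e_L e_W} < 1$. The residual covariance matrix (5) for $n_{ij} = 1$ is denoted by $\mathbf{R}$. The heritabilities of the two traits are

$$h^2_L = \frac{\sigma^2_{u_L}}{\sigma^2_{u_L} + \sigma^2_{p_L} + \sigma^2_{e_L}}, \qquad h^2_W = \frac{\sigma^2_{u_W}}{\sigma^2_{u_W} + \sigma^2_{p_W} + \dfrac{\sigma^2_{e_W}}{n_{ij}}\left(1 + (n_{ij}-1)\rho^2_{e_L e_W}\right)}, \tag{7}$$

and the correlation coefficients are

$$\rho_x = \frac{\sigma_{x_L x_W}}{\sigma_{x_L}\sigma_{x_W}}, \quad x = u, p, e. \tag{8}$$

Writing $\mathbf{y}_{ij} = (y_{Lij}, y_{Wij})'$, equations (1) can be expressed as

$$\mathbf{y}_{ij} = \mathbf{X}_{ij}\mathbf{b} + \mathbf{u}_i + \mathbf{p}_i + \mathbf{e}_{ij}, \tag{9}$$

where

$$\mathbf{X}_{ij} = \begin{bmatrix} \mathbf{x}'_{Lij} & \mathbf{0} \\ \mathbf{0} & \mathbf{x}'_{Wij} \end{bmatrix}, \quad \mathbf{b} = (\mathbf{b}'_L, \mathbf{b}'_W)', \quad \mathbf{u}_i = (u_{Li}, u_{Wi})', \quad \mathbf{p}_i = (p_{Li}, p_{Wi})', \quad \mathbf{e}_{ij} = (e_{Lij}, e_{Wij})'.$$

It follows that the sampling model for $\mathbf{y}_{ij}$ is the Gaussian distribution

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{u}_i, \mathbf{p}_i, \mathbf{R}_{ij} \sim N\left(\mathbf{X}_{ij}\mathbf{b} + \mathbf{u}_i + \mathbf{p}_i,\; \mathbf{R}_{ij}\right) \tag{10}$$

and that the contribution of $\mathbf{y}_{ij}$ to the likelihood is

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{G}, \mathbf{P}, \mathbf{R}_{ij} \sim N\left(\mathbf{X}_{ij}\mathbf{b},\; \mathbf{G} + \mathbf{P} + \mathbf{R}_{ij}\right). \tag{11}$$

The RMM assumes the following linear relationships between the $j$th pair of records from individual $i$ and the location parameters:

$$y_{Lij} = \mathbf{x}'_{Lij}\mathbf{b}_L + u^*_{Li} + p^*_{Li} + e^*_{Lij}, \tag{12a}$$

$$y_{Wij} = \lambda\left(y_{Lij} - \mathbf{x}'_{Lij}\mathbf{b}_L\right) + \mathbf{x}'_{Wij}\mathbf{b}_W + u^*_{Wi} + p^*_{Wi} + e^*_{Wij}, \tag{12b}$$

where $\lambda$ is the recursive parameter. The first term on the right hand side of (12b) indicates that, according to the model, average piglet weight is linearly related to the deviation of litter size from its group mean, and the strength of this relationship is measured by $\lambda$. In contrast, Gianola and Sorensen (2004) postulate recursiveness or simultaneity between traits involving the observed phenotypes, rather than the unobserved deviations. We return to this point in the Discussion.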
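To see where the reduced-form covariance of the RMM comes from, one can simulate (12a) and (12b) directly. The sketch below is illustrative only and rests on several assumptions not in the original: one record per female and no pedigree (so relatives are ignored), $n_{ij} = 1$, and made-up recursive-scale matrices $\mathbf{G}^*$, $\mathbf{P}^*$, $\mathbf{R}^*$ and $\lambda$. It compares the empirical covariance of the deviations $(y_L - \mathbf{x}'_L\mathbf{b}_L,\, y_W - \mathbf{x}'_W\mathbf{b}_W)$ with $\boldsymbol{\Lambda}^{-1}(\mathbf{G}^* + \mathbf{P}^* + \mathbf{R}^*)(\boldsymbol{\Lambda}^{-1})'$, taking $\boldsymbol{\Lambda}^{-1} = \begin{bmatrix} 1 & 0 \\ \lambda & 1 \end{bmatrix}$ as in the reduced form that follows.

```python
import random


def chol2(S):
    """Lower-triangular Cholesky factor of a 2x2 positive-definite matrix."""
    l11 = S[0][0] ** 0.5
    l21 = S[1][0] / l11
    l22 = (S[1][1] - l21 ** 2) ** 0.5
    return [[l11, 0.0], [l21, l22]]


def draw(S, rng):
    """One draw from the bivariate normal N(0, S)."""
    L = chol2(S)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return L[0][0] * z1, L[1][0] * z1 + L[1][1] * z2


def simulate_rmm(lam, G_s, P_s, R_s, reps=60_000, seed=7):
    """Deviations y - Xb generated by (12a)-(12b); one record per female, no pedigree."""
    rng = random.Random(seed)
    dev = []
    for _ in range(reps):
        (uL, uW), (pL, pW), (eL, eW) = draw(G_s, rng), draw(P_s, rng), draw(R_s, rng)
        yL = uL + pL + eL                 # (12a), deviation from its mean
        yW = lam * yL + uW + pW + eW      # (12b), deviation from its mean
        dev.append((yL, yW))
    return dev


def empirical_cov(pairs):
    n = len(pairs)
    mL = sum(a for a, _ in pairs) / n
    mW = sum(b for _, b in pairs) / n
    sLL = sum((a - mL) ** 2 for a, _ in pairs) / n
    sWW = sum((b - mW) ** 2 for _, b in pairs) / n
    sLW = sum((a - mL) * (b - mW) for a, b in pairs) / n
    return [[sLL, sLW], [sLW, sWW]]


def implied_cov(lam, G_s, P_s, R_s):
    """Lambda^{-1} (G* + P* + R*) (Lambda^{-1})' with Lambda^{-1} = [[1, 0], [lam, 1]]."""
    V = [[G_s[i][j] + P_s[i][j] + R_s[i][j] for j in range(2)] for i in range(2)]
    off = V[0][1] + lam * V[0][0]
    return [[V[0][0], off],
            [off, V[1][1] + 2 * lam * V[0][1] + lam ** 2 * V[0][0]]]


# Hypothetical recursive-scale matrices and lambda, for illustration only
lam = -0.05
G_s = [[1.0, 0.02], [0.02, 0.010]]
P_s = [[0.5, 0.0], [0.0, 0.005]]
R_s = [[8.0, -0.1], [-0.1, 0.050]]   # n_ij = 1

print(empirical_cov(simulate_rmm(lam, G_s, P_s, R_s)))
print(implied_cov(lam, G_s, P_s, R_s))
```

Note that the covariance between the traits is driven mostly by $\lambda$ times the phenotypic variance of litter size, even though the recursive-scale covariances themselves are small.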

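The system defined by (12) can be retrieved by subtracting the mean on both sides of (9) and multiplying by $\boldsymbol{\Lambda}$, to get

$$\boldsymbol{\Lambda}(\mathbf{y}_{ij} - \mathbf{X}_{ij}\mathbf{b}) = \boldsymbol{\Lambda}\mathbf{u}_i + \boldsymbol{\Lambda}\mathbf{p}_i + \boldsymbol{\Lambda}\mathbf{e}_{ij} = \mathbf{u}^*_i + \mathbf{p}^*_i + \mathbf{e}^*_{ij}. \tag{13}$$

The reduced form of (13) is

$$\mathbf{y}_{ij} = \mathbf{X}_{ij}\mathbf{b} + \boldsymbol{\Lambda}^{-1}\mathbf{u}^*_i + \boldsymbol{\Lambda}^{-1}\mathbf{p}^*_i + \boldsymbol{\Lambda}^{-1}\mathbf{e}^*_{ij}, \tag{14}$$

which is the same as (9), where

$$\boldsymbol{\Lambda} = \begin{bmatrix} 1 & 0 \\ -\lambda & 1 \end{bmatrix}, \quad \boldsymbol{\Lambda}^{-1} = \begin{bmatrix} 1 & 0 \\ \lambda & 1 \end{bmatrix}, \quad \mathbf{z}^*_i = \begin{bmatrix} z^*_{Li} \\ z^*_{Wi} \end{bmatrix} = \begin{bmatrix} z_{Li} \\ z_{Wi} - \lambda z_{Li} \end{bmatrix},$$

for $\mathbf{z}_i = \mathbf{u}_i, \mathbf{p}_i, \mathbf{e}_{ij}$ ($z = u, p, e$). It follows from the Gaussian form of the distributions (2) that

$$\mathbf{u}^*_i \mid \mathbf{G}^* \sim N(\mathbf{0}, \mathbf{G}^*), \quad \mathbf{p}^*_i \mid \mathbf{P}^* \sim N(\mathbf{0}, \mathbf{P}^*), \quad \mathbf{e}^*_{ij} \mid \mathbf{R}^*_{ij} \sim N(\mathbf{0}, \mathbf{R}^*_{ij}), \tag{15}$$

where

$$\mathbf{G}^* = \boldsymbol{\Lambda}\mathbf{G}\boldsymbol{\Lambda}', \quad \mathbf{P}^* = \boldsymbol{\Lambda}\mathbf{P}\boldsymbol{\Lambda}', \quad \mathbf{R}^*_{ij} = \boldsymbol{\Lambda}\mathbf{R}_{ij}\boldsymbol{\Lambda}'. \tag{16}$$

Therefore the sampling model for $\mathbf{y}_{ij}$ under the RMM is the Gaussian distribution

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{u}^*_i, \mathbf{p}^*_i, \mathbf{R}^*_{ij}, \lambda \sim N\left(\mathbf{X}_{ij}\mathbf{b} + \boldsymbol{\Lambda}^{-1}\mathbf{u}^*_i + \boldsymbol{\Lambda}^{-1}\mathbf{p}^*_i,\; \boldsymbol{\Lambda}^{-1}\mathbf{R}^*_{ij}(\boldsymbol{\Lambda}^{-1})'\right), \tag{17}$$

and the contribution of $\mathbf{y}_{ij}$ to the likelihood is

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{G}^*, \mathbf{P}^*, \mathbf{R}^*_{ij}, \lambda \sim N\left(\mathbf{X}_{ij}\mathbf{b},\; \boldsymbol{\Lambda}^{-1}\left(\mathbf{G}^* + \mathbf{P}^* + \mathbf{R}^*_{ij}\right)(\boldsymbol{\Lambda}^{-1})'\right). \tag{18}$$

If $\lambda$ were known, this would be the same likelihood as (11), owing to the one-to-one relationship

$$\boldsymbol{\Lambda}^{-1}\left(\mathbf{G}^* + \mathbf{P}^* + \mathbf{R}^*_{ij}\right)(\boldsymbol{\Lambda}^{-1})' = \mathbf{G} + \mathbf{P} + \mathbf{R}_{ij}. \tag{19}$$

However, with $\lambda$ unknown, the left hand side of (19) contains 10 parameters and the right hand side 9. There is thus an infinite number of combinations of $\lambda$, $\mathbf{G}^*$, $\mathbf{P}^*$ and $\mathbf{R}^*_{ij}$ on the left hand side of (19) that satisfy the equality for any given $\mathbf{G} + \mathbf{P} + \mathbf{R}_{ij}$. In other words, disregarding identifiability at the level of the mean in both models, the RMM as defined above generates an unidentifiable likelihood.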
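The non-identifiability argument around (19) can be checked numerically. In this sketch the SMM matrices are hypothetical (not estimates from this study): each trial value of $\lambda$ produces different recursive-scale matrices $\mathbf{G}^*$, $\mathbf{P}^*$ and $\mathbf{R}^*$ via (16), yet mapping their sum back through $\boldsymbol{\Lambda}^{-1}$ always returns the same $\mathbf{G} + \mathbf{P} + \mathbf{R}_{ij}$.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def add(*mats):
    return [[sum(M[i][j] for M in mats) for j in range(2)] for i in range(2)]


def sandwich(M, V):
    """M V M'."""
    return matmul(matmul(M, V), transpose(M))


def Lam(lam):       # Lambda of (13)
    return [[1.0, 0.0], [-lam, 1.0]]


def Lam_inv(lam):   # Lambda^{-1} of (14)
    return [[1.0, 0.0], [lam, 1.0]]


# Hypothetical SMM dispersion matrices (not estimates from this study)
G = [[1.0, -0.05], [-0.05, 0.010]]
P = [[0.5, 0.02], [0.02, 0.005]]
R = [[8.0, -0.50], [-0.50, 0.060]]
V = add(G, P, R)    # G + P + R_ij, the right-hand side of (19)


def rmm_parameters(lam):
    """Recursive-scale matrices (16) implied by a chosen lambda, and the
    reduced-form covariance on the left-hand side of (19)."""
    G_s, P_s, R_s = (sandwich(Lam(lam), M) for M in (G, P, R))
    V_back = sandwich(Lam_inv(lam), add(G_s, P_s, R_s))
    return G_s, P_s, R_s, V_back


for lam in (-0.2, 0.0, 0.3):
    G_s, _, _, V_back = rmm_parameters(lam)
    print(lam, G_s[0][1], V_back)
```

The genetic covariance on the recursive scale, `G_s[0][1]`, changes with $\lambda$ while `V_back` does not, so the data alone cannot pin $\lambda$ down; this is why an identifying constraint is needed.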

2.3 Likelihood identification under the SMM and the RMM

Identifiability of the SMM and the RMM at the level of the mean is well understood (e.g. Searle, 1971) and will not be discussed. In likelihood (11) of the SMM there are 9 dispersion parameters, associated with $\mathbf{G}$, $\mathbf{P}$ and $\mathbf{R}_{ij}$. When the data include repeated records on related individuals, 9 is the maximum number of dispersion parameters that can be identified. This saturated model, with non-diagonal covariance matrices for $u$, $p$ and $e$, is labeled SMM$_{upe}$. The RMM has one extra parameter, so a constraint must be introduced to achieve identification. One possible constraint is to assume that the phenotypic covariance on the recursive scale is zero. That is, denoting the mean of $y_L$ by $\mu_L$,

$$\begin{aligned} \mathrm{Cov}\left(y_L,\, y_W - \lambda(y_L - \mu_L)\right) &= \mathrm{Cov}(u^*_L, u^*_W) + \mathrm{Cov}(p^*_L, p^*_W) + \mathrm{Cov}(e^*_L, e^*_W) \\ &= \mathrm{Cov}(u_L, u_W - \lambda u_L) + \mathrm{Cov}(p_L, p_W - \lambda p_L) + \mathrm{Cov}(e_L, e_W - \lambda e_L) \\ &= \sigma_{u_L u_W} + \sigma_{p_L p_W} + \sigma_{e_L e_W} - \lambda\left(\sigma^2_{u_L} + \sigma^2_{p_L} + \sigma^2_{e_L}\right) = 0. \end{aligned} \tag{20}$$

This places the following interpretation on $\lambda$:

$$\lambda = \frac{\sigma_{u_L u_W} + \sigma_{p_L p_W} + \sigma_{e_L e_W}}{\sigma^2_{u_L} + \sigma^2_{p_L} + \sigma^2_{e_L}}, \tag{21}$$

that is, the phenotypic regression of average litter weight on litter size. Expanding (19), it is easy to show that constraint (20) guarantees a one-to-one relationship between the dispersion parameters of the RMM and those of the SMM$_{upe}$, so that the likelihoods become equivalent. In this setting, the RMM subject to the chosen constraint and the unconstrained SMM$_{upe}$ are two different identifiable parameterizations of the same likelihood model. From the point of view of a likelihood analysis, inferences on the recursive scale can be obtained by fitting the SMM$_{upe}$ and transforming the estimated parameters appropriately, and vice versa.
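Constraint (20) fixes $\lambda$ at the phenotypic regression (21). The sketch below uses hypothetical SMM$_{upe}$ matrices (made up for illustration, with $\mathbf{R}$ the $n_{ij} = 1$ residual matrix) to compute $\lambda$ from (21) and to confirm that the three recursive-scale covariances $\mathrm{Cov}(z_L, z_W - \lambda z_L)$, $z = u, p, e$, sum to zero even though none of them is zero on its own.

```python
def lambda_eq21(G, P, R):
    """Phenotypic regression of y_W on y_L, eq. (21).
    Only the first row of R enters, and it does not depend on n_ij."""
    return (G[0][1] + P[0][1] + R[0][1]) / (G[0][0] + P[0][0] + R[0][0])


def recursive_cov(M, lam):
    """Cov(z_L, z_W - lam * z_L): off-diagonal of Lambda M Lambda', one term of (20)."""
    return M[0][1] - lam * M[0][0]


# Hypothetical SMM_upe matrices, for illustration only
G = [[1.0, -0.05], [-0.05, 0.010]]
P = [[0.5, 0.02], [0.02, 0.005]]
R = [[8.0, -0.50], [-0.50, 0.060]]

lam = lambda_eq21(G, P, R)
terms = [recursive_cov(M, lam) for M in (G, P, R)]
print(lam, terms, sum(terms))
```

The constraint acts only on the sum, which is why it removes exactly one degree of freedom and restores the one-to-one map to the 9 parameters of the SMM$_{upe}$.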
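However, it is not statistically meaningful to ask whether the data have been generated by the SMM$_{upe}$ or by the recursive process described by the RMM subject to constraint (20), since both specifications lead to the same likelihood.

2.4 Generating an identifiable likelihood model to address the nature of the relationship between traits

Here we present a statistically meaningful way to address the question of whether the data have been generated by a recursive mechanism. The starting point is the SMM defined in (3), (4), (5) and (9), but with diagonal matrices for all the dispersion structures; that is,

$$\mathbf{G} = \begin{bmatrix} \sigma^2_{u_L} & 0 \\ 0 & \sigma^2_{u_W} \end{bmatrix}, \tag{22}$$

$$\mathbf{P} = \begin{bmatrix} \sigma^2_{p_L} & 0 \\ 0 & \sigma^2_{p_W} \end{bmatrix}, \tag{23}$$

and

$$\mathbf{R}_{ij} = \begin{bmatrix} \sigma^2_{e_L} & 0 \\ 0 & \dfrac{\sigma^2_{e_W}}{n_{ij}} \end{bmatrix}. \tag{24}$$

The contribution of the pair of records $\mathbf{y}_{ij}$ to the likelihood has the same form as (11), that is,

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{G}, \mathbf{P}, \mathbf{R}_{ij} \sim N\left(\mathbf{X}_{ij}\mathbf{b},\; \mathbf{G} + \mathbf{P} + \mathbf{R}_{ij}\right), \tag{25}$$

with $\mathbf{G}$, $\mathbf{P}$ and $\mathbf{R}_{ij}$ now interpreted as in (22), (23) and (24). This model, labeled SMM$_0$, has 6 dispersion parameters (the covariance matrices of $u$, $p$ and $e$ have zero off-diagonal elements).

The RMM developed here postulates that the relationship between data and location parameters is now

$$\mathbf{y}_{ij} = \mathbf{X}_{ij}\mathbf{b} + \boldsymbol{\Lambda}_u\mathbf{u}_i + \boldsymbol{\Lambda}_p\mathbf{p}_i + \boldsymbol{\Lambda}_e\mathbf{e}_{ij} = \mathbf{X}_{ij}\mathbf{b} + \mathbf{u}^*_i + \mathbf{p}^*_i + \mathbf{e}^*_{ij}, \tag{26}$$

where $\mathbf{u}_i$, $\mathbf{p}_i$ and $\mathbf{e}_{ij}$ are the same random variables as in the SMM$_0$, with covariance matrices (22), (23) and (24), and

$$\boldsymbol{\Lambda}_u = \begin{bmatrix} 1 & 0 \\ \lambda_u & 1 \end{bmatrix}, \quad \mathbf{u}^*_i = (u^*_{Li}, u^*_{Wi})' = (u_{Li},\; u_{Wi} + \lambda_u u_{Li})', \tag{27}$$

and similarly for $\boldsymbol{\Lambda}_p$, $\boldsymbol{\Lambda}_e$, $\mathbf{p}^*_i$ and $\mathbf{e}^*_{ij}$. Notice that the $\boldsymbol{\Lambda}$'s in (26) have the same structure as $\boldsymbol{\Lambda}^{-1}$ in (14). Unlike the recursion generated in (13), the recursive model defined by (26) is not obtained by a linear transformation of the SMM, and the two models lead to different marginal (with respect to random effects) distributions of the data. The linear structure specified by (26) and (27) has an interesting property: each component of average litter weight, $z_{Wi} + \lambda_z z_{Li}$ ($z = u, p, e$), has a term $z_{Wi}$ that is independent of litter size and a term $\lambda_z z_{Li}$ that depends on litter size. The sampling model for $\mathbf{y}_{ij}$ is

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{u}^*_i, \mathbf{p}^*_i, \mathbf{R}^*_{ij} \sim N\left(\mathbf{X}_{ij}\mathbf{b} + \mathbf{u}^*_i + \mathbf{p}^*_i,\; \mathbf{R}^*_{ij}\right), \tag{28}$$

and the contribution of $\mathbf{y}_{ij}$ to the likelihood is

$$\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{G}^*, \mathbf{P}^*, \mathbf{R}^*_{ij} \sim N\left(\mathbf{X}_{ij}\mathbf{b},\; \mathbf{G}^* + \mathbf{P}^* + \mathbf{R}^*_{ij}\right), \tag{29}$$

where $\mathbf{G}^* = \boldsymbol{\Lambda}_u\mathbf{G}\boldsymbol{\Lambda}'_u$, $\mathbf{P}^* = \boldsymbol{\Lambda}_p\mathbf{P}\boldsymbol{\Lambda}'_p$ and $\mathbf{R}^*_{ij} = \boldsymbol{\Lambda}_e\mathbf{R}_{ij}\boldsymbol{\Lambda}'_e$. This form of (saturated) recursive model is labeled RMM$_{upe}$. There are 9 identifiable parameters in the dispersion matrix of this likelihood, and when $\lambda_u = \lambda_p = \lambda_e = 0$ (that is, when $\boldsymbol{\Lambda}_u = \boldsymbol{\Lambda}_p = \boldsymbol{\Lambda}_e = \mathbf{I}$), likelihood (29) is equal to (25).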
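The RMM$_{upe}$ dispersion matrices in (29) all have the form $\boldsymbol{\Lambda}_z\mathbf{D}\boldsymbol{\Lambda}'_z$ with $\mathbf{D}$ diagonal. The following minimal sketch, with hypothetical variances and $\lambda$'s chosen only for illustration, builds these matrices by explicit multiplication, compares them with their element-wise forms (as written in (30) to (32)), and recovers each $\lambda$ as the ratio of the off-diagonal to the upper-left element (as in (38) to (40)).

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def starred(var_L, var_W, lam):
    """Lambda_z D Lambda_z' with D = diag(var_L, var_W) and Lambda_z = [[1, 0], [lam, 1]], (27)."""
    Lz = [[1.0, 0.0], [lam, 1.0]]
    D = [[var_L, 0.0], [0.0, var_W]]
    return matmul(matmul(Lz, D), transpose(Lz))


def closed_form(var_L, var_W, lam):
    """The same matrix written element-wise."""
    return [[var_L, lam * var_L], [lam * var_L, var_W + lam ** 2 * var_L]]


def recover_lambda(M):
    """lambda_z as off-diagonal / upper-left element."""
    return M[0][1] / M[0][0]


# Hypothetical RMM_upe parameters (not estimates from this study)
n_ij = 14
G_s = starred(1.0, 0.010, -0.10)          # additive genetic
P_s = starred(0.5, 0.005, 0.05)           # permanent environmental
R_s = starred(8.0, 0.050 / n_ij, -0.03)   # residual; diagonal (24) has sigma^2_eW / n_ij
print([recover_lambda(M) for M in (G_s, P_s, R_s)])
```

Setting all the $\lambda$'s to zero returns the diagonal matrices (22) to (24) of the SMM$_0$.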
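A comparison between the RMM$_{upe}$ and the RMM with $\lambda_u = \lambda_p = \lambda_e = 0$, labeled RMM$_0$, jointly tests whether there is recursion at the level of the unobservable additive genetic values, permanent environmental effects and environmental effects. Alternatively, since likelihoods (29) and (11) are equivalent, the comparison can be interpreted as testing whether the covariance matrices of the random effects of the SMM$_{upe}$ have a diagonal structure. Indeed, note that

$$\mathbf{G}^* = \begin{bmatrix} \sigma^{*2}_{u_L} & \sigma^*_{u_L u_W} \\ \sigma^*_{u_L u_W} & \sigma^{*2}_{u_W} \end{bmatrix} = \begin{bmatrix} \sigma^2_{u_L} & \lambda_u\sigma^2_{u_L} \\ \lambda_u\sigma^2_{u_L} & \sigma^2_{u_W} + \lambda^2_u\sigma^2_{u_L} \end{bmatrix}, \tag{30}$$

$$\mathbf{P}^* = \begin{bmatrix} \sigma^{*2}_{p_L} & \sigma^*_{p_L p_W} \\ \sigma^*_{p_L p_W} & \sigma^{*2}_{p_W} \end{bmatrix} = \begin{bmatrix} \sigma^2_{p_L} & \lambda_p\sigma^2_{p_L} \\ \lambda_p\sigma^2_{p_L} & \sigma^2_{p_W} + \lambda^2_p\sigma^2_{p_L} \end{bmatrix}, \tag{31}$$

$$\mathbf{R}^*_{ij} = \begin{bmatrix} \sigma^2_{e_L} & \lambda_e\sigma^2_{e_L} \\ \lambda_e\sigma^2_{e_L} & \dfrac{\sigma^2_{e_W}}{n_{ij}} + \lambda^2_e\sigma^2_{e_L} \end{bmatrix}, \tag{32}$$

where starred elements denote the entries of $\mathbf{G}^*$ and $\mathbf{P}^*$. The lower diagonal element of (32) is very similar to the corresponding element of (6). However, when the trait is not an average (that is, when $n_{ij} = 1$), the second term in the lower diagonal element of (6) vanishes. Since, for example, $\sigma_{u_L u_W} = \beta_{u_L u_W}\sigma^2_{u_L}$, inspection of (30), (31) and (32) alongside (3), (4) and (6) shows that the $\beta$'s under the SMM are identical to the $\lambda$'s in the RMM. We shall also need

$$\mathbf{R}^* = \begin{bmatrix} \sigma^{*2}_{e_L} & \sigma^*_{e_L e_W} \\ \sigma^*_{e_L e_W} & \sigma^{*2}_{e_W} \end{bmatrix} = \begin{bmatrix} \sigma^2_{e_L} & \lambda_e\sigma^2_{e_L} \\ \lambda_e\sigma^2_{e_L} & \sigma^2_{e_W} + \lambda^2_e\sigma^2_{e_L} \end{bmatrix}, \tag{33}$$

which is matrix $\mathbf{R}^*_{ij}$ for $n_{ij} = 1$. When $\lambda_u = \lambda_p = \lambda_e = 0$, the above covariance matrices reduce to (22), (23) and (24). Under the RMM$_{upe}$, the heritability of average litter weight when $n_{ij} = n_T$ for all $i, j$ is defined as

$$h^2_W = \frac{\sigma^{*2}_{u_W}}{\sigma^{*2}_{u_W} + \sigma^{*2}_{p_W} + \dfrac{\sigma^2_{e_W}}{n_T} + \lambda^2_e\sigma^2_{e_L}}. \tag{34}$$

2.5 Prior and Posterior distributions

For the RMM$_{upe}$, the joint prior distribution of all parameters is assumed to admit the factorization

$$p(\mathbf{b}, \mathbf{u}^*, \mathbf{p}^*, \mathbf{G}^*, \mathbf{P}^*, \mathbf{R}^*) = p(\mathbf{b})\,p(\mathbf{u}^* \mid \mathbf{G}^*)\,p(\mathbf{p}^* \mid \mathbf{P}^*)\,p(\mathbf{G}^*)\,p(\mathbf{P}^*)\,p(\mathbf{R}^*), \tag{35}$$

where $\mathbf{u}^*$ is the vector containing the pairs $(u^*_{Li}, u^*_{Wi})$ for all individuals in the pedigree, and $\mathbf{p}^*$ is the vector containing the permanent environmental effects $(p^*_{Li}, p^*_{Wi})$ of all females with records. The vector $\mathbf{b}$ is assigned an improper uniform distribution, and the vectors $\mathbf{u}^*$ and $\mathbf{p}^*$ are assumed to be normally distributed:

$$\mathbf{u}^* \mid \mathbf{G}^*, \mathbf{A} \sim N(\mathbf{0}, \mathbf{A} \otimes \mathbf{G}^*),$$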

where $\mathbf{A}$ is the known additive genetic relationship matrix, and

$$\mathbf{p}^* \mid \mathbf{P}^* \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{P}^*).$$

The matrices $\mathbf{G}^*$, $\mathbf{P}^*$ and $\mathbf{R}^*$ follow inverse Wishart distributions,

$$\mathbf{G}^* \mid \mathbf{G}_0, v_G \sim IW(\mathbf{G}_0, v_G), \quad \mathbf{P}^* \mid \mathbf{P}_0, v_P \sim IW(\mathbf{P}_0, v_P), \quad \mathbf{R}^* \mid \mathbf{R}_0, v_R \sim IW(\mathbf{R}_0, v_R),$$

where the hyperparameters $\mathbf{G}_0$, $\mathbf{P}_0$ and $\mathbf{R}_0$ are known $2 \times 2$ matrices and the $v$'s are known degrees of freedom. The conditional density of the whole data $\mathbf{y} = \{\mathbf{y}_{ij}\}$ is

$$p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}^*, \mathbf{p}^*, \boldsymbol{\Sigma}^*) = \prod_{i,j} p\left(\mathbf{y}_{ij} \mid \mathbf{b}, \mathbf{u}^*_i, \mathbf{p}^*_i, \mathbf{R}^*_{ij}\right), \tag{36}$$

where $\boldsymbol{\Sigma}^*$ is block diagonal with blocks $\mathbf{R}^*_{ij}$ associated with each pair of records $\mathbf{y}_{ij}$. The posterior distribution of the RMM$_{upe}$, up to a proportionality constant, is obtained by multiplying the joint prior (35) by (36):

$$p(\mathbf{b}, \mathbf{u}^*, \mathbf{p}^*, \mathbf{G}^*, \mathbf{P}^*, \mathbf{R}^* \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}^*, \mathbf{p}^*, \boldsymbol{\Sigma}^*)\,p(\mathbf{u}^* \mid \mathbf{G}^*)\,p(\mathbf{p}^* \mid \mathbf{P}^*)\,p(\mathbf{G}^*)\,p(\mathbf{P}^*)\,p(\mathbf{R}^*), \tag{37}$$

which is also the posterior distribution of the SMM$_{upe}$, the standard two-trait mixed model with non-diagonal covariance matrices for all the random effects. Inferences based on the RMM$_{upe}$ can be drawn from the posterior distribution (37), and the recursive parameters can easily be constructed from (30), (31) and (33):

$$\lambda_u = \frac{\sigma^*_{u_L u_W}}{\sigma^{*2}_{u_L}}, \tag{38}$$

$$\lambda_p = \frac{\sigma^*_{p_L p_W}}{\sigma^{*2}_{p_L}}, \tag{39}$$

$$\lambda_e = \frac{\sigma^*_{e_L e_W}}{\sigma^{*2}_{e_L}}. \tag{40}$$

A variety of submodels can be generated by assuming some or all of the $\lambda$'s to be equal, or by setting some of them to zero.

2.6 Implementation

If the number of piglets born were the same for all litters, say $n_T$, then $\boldsymbol{\Sigma}^* = \mathbf{I} \otimes \mathbf{R}^*_{n_T}$, where $\mathbf{R}^*_{n_T}$ denotes the residual covariance matrix (32) with $n_{ij}$ replaced by $n_T$. In this case, the structure of $p(\mathbf{y} \mid \mathbf{b}, \mathbf{u}^*, \mathbf{p}^*, \boldsymbol{\Sigma}^*)$ in (37) simplifies considerably. To take advantage of this simplification in the computations, one can augment the piglet weight data with the so-called missing single records $\mathbf{y}^{mis}_W$, so that $n_{ij} = n_T$ for all $i, j$, where $n_T$ is the largest number of records contributing to average piglet weight in the dataset. This technique is known as data augmentation (Tanner and Wong, 1987), and the general idea is as follows. Given observed data $\mathbf{y}$ and a model indexed by parameters $\boldsymbol{\theta}$, the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{y})$ is proportional to $p(\mathbf{y} \mid \boldsymbol{\theta})\,p(\boldsymbol{\theta})$. When the model is fitted using McMC, drawing samples from this posterior distribution may be computationally demanding. However, it may be easy to draw samples from $p(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{y}^{mis}) \propto p(\mathbf{y}, \mathbf{y}^{mis} \mid \boldsymbol{\theta})\,p(\boldsymbol{\theta})$, where $\mathbf{y}^{mis}$ stands for the missing data. Alternating such draws with draws of $\mathbf{y}^{mis}$ from $p(\mathbf{y}^{mis} \mid \boldsymbol{\theta}, \mathbf{y})$ yields samples from the joint posterior $p(\mathbf{y}^{mis}, \boldsymbol{\theta} \mid \mathbf{y})$, whose marginal for $\boldsymbol{\theta}$ is the target $p(\boldsymbol{\theta} \mid \mathbf{y})$. In the present case, $\mathbf{y}^{mis}_W$ is generated from $N\left(E(\mathbf{y}^{mis}_W \mid \mathbf{y}_W, \mathbf{y}_L, \boldsymbol{\theta}),\, \mathrm{Var}(\mathbf{y}^{mis}_W \mid \mathbf{y}_W, \mathbf{y}_L, \boldsymbol{\theta})\right)$, where $\boldsymbol{\theta}$ is the vector of all parameters indexing the model. After some experimentation, a Gibbs chain of one million iterations was chosen. Tables 1 and 2 report Monte Carlo standard errors of the estimates of various posterior means, to give an idea of the accuracy of the Monte Carlo computations.

2.7 Model testing

Checking for systematic differences between a model and the observed data reveals the quality of fit of the posited model. An attractive way to study the fit of a model is posterior predictive model checking (Gelman et al., 1996, 2004). The approach is simple to implement and flexible, and it provides a graphical exploration of residual-type diagnostics. Its key feature is the construction of so-called discrepancy measures, which describe particular features of the data that the model may fail to account for. To be more specific, consider testing for the presence of recursion at the level of permanent environmental effects; absence or presence of recursion at the level of additive genetic effects or residuals is studied in a similar way.
Let $(y_{Lij}, y_{Wij})$, $i = 1, 2, \ldots$, denote the observed data and, for parity $j = 1$, define the discrepancy measure

$$b_p = \frac{\sum_i \left(y_{Wi1} - \mathbf{x}'_{Wi1}\mathbf{b}_W\right)\left(p_{Li} - \bar{p}_L\right)}{\sum_i \left(p_{Li} - \bar{p}_L\right)^2}, \tag{41}$$

the change in average piglet weight per unit change in the permanent environmental effect on litter size. In (41), the sums are over all females with first-parity records, and $\bar{p}_L$ is the average of $p_{Li}$ across females. If the observed data had been generated under the RMM$_0$, one would expect a value of $b_p$ in the vicinity of zero. If the parameters were known, one could compare the observed value of $b_p$ with its sampling distribution, a significant difference indicating model failure with respect to the discrepancy measure. This is equivalent to simulating data $(y^{rep}_{Li1}, y^{rep}_{Wi1})$, $i = 1, 2, \ldots$, under the RMM$_0$ with known parameters, computing $b^{rep}_p$ in each replicate, and deciding whether the observed value of $b_p$ is atypical in the distribution of $b^{rep}_p$. Specifically, in the current context, one is testing whether the null model RMM$_0$ fails to account for a recursive mechanism present in the observed data. Since the parameters are not known, we use the idea of posterior predictive model checking (Gelman et al., 1996, 2004) and consider the posterior predictive distribution of $b_p - b^{rep}_p$. This distribution reflects uncertainty about the parameters that enter the discrepancy measure (41) as well as sampling variation. Notice that the parameters are inferred under the null model RMM$_0$, which assumes absence of recursion. The presence of recursion not accounted for by the RMM$_0$ would result in a distribution of $b_p - b^{rep}_p$ shifted away from zero. This can also be construed as a test for a non-zero covariance between the permanent environmental effects affecting litter size and those affecting average piglet weight. Exploring recursion at the level of additive genetic effects and of residuals involves constructing $b_u - b^{rep}_u$ and $b_e - b^{rep}_e$ along the same lines. Often the diagnostic results of posterior predictive model checking are apparent visually, as is the case in the present work. At other times it can be useful to compute a posterior predictive p value to see whether the results could have arisen by chance under the null model (Gelman et al., 1996, 2004). These p values are easily computed from the McMC output.

3 RESULTS

The familiar parameterization in a two-trait mixed model analysis is the saturated SMM$_{upe}$. We therefore show in Table 1 Monte Carlo estimates of posterior means and standard deviations of selected parameters under the SMM$_{upe}$ for Landrace and Yorkshire. Because all the posterior distributions referred to below are symmetric, posterior standard deviations rather than posterior intervals are reported.
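The posterior predictive check built on (41) can be sketched in a few lines. This is a toy illustration, not the authors' implementation, and it rests on loudly simplifying assumptions: the data are synthetic, the "posterior draws" of $p_{Li}$ are noisy copies of the true values standing in for McMC output from the RMM$_0$, the weight deviations $y_{Wi1} - \mathbf{x}'_{Wi1}\mathbf{b}_W$ are held fixed across draws, and replicates under the null contain only permanent and residual effects for average piglet weight. For each draw it computes $b_p - b^{rep}_p$ and then a two-sided posterior predictive p value.

```python
import random
from statistics import fmean


def b_p(w_dev, p_L):
    """Discrepancy (41): slope of first-parity weight deviations on p_L."""
    p_bar = fmean(p_L)
    num = sum(w * (p - p_bar) for w, p in zip(w_dev, p_L))
    den = sum((p - p_bar) ** 2 for p in p_L)
    return num / den


def ppc_differences(w_dev_obs, p_L_draws, sd_pW, sd_eW, rng):
    """b_p - b_p^rep for each posterior draw of the p_L vector.

    Replicated weight deviations are simulated under the null RMM_0, in which
    they do not depend on p_L (simplified here to p_W + e_W only)."""
    diffs = []
    for p_L in p_L_draws:
        w_rep = [rng.gauss(0.0, sd_pW) + rng.gauss(0.0, sd_eW) for _ in p_L]
        diffs.append(b_p(w_dev_obs, p_L) - b_p(w_rep, p_L))
    return diffs


def posterior_predictive_p(diffs):
    """Two-sided posterior predictive p value for a discrepancy centred at zero."""
    above = sum(d > 0 for d in diffs)
    below = sum(d < 0 for d in diffs)
    return 2 * min(above, below) / len(diffs)


def run_toy(lam_p, seed, n_females=500, n_draws=200):
    """Synthetic example: all variances and lam_p are made up."""
    rng = random.Random(seed)
    p_true = [rng.gauss(0.0, 1.0) for _ in range(n_females)]
    # observed weight deviations, with recursion lam_p at the permanent level
    w_obs = [lam_p * p + rng.gauss(0.0, 0.1) + rng.gauss(0.0, 0.15) for p in p_true]
    # stand-in for McMC draws of p_L under RMM_0: noisy copies of the true values
    draws = [[p + rng.gauss(0.0, 0.3) for p in p_true] for _ in range(n_draws)]
    diffs = ppc_differences(w_obs, draws, 0.1, 0.15, rng)
    return fmean(diffs), posterior_predictive_p(diffs)


print(run_toy(-0.05, seed=1))   # recursion present in the synthetic data
print(run_toy(0.0, seed=2))     # no recursion
```

With a recursive effect in the synthetic data (`lam_p = -0.05`) the differences should be shifted below zero and the p value should be small; with `lam_p = 0.0` they should centre near zero. In a real analysis each draw would also carry its own $\mathbf{b}_W$ and the replicates would be generated from the full RMM$_0$.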
The figures in the table indicate a striking difference between the breeds, especially in the size and sign of the correlation coefficients. For Landrace, a value in the vicinity of zero for all three correlation coefficients lies in an area of high probability mass. For Yorkshire, only for the environmental correlation is zero excluded from the 95% posterior interval. Table 2 shows Monte Carlo estimates of posterior means and standard deviations of selected parameters under the RMM$_{upe}$ parameterization for Landrace and Yorkshire. There is a one-to-one relation between the parameters of the RMM$_{upe}$ and those of the SMM$_{upe}$, and the conclusions based on the recursive parameters are the same as those based on the correlation coefficients in Table 1. Figures 1 and 2 show the posterior predictive distributions of the discrepancies $b_u - b^{rep}_u$, $b_p - b^{rep}_p$ and $b_e - b^{rep}_e$ for Landrace and Yorkshire, generated under the RMM$_0$. For Landrace, the Monte Carlo estimates of the posterior means (posterior standard deviations) of the three discrepancy measures are … (0.80), … (0.00) and … (0.003), reflecting lack of recursion at all levels. There is therefore no evidence of conflict between the data and the null model RMM$_0$ with respect to the features described by the discrepancy measures. For Yorkshire, the corresponding figures are … (0.05), … (0.019) and … (0.00), supporting recursion at the level of the residual term only, a feature of the data that the null model fails to account for.

Table 1: Monte Carlo estimates of posterior means of selected parameters (posterior standard deviations in brackets) under the SMM$_{upe}$. $h^2$: heritability, with subscripts L (litter size) and W (average piglet weight, for $n_T = 5$ individuals); $\rho$: correlations, with subscripts u, p and e for additive genetic, permanent environmental and environmental effects; L: Landrace; Y: Yorkshire; MSE: Monte Carlo standard error.

| Breed | $h^2_L$ | $h^2_W$ | $\rho_u$ | $\rho_p$ | $\rho_e$ |
|---|---|---|---|---|---|
| L | .08(.0) | .4(.06) | .16(.19) | .07(.3) | .01(.06) |
| Y | .07(.0) | .9(.04) | -.1(.16) | -.4(.44) | -.73(.03) |
| MSE | … | … | … | … | … |

Table 2: Monte Carlo estimates of posterior means of selected parameters (posterior standard deviations in brackets) under the RMM$_{upe}$. $h^2$: heritability, with subscripts L (litter size) and W (average piglet weight, for $n_T = 5$ individuals); $\lambda$: recursive parameters, with subscripts u, p and e for additive genetic, permanent environmental and environmental effects; L: Landrace; Y: Yorkshire; MSE: Monte Carlo standard error.

| Breed | $h^2_L$ | $h^2_W$ | $\lambda_u$ | $\lambda_p$ | $\lambda_e$ |
|---|---|---|---|---|---|
| L | .08(.0) | .4(.06) | .0(.06) | .050(.189) | .000(.003) |
| Y | .07(.0) | .(.03) | -.08(.0) | -.08(.067) | -.034(.003) |
| MSE | … | … | … | … | … |

4 DISCUSSION

In a recent article, Gianola and Sorensen (2004) discussed the use of simultaneous equation models to analyse and interpret systems of traits that may be subject to feedback and recursive relationships. Here we report an application of a recursive mixed model to the analysis of litter size and average piglet weight in two breeds of Danish pigs. The recursive relationship defined by model (12) establishes that average piglet weight is linearly related to the deviation of litter size from its group mean. The traditional specification, as in Gianola and Sorensen (2004), postulates that average piglet weight is linearly related to litter size itself, rather than to its deviation from the mean.
The system defined by (12) is free of some identifiability problems at the level of the parameters entering the mean that are common to both traits. It also seems appealing that deviations from a mid value, rather than absolute values, exert an influence on average piglet weight. Ultimately, these are two different models, and one way of discerning between them is to compute their posterior probabilities in the light of the data. This was not studied in the present work.

Figure 1: (Landrace) Estimates of the posterior distributions (under RMM$_0$) of the discrepancies $b_u - b^{rep}_u$ (left), $b_p - b^{rep}_p$ (center) and $b_e - b^{rep}_e$ (right).

Figure 2: (Yorkshire) Estimates of the posterior distributions (under RMM$_0$) of the discrepancies $b_u - b^{rep}_u$ (left), $b_p - b^{rep}_p$ (center) and $b_e - b^{rep}_e$ (right).

The saturated recursive model used in this work has 9 identifiable dispersion parameters. A more parsimonious alternative with 7 parameters postulates that the three recursive parameters $\lambda_u$, $\lambda_p$ and $\lambda_e$ are equal. In general, the recursive parameterization can be an attractive route to parsimonious models, especially in analyses involving many traits.

Special attention has been given here to identifiability at the level of the likelihood, even though inferences were based on posterior distributions. In principle, a Bayesian analysis with a non-identifiable likelihood is possible if proper prior distributions are specified for all the parameters (Bernardo and Smith, 1994). In fact, depending on the prior distributions, a Bayesian analysis with a non-identifiable likelihood may result in Bayesian learning, in the sense that the posterior and prior distributions of the non-identified parameters differ (see, for example, Sorensen and Gianola (2002), page 543). However, an McMC implementation of a Bayesian model with barely identified parameters can lead to poor inferences because of extremely slow convergence and very short effective chain lengths. Achieving identifiability of parameters at the level of the likelihood will always lead to Bayesian learning and, in general, to better behaviour of the McMC algorithm. However, there may be situations where the constraints needed for identifiability restrict inferences, and an unconstrained model with a careful prior specification could be considered instead.

The analyses of the Yorkshire and Landrace data lead to markedly different inferences; we are not disturbed by this result. The breeds differ in various behavioural, physiological and anatomical traits, as well as in outward appearance. From a breeding point of view, in Landrace, changes in litter size should not lead to associated changes in average litter weight. In Yorkshire, a change of one unit in the environmental deviation of litter size (for example, due to culling) should result in a temporary reduction of average piglet weight of 36 g. In neither breed, but especially in Landrace, should successful selection for litter size have a direct effect on average piglet weight.

There is a rich literature on transformations of the data or reparameterizations that can lead to computationally more tractable analyses of the multivariate linear model (for example, Meyer, 1987; Quaas, 1988; Jensen and Mao, 1988; Ducrocq and Besbes, 1993; Groeneveld, 1994; Thompson et al., 1994; Gelfand et al., 1995; Ducrocq and Chapuis, 1997). While the recursive model can be viewed in this framework, the focus of the present work is that a recursive model whose likelihood is identifiable is an alternative parameterization of a standard mixed model. The two models provide different interpretations of the results but are statistically equivalent: there is a one-to-one relationship between the parameters entering the likelihood in both models. In principle, this also applies to simultaneous equation models, which in general require a larger number of constraints to achieve identifiability.
However, it may not always be easy to define the standard model equivalent to, say, a model involving complex simultaneous and recursive relationships among many traits. Ultimately, the choice of parameterization should be guided by the availability of software (in simple situations such as the present one), by ease of interpretation, or by the need to test a particular theory or hypothesis. The mathematical formulation of such a hypothesis may be more naturally accomplished using one parameterization rather than the other.

Acknowledgement: We are grateful to Gustavo de los Campos, Robin Thompson and Daniel Gianola for discussions and comments on an earlier version of this paper.

References

Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. Wiley.
Blasco, A., J. P. Bidanel, and C. Haley (1995). Genetics and neonatal survival. In M. A. Varley (Ed.), The Neonatal Pig: Development and Survival, Wallingford, Oxon, UK, pp. CAB International.
Cheverud, J. M. (1984). Quantitative genetics and developmental constraints on evolution by selection. Journal of Theoretical Biology 110,
de los Campos, G., D. Gianola, P. Boettcher, and P. Moroni (2006). A structural equation model for describing relationships between somatic cell score and milk yield in dairy goats. Journal of Animal Science 84,
Ducrocq, V. and B. Besbes (1993). Solution of multiple trait animal models with missing data on some traits. Journal of Animal Breeding and Genetics 110,
Ducrocq, V. and H. Chapuis (1997). Generalising the use of the canonical transformation for the solution of multivariate mixed model equations. Genetics, Selection, Evolution 29,
Duncan, O. D. (1975). Introduction to Structural Equation Models. Academic Press, San Diego, CA.
Gelfand, A. E., S. K. Sahu, and B. P. Carlin (1995). Efficient parameterization for normal linear mixed models. Biometrika 82,
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2004). Bayesian Data Analysis. Chapman and Hall.
Gelman, A., X. L. Meng, and H. Stern (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica 6,
Gianola, D. and D. Sorensen (2004). Quantitative genetic models describing simultaneous and recursive relationships between phenotypes. Genetics 167,
Goldberger, A. S. (1972). Structural equation methods in the social sciences. Econometrica 40,
Grandinson, K., M. S. Lund, L. Rydhmer, and E. Strandberg (2002). Genetic parameters for piglet mortality traits crushing, stillbirth and total mortality, and their relation to birth weight. Acta Agriculturae Scandinavica, Series A, Animal Science 52,
Groeneveld, E. (1994). A reparameterization to improve numerical optimization in multivariate REML (co)variance component estimation. Genetics, Selection, Evolution 26,
Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. University of Guelph.
Jensen, J. and I. L. Mao (1988). Transformation algorithms in analysis of single trait and of multiple trait models with equal design matrices and one random factor per trait: a review. Journal of Animal Science 66,
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger and O. D. Duncan (Eds.), Structural Equation Models in the Social Sciences, pp. New York: Seminar.
Kerr, J. C. and N. D. Cameron (1995). Reproductive performance of pigs selected for components of efficient lean growth. Animal Science 60,
Lande, R. (1979). Quantitative genetic analysis of multivariate evolution, applied to brain:body allometry. Evolution 33,
Meyer, K. (1987). A note on the use of an equivalent model to account for relationships between animals in estimating variance components. Journal of Animal Breeding and Genetics 104,
Noguera, J. L., L. Varona, D. Babot, and J. Estany (2002). Multivariate analysis of litter size for multiple parities with production traits in pigs: II. Response to selection for litter size and correlated responses to production traits. Journal of Animal Science 80,
Quaas, R. L. (1988). Transformed mixed model equations: a recursive algorithm to eliminate $A^{-1}$. Journal of Dairy Science 71,
Roehe, R. (1999). Genetic determination of individual birthweight and its association with sows productivity traits using Bayesian analysis. Journal of Animal Science 77,
Rothschild, M. F. and J. P. Bidanel (1998). Biology and genetics of reproduction. In M. F. Rothschild and A. Ruvinsky (Eds.), The Genetics of the Pig, Wallingford, Oxon, UK, pp. CAB International.
Searle, S. R. (1971). Linear Models. Wiley.
Sorensen, D. and D. Gianola (2002). Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer-Verlag.
Sorensen, D., A. Vernersen, and S. Andersen (2000). Bayesian analysis of response to selection: A case study using litter size in Danish Yorkshire pigs. Genetics 156,
Tanner, M. A. and W. Wong (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82,
Thompson, R., R. E. Crump, J. Juga, and P. M. Visscher (1994). Estimating variances and covariances for bivariate animal models using scaling and transformation. Genetics, Selection, Evolution 27,
Walsh, B. (2003). Evolutionary quantitative genetics. In D. J. Balding, M. Bishop, and C. Cannings (Eds.), Handbook of Statistical Genetics, Volume I, Chichester, UK, pp. John Wiley.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research 10,
Xiong, M., J. Li, and X. Fang (2004). Identification of genetic networks. Genetics 166,
