Estimation of Variance Components in Animal Breeding

Size: px
Start display at page:

Download "Estimation of Variance Components in Animal Breeding"

Transcription

1 Estimation of Variance Components in Animal Breeding L R Schaeffer July 19-23, 2004 Iowa State University Short Course

2 2

3 Contents 1 Distributions 9 11 Random Variables 9 12 Discrete Random Variables Binomial Distribution Poisson Distribution General Matrix Results Expectations Variance-Covariance Matrices Continuous Distributions Uniform Distribution Normal Distribution Beta Distribution Gamma Distribution Chi-Square Distribution The Wishart Distribution The t-distribution 22 3

4 4 CONTENTS 148 The F-distribution Quadratic Forms 23 2 Building Blocks Basic Blocks Basic Model Mixed Model Equations Unbiased Estimation of Variances Variances of Quadratic Forms Variance of A Ratio of Variance Estimates 33 3 Likelihood Methods The Likelihood Function Maximum Likelihood The EM Algorithm Restricted Maximum Likelihood Derivative Free REML First Derivatives and EM Algorithm Second Derivatives, Average Information Animal Models 51 4 Bayesian Methods The Joint Posterior Distribution Fully Conditional Posterior Distributions 57

5 CONTENTS 5 43 Computational Scheme Round Round Burn-In Periods and Estimates Influence of the Priors Long Chain or Many Chains? 64 5 Data Readiness Pedigree Files Achieving Chronological Order Inbreeding Calculations Data File Control Files 72 6 Software Packages DMU-Jensen and Madsen Control File Comments on DMU VCE - Groeneveld pfile COMMENT DATA MODEL COVARIANCE 84

6 6 CONTENTS 626 SYSTEM OUTPUT END MTDFREML MTDFNRM MTDFPREP MTDFRUN 89 7 Maternal Effects Models Introduction Simulation of Records Mixed Model Equations Estimation of Covariances EM-REML Bayesian Estimation Warnings 99 8 Random Regression Models Introduction Multiple Trait Approach Covariance Functions and Orthogonal Polynomials Reduced Orders of Fit Basic Structure of RRM Example Data Analysis By RRM 112

7 CONTENTS 7 85 Estimation of Parameters EM-REML Bayesian Estimation Multiple Trait Models Introduction Models Example Data The MME Estimation of Covariance Matrices Sampling Solutions to MME Sampling New Covariance Matrices Non-Additive Genetic Models Introduction The Model Example Data Genetic Relationships HMME Estimation of Variances 139

8 8 CONTENTS

9 Chapter 1 Distributions 11 Random Variables A random variable is a real-valued function which exists within the domain of a defined sample space A random variable is conventionally designated by a capital letter, say Y, and the value of Y is denoted by a small letter, say y The sample space is the range of values that y can be assigned The small letter y is used in these notes for Y and y Random variables can be either discrete or continuous A discrete random variable can assume only a finite number of distinct values, such as zero or one, for example A continuous random variable can assume any value within the range of the sample space Random variables usually follow a distribution function or probability density which can be described mathematically in most cases The distribution could have a number of parameters associated with it, such as a mean and variance, and all parameters will generally be designated collectively as a vector θ The goal of statistical analysis is usually to estimate elements of θ from the observed random variables, y If y represents a random variable from some distribution, then the expectation of y is denoted by E(y) = µ where E() means expected value The expected value of y depends on its distribution and range of allowable values The expected value is known as the mean, or the first moment of the distribution Also of importance is the variance of y that could be expected with that distribution The variance of a scalar random variable, y, is defined as V ar(y) = E(y 2 ) E(y)E(y) = E(y E(y)) 2 and is commonly represented as σ 2 y Variances are known as the second moment of the distribution Variances are always greater than zero 9

10 10 CHAPTER 1 DISTRIBUTIONS 12 Discrete Random Variables In the general discrete case, the probability that Y takes the value y, is defined as the sum of the probabilities of all sample points that are assigned the value y That is, P (Y = y) = p(y) The probability distribution of Y lists the probabilities for each value of y Suppose Y can take on four values with the following probabilities: y p(y) 0 1/8 1 1/4 2 1/4 3 3/8 Any other values of y are assumed to have zero probability, and the sum of all probabilities is 1, as required of a valid distribution function Various kinds of probabilities may be calculated from this table P r(y = 0) = p(0) = 1 8 P r(y = 0 or Y = 2) = p(0) + p(2) = 3 8 P r(y = 1 or Y = 2) = p(1) + p(2) = 1 2 The cumulative distribution function is F (y) = P r(y y), for example, F (0) = p(0), F (1) = p(0) + p(1) = 3 8, F (2) = p(0) + p(1) + p(2) = 5 8, F (3) = p(0) + p(1) + p(2) + p(3) = 1 Also, if y 1 y 2, then F (y 1 ) F (y 2 ) Finally, P r(y 1 < Y y 2 ) = P r(y y 2 ) P r(y y 1 ) = F (y 2 ) F (y 1 )

11 12 DISCRETE RANDOM VARIABLES 11 The expected value of a discrete random variable is defined as E(y) = y y p(y) For the example above, E(y) = (0(1/8) + 1(1/4) + 2(1/4) + 3(3/8)) = 1875 Similarly, the expected value of a function of Y, say g(y ) is given by E(g(y)) = y g(y) p(y) Suppose g(y) = y 2, then E(y 2 ) = (0(1/8) + 1(1/4) + 4(1/4) + 9(3/8)) = 4625 The variance of discrete random variable Y is For the example, V ar(y) = E(y E(y)) 2 = E(y 2 ) [E(y)] 2 V ar(y) = ( 1875) 2 (1/8) + ( 875) 2 (1/4) + (125) 2 (1/4) + (1125) 2 (3/8) = 4625 (1875) 2 = Binomial Distribution A common discrete distribution is the binomial distribution A binomial event can take on only two possible outcomes, success or failure, zero or one, heads or tails, diseased or not diseased, and so on The probability of one outcome is q and the probability of the other outcome is 1 q Trials, or a succession of binomial events, are assumed to be independent The random variable Y is the number of successes The probability distribution is given by ( ) n p(y) = q y (1 q) n y, y for y = 0, 1, 2,, n and 0 q 1 The number of trials is n The expected value and variance of the binomial distribution are E(y) = n q V ar(y) = n q (1 q)

12 12 CHAPTER 1 DISTRIBUTIONS 122 Poisson Distribution A Poisson probability distribution provides a good model for the probability distribution of the number Y of rare events that occur in a given space, time, volume, or any other dimension, and λ is the average value of Y In dairy cattle breeding, for example, the number of quality embryos produced by a cow during superovulation can range from 0 to 20 (or more), but the average might be only 3 or 4 The Poisson probability distribution is given by p(y) = λy exp λ, y! for y = 0, 1, 2, and λ > 0 Also, The mean and the variance are equal E(y) = λ V ar(y) = λ 13 General Matrix Results Extending results from scalar random variables to vectors of random variables, also called a random vector variable, then the following general results apply Vectors are denoted by boldfaced small letters 131 Expectations Let y 1 and y 2 be two random vector variables, then for i = 1 or 2, then µ i1 µ i2 E(y i ) = µ i =, µ in for a vector of length n If c is a scalar constant, then Similarly, if C is a matrix of constants, then Finally, E(cy i ) = cµ i E(Cy i ) = Cµ i E(y 1 + y 2 ) = E(y 1 ) + E(y 2 ) = µ 1 + µ 2

13 14 CONTINUOUS DISTRIBUTIONS Variance-Covariance Matrices Let y be a random vector variable of length n, then the variance-covariance matrix of y is as follows: V ar(y) = E(yy ) E(y)E(y ) σy 2 1 σ y1 y 2 σ y1 y n σ y1 y = 2 σy 2 2 σ y2 y n = V σ y1 y n σ y2 y n σ 2 y n A variance-covariance (VCV) matrix of a random vector contains variances on the diagonals and covariances in the off-diagonals A VCV matrix is square, symmetric and should always be positive definite or positive semi-definite, ie all of the eigenvalues must be positive Another name for VCV matrix is dispersion matrix or (co)variance matrix Let C be a matrix of constants conformable for multiplication with the vector y, then V ar(cy) = E(Cyy C ) E(Cy)E(y C ) = CE(yy )C CE(y)E(y )C = C ( E(yy ) E(y)E(y ) ) C = CV ar(y)c = CVC If there are two sets of functions of y, say C 1 y and C 2 y, then Cov(C 1 y, C 2 y) = C 1 VC 2 Similarly, if y and z represent two different random vectors, possibly of different orders, and if the (co)variance matrix between these two vectors is W, then Cov(C 1 y, C 2 z) = C 1 WC 2 14 Continuous Distributions Consider measuring the amount of milk given by a dairy cow at a particular milking Even if a machine of perfect accuracy was used, the amount of milk would be a unique point on a continuum of possible values, such as kg of milk As such it is mathematically impossible to assign a nonzero probability to all of the infinite possible points in the continuum Thus, a different method of describing a probability distribution of a continuous random variable must be used The sum of the probabilities (if they

14 14 CHAPTER 1 DISTRIBUTIONS could be assigned) through the continuum is still assumed to sum to 1 The cumulative distribution function of a random variable is F (y) = P (Y y), for < y < As y approaches, then F (y) approaches 0 As y approaches, then F (y) approaches 1 Thus, F (y) is said to be a nondecreasing function of y If a < b, then F (a) < F (b) If F (y) is the cumulative distribution function of Y, then the probability density function of Y is given by p(y) = F (y) = F (y), y wherever the derivative exists Always for p(y) being a probability density function, Conversely, F (y) = p(y) y = 1 y p(t) t The expected value of a continuous random variable y is E(y) = y p(y) y provided that the integral exists If g(y) is a function of y, then E(g(y)) = provided that the integral exists Finally, g(y) p(y) y V ar(y) = E(y 2 ) [E(y)] Uniform Distribution The Uniform Distribution is one of the basic distributions in statistics The primary application is in the generation of random numbers from other distributions y U(a, b) p(y) = U(y a, b) where b is greater than a In a uniform distribution, every value between a and b has an equal probability of existing, ie p(y) = 1/(b a) Usually b = 1 and a = 0 so that

15 14 CONTINUOUS DISTRIBUTIONS 15 the observed y are between 0 and 1, representing probabilities The expected value and variance of this distribution are E(y) = = E(y 2 ) = = b a b a y p(y) y y/(b a) y = 1 2 y2 /(b a) b a = 1 2 b2 /(b a) 1 2 a2 /(b a) = 1 2 (b2 a 2 )/(b a) = (a + b)/2, b a b a y 2 p(y) y y 2 /(b a) y = 1 3 y3 /(b a) b a = 1 3 (b3 a 3 )/(b a) = 1 3 [(a + b)2 ab] Then V ar(y) = E(y 2 ) E(y) 2 = (4[(a + b) 2 ab] 3(a + b) 2 )/12 = (b a) 2 /12 Uniform Random Number Generators George Marsaglia of Florida State University developed a uniform random number generator that has passed 18 different tests, and which has a long cycle time (ie the number of calls to the subroutine before the sequence of random numbers begins to repeat itself) The strategy utilized is a multiply-with-carry scheme To give an idea of how this scheme works, start with x n 1 = , which is called the seed The next number would be generated by This would be followed by x n = 672 [x n 1 (x n 1 /1000) 1000)] + x n 1 /1000 = 672 [456] = x n = =

16 16 CHAPTER 1 DISTRIBUTIONS The number 672 is carefully chosen, and there is a process for doing this The cycle time, or period, of the above generator is 336,000 Thus, after 336,000 operations, the numbers begin again in the same sequence This is an example of how multiply-with-carry works Marsaglia s proposed generator makes use of 8 numbers at a time rather than just the previous random number, as well as two sequences of numbers rather than just one The period for this generator is which is a very, very large number The two sequences are x n = 12013x n x n x n x n x n x n x n x n 1 + and carry mod 2 16, x n = 9272x n x n x n x n x n x n x n x n 1 + carry mod 2 16 The coefficients were carefully chosen Each sequence provides a 16 bit integer number, and therefore, combining the two (by concatenation) gives a 32-bit random integer Example Usage The uniform distribution random number generator can be used to simulate a discrete random variable (such as that given earlier - Section 12) The steps of a simple program to assign y a value based upon the probabilities given earlier, would be as follows: call uniform(p) y = 0 if(pgt0125)y=1 if(pgt0375)y=2 if(pgt0625)y=3 142 Normal Distribution The normal distribution has been the most commonly assumed distribution in animal breeding and statistical genetics The properties of the distribution are well known; computations for estimators are relatively easy; and normal distributions suffice for a majority of situations A scalar random variable y has a normal probability distribution if and only if p(y) = (2π) 5 σ 2 exp( 5(y µ) 2 σ 2 )

17 14 CONTINUOUS DISTRIBUTIONS 17 for < x < +, where σ 2 is the variance of y and µ is the expected value of y For a random vector variable, y, the multivariate normal density function is p(y) = (2π) 5n V 5 exp( 5(y µ) V 1 (y µ)) denoted as y N(µ, V) where V is the variance-covariance matrix of y Note that the determinant of V must be positive, otherwise the density function is undefined Random Normal Deviates There have been several ways to generate random variables from a normal distribution Let r 1 and r 2 be values from a uniform distribution generator, between 0 and 1, then two random normal deviates can be computed as and y 1 = ( 2 ln r 1 ) 5 cos 2πr 2, y 2 = ( 2 ln r 1 ) 5 sin 2πr 2 The quality of the results depend on the quality of the uniform distribution generator Another algorithm again uses two values from a uniform distribution generator, say r 1 and r 2, then v 1 = r 1 + r 1 10, and v 2 = r 2 + r 2 10, if q is less than 1 and greater than 0, then q = v 1 v 1 + v 2 v 2, f = ( 2 ln q/q) 5 y 1 = v 1 f, y 2 = v 2 f 143 Beta Distribution Beta distributions may be assumed for random variables such as heritability, probability, or gene frequency all of which are limited to a value between zero and 1 p(y a, b) = Cy a 1 (1 y) b 1, for y [0, 1], where the constant of integration is C = Γ(a + b)/(γ(a)γ(b)),

18 18 CHAPTER 1 DISTRIBUTIONS and a and b are greater than 0 The gamma function, Γ( ) is Γ(α) = 0 y α 1 exp y y, for α > 0 When α = 1, then Γ(1) = 1 For α being an integer and greater than 1, then The expectation of a beta variable is Γ(α) = (α 1)Γ(α 1) = (α 1)! E(y a, b) = C 1 0 [ y y a 1 (1 y) b 1] y, which simplifies to The variance of a beta variable is E(y a, b) = a a + b V ar(y a, b) = ab (a + b) 2 (a + b + 1) 144 Gamma Distribution The Gamma Distribution has the following form; p(y a, b) = Cy a 1 exp by, for y being greater than 0 Variance components are always supposed to be greater than 0, thus a gamma distribution may be appropriate The constant of integration is C = ba Γ(a), and both a and b are greater than zero The mean of the distribution is and the variance of the distribution is E(y a, b) = a b, V ar(y a, b) = a b 2 The coefficient of variation (standard deviation divided by the mean) is equal to a 5 The gamma distribution is the parent distribution for two special cases When a = 1, then the gamma distribution becomes the Exponential Distribution, ie, p(y b) = b exp by,

19 14 CONTINUOUS DISTRIBUTIONS 19 for y greater than zero The other case is when a = v/2 and b = 05, where v is commonly known as the degrees of freedom and is greater than zero Then and This is the Central Chi-square distribution p(y v) = Cy (v/2) 1 exp y/2, C = (1/2)v/2 Γ(v/2) 145 Chi-Square Distribution In the estimation of variance components, quadratic forms of y are needed If y N(0, I), then y y χ 2 n, where χ 2 n is a central chi-square distribution with n degrees of freedom and n is the length of the random vector variable y The mean of the central chi-square distribution is n, and the variance is 2n The probability distribution function of s = y y is p(s n) = (s) (n/2) 1 exp 05s/[2 05n Γ(05n)], for s > 0 If y N(µ, I), then y y χ 2 n,λ where λ is the noncentrality parameter which is equal to 5µ µ The mean of a noncentral chi-square distribution is n + 2λ and the variance is 2n + 8λ If y N(µ, V), then y Qy has a noncentral chi-square distribution only if QV is idempotent, ie QVQV = QV The noncentrality parameter is λ = 5µ QVQµ and the mean and variance of the distribution are tr(qv) + 2λ and 2tr(QV) + 8λ, respectively If there are two quadratic forms of y, say y Qy and y Py, and both quadratic forms have chi-square distributions, then the two quadratic forms are independent if QVP = 0 Independence of quadratic forms is necessary for the construction of valid tests of hypotheses This property is not required for estimation of variances and covariances Random Chi-Square Generator One way to generate a central Chi-square variate with n degrees of freedom is to generate a vector of length n of random normal deviates, then sum the squares of these deviates Thus, a very good random normal deviate generator would be necessary Also, the computing time to generate one Chi-square variate would depend on n, the number of random normal deviates to be generated In order to save time for large n, a Chi-square variate is generated using a random gamma distribution variate, which requires two uniform variates, r 1 and r 2, then

20 20 CHAPTER 1 DISTRIBUTIONS 1 call uniform(r1) call uniform(r2) V1 = r1 + r1-1 V2 = r2 + r2-1 f = V1*V1 + V2*V2 if ( fgt10)go to 1 if (V1eq00)then c=00 else c=v2/v1 endif anm = n/2-1 s = dsqrt(anm+anm+10) y = s*c + anm if(yle00)go to 1 temp = anm*dlog(y/anm)-s*c if(dabs(temp)gt85)then go to 1 else e = (10+c*c)*dexp(temp) call uniform(r3) if(r3gte)go to 1 endif c at this point y is the gamma variate return The Chi-square variate with n degrees of freedom is then derived as If(n is even) then call gamma(iseed,n,gam) chi = gam + gam else (n is odd) m = n-1 call gamma(iseed,m,gam) chi = gam + gam call normal(znorm) chi = chi + znorm*znorm endif Chi-square variates are needed in Gibbs sampling to obtain new sample values of a variance component

21 14 CONTINUOUS DISTRIBUTIONS The Wishart Distribution The Wishart Distribution is akin to a multivariate Chi-square Distribution An entire matrix is envisioned of which the diagonals have a Chi-square distribution, and the offdiagonals have a built-in correlation structure The resulting matrix is positive definite 1 To generate a matrix having a Wishart distribution, let U equal a matrix with q rows and m columns (and q > m), then W = U U is an m by m positive definite matrix 2 Perform a Cholesky decomposition of W, so that and T is a lower triangular matrix W = TT, 3 Fill an m by m matrix, Z, with random normal deviates, and fill a vector, V, of length m with the square roots of random Chi-square variates, such that the first element has q 1 degrees of freedom, the second has q 2 df, and so on, and the last element has q m df 4 Now form a matrix B such that the diagonals are B i,i = v i v i, for i = 1 and for i > 1 The offdiagonals of B are i 1 B i,i = v i v i + z i,j z i,j j=1 B i,j = B j,i = z i,j v i, for j = 2 to m and i = 1, and for i > 1 i 1 B i,j = B j,i = z i,j v i + z k,i z k,j, for j = 2 to m and i = 2, (j 1) 5 Finally, calculate S = TBT k=1 The matrix S is a random Wishart matrix based upon the relationships in W Note that the off-diagonals in B should be close to zero, on average

22 22 CHAPTER 1 DISTRIBUTIONS 147 The t-distribution The t-distribution is based on the ratio of two independent random variables The first random variable follows a univariate normal distribution, and the second random variable follows a central chi-square distribution Let y N(0, 1) and s χ 2 n with y and s being independent, then y (s/n) 5 t n The mean of a t-distribution is the mean of the y variable, and the variance is n/(n 2), and n is the degrees of freedom of the distribution As n becomes larger, the t-distribution becomes very similar to the normal distribution in shape A t-distribution with small degrees of freedom would allow more observations to occur in the tails of the distribution, and therefore would look like a squashed normal distribution 148 The F-distribution A common distribution used in the testing of hypotheses is the F-distribution A central F-distribution is based on the ratio of two independent central chi-square variables Let s χ 2 n and v χ 2 m with s and v being independent, then (s/n) (v/m) F n,m The mean of the F-distribution is m/(m 2) and the variance is 2m 2 (n + m 2) n(m 2) 2 (m 4) Tables of F-values have been constructed for various probability levels as criteria to test if the numerator chi-square variable has a noncentral chi-square distribution If the calculated F-value is greater than the value in the tables, then s is implied to have a noncentral chi-square distribution, otherwise s has a central chi-square distribution The square of a t-distribution variable gives a variable that has an F-distribution with 1 and n degrees of freedom Noncentral F-distributions exist depending on whether the numerator or denominator variables have noncentral chi-square distributions Tables for noncentral F-distributions generally do not exist because of the difficulty in predicting the noncentrality parameters However, using random chi-square generators it is possible to numerically calculate an expected noncentral F value for specific situations When both the numerator and denominator chi-square variables are from noncentral distributions, then their ratio follows a doubly noncentral F-distribution

23 15 QUADRATIC FORMS Quadratic Forms A quadratic form is a weighted sum of squares of elements of a random vector variable The general form is y Qy, where y is a random vector variable, and Q is a regulator matrix The regulator matrix can take on various forms and values depending on the situation Usually Q is a symmetric matrix, but not necessarily positive definite Examples of different Q matrices are as follows: 1 Q = I, then y Qy = y y which is a total sum of squares of the elements in y 2 Q = J(1/n), then y Qy = y Jy(1/n) where n is the length of y Note that J = 11, so that y Jy = (y 1)(1 y) and (1 y) is the sum of the elements in y 3 Q = (I J(1/n)) /(n 1), then y Qy gives the variance of the elements in y, σ 2 y The expected value of a quadratic form is However, so that then E(y Qy) = E(tr(y Qy)) = E(tr(Qyy )) = tr(qe(yy )) If we let V ar(y) = V and E(y) = µ, then V ar(y) = E(yy ) E(y)E(y ) E(yy ) = V ar(y) + E(y)E(y ), E(y Qy) = tr(q(v ar(y) + E(y)E(y ))) E(y Qy) = tr(q(v + µµ )) = tr(qv) + tr(qµµ ) = tr(qv) + µ Qµ The expectation of a quadratic form does not depend on the distribution of y However, the variance of a quadratic form requires that y follows a multivariate normal distribution Without showing the derivation, the variance of a quadratic form, assuming y has a multivariate normal distribution, is V ar(y Qy) = 2tr(QVQV) + 4µ QVQµ The quadratic form, y Qy, has a chi-square distribution if tr(qvqv) = tr(qv), and µ QVQµ = µ Qµ, or the single condition that QV is idempotent Then if m = tr(qv) and λ = 5µ Qµ,

24 24 CHAPTER 1 DISTRIBUTIONS the expected value of y Qy is m + 2λ and the variance is 2m + 8λ, which are the usual results for a noncentral chi-square variable The covariance between two quadratic forms, say y Qy and y Py, is Cov(y Qy, y Py) = 2tr(QVPV) + 4µ QVPµ The covariance is zero if QVP = 0, then the two quadratic forms are said to be independent

25 Chapter 2 Building Blocks 21 Basic Blocks In order to derive Maximum Likelihood and Residual Maximum Likelihood, a number of results about derivatives of determinants and other quantities needs to be reviewed These will be called Building Blocks or BB for short, because they are not theorems or conjectures BB-1 The (co)variance matrix of y is V = s Z i G i Z iσi 2 + Rσ0 2 i=1 = ZGZ + R Usually, each G i is assumed to be I for most random factors, but for animal models G i might be equal to A, the additive genetic relationship matrix Thus, G i does not always have to be diagonal, and will not be an identity in animal model analyses BB-2 The inverse of V is V 1 = R 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 To prove, show that VV 1 = I Let T = Z R 1 Z + G 1, then VV 1 = (ZGZ + R)[R 1 R 1 ZT 1 Z R 1 ] = ZGZ R 1 ZGZ R 1 ZT 1 Z R 1 +I ZT 1 Z R 1 25

26 26 CHAPTER 2 BUILDING BLOCKS = I + [ZGT ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [ZG(Z R 1 Z + G 1 ) ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [ZGZ R 1 Z + Z ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [0](T 1 Z R 1 ) = I BB-3 If k is a scalar constant and A is any square matrix of order m, then Ak = k m A BB-4 For general square matrices, say M and U, of the same order then MU = M U BB-5 For the general matrix below with A and D being square and non-singular (ie the inverse of each exists), then A B Q D = A D + QA 1 B = D A + BD 1 Q Then if A = I and D = I, then I = 1, so that I + QB = I + BQ = I + B Q = I + Q B BB-6 Using the results in (BB-4) and (BB-5), then V = R + ZGZ = R(I + R 1 ZGZ ) = R I + R 1 ZGZ = R I + Z R 1 ZG = R (G 1 + Z R 1 Z)G = R G 1 + Z R 1 Z G

27 21 BASIC BLOCKS 27 BB-7 The mixed model coefficient matrix of Henderson can be denoted by ( ) X C = R 1 X X R 1 Z Z R 1 X Z R 1 Z + G 1 then the determinant of C can be derived as C = X R 1 X G 1 + Z (R 1 R 1 X(X R 1 X) X R 1 )Z = Z R 1 Z + G 1 X (R 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 )X Now let S = R 1 R 1 X(X R 1 X) X R 1 then C = X R 1 X G 1 + Z SZ = Z R 1 Z + G 1 X V 1 X BB-8 A projection matrix, P, is defined as Properties of P: Therefore, P = V 1 V 1 X(X V 1 X) X V 1 PX = 0, Py = V 1 (y Xˆb), where ˆb = (X V 1 X) X V 1 y y PZ i G i Z ipy = (y Xˆb) V 1 Z i G i Z iv 1 (y Xˆb) BB-9 Derivative of V 1 is V 1 σ 2 i = V 1 V σi 2 V 1 BB-10 Derivative of ln V is ln V σ 2 i = tr ( V ) 1 V σi 2

28 28 CHAPTER 2 BUILDING BLOCKS BB-11 Derivative of P is P σ 2 i = P V σi 2 P BB-12 Derivative of V is V σ 2 i = Z i G i Z i BB-13 Derivative of ln X V 1 X is ln X V 1 X σ 2 i = tr(x V 1 X) X V 1 V σi 2 V 1 X 22 Basic Model The following simple model will be assumed The general linear model is described as y = Xb + Zu + e, where E(y) = Xb, E(u) = 0, and E(e) = 0 Often u is partitioned into s factors as u = (u 1 u 2 u s ) The (co)variance matrices are defined as G = V ar(u) = V ar u 1 u 2 u s G 1 σ G 2 σ G s σs 2 and Then R = V ar(e) = Iσ 2 0 V ar(y) = V = ZGZ + R,

29 23 MIXED MODEL EQUATIONS 29 and if Z is partitioned corresponding to u, as Z = [ Z 1 Z 2 Z s ], then s ZGZ = Z i G i Z iσi 2 i=1 Let V i = Z i G i Z i and V 0 = I, then V = s V i σi 2 i=o Covariances between random factors in the model will be allowed later, such as in a maternal effects model where there are covariances between direct and maternal effects, or random regression models with covariances between the random regression coefficients, or multiple trait models 23 Mixed Model Equations Henderson s mixed model equations (MME) are written as X R 1 X X R 1 Z 1 X R 1 Z 2 X R 1 Z s Z 1R 1 X Z 1R 1 Z 1 + G 1 1 σ 2 1 Z 1R 1 Z 2 Z 1R 1 Z s Z 2R 1 X Z 2R 1 Z 1 Z 2R 1 Z 2 + G 1 2 σ 2 2 Z 2R 1 Z s Z sr 1 X Z sr 1 Z 1 Z sr 1 Z 2 Z sr 1 Z s + G 1 s = X R 1 y Z 1R 1 y Z 2R 1 y Z sr 1 y Quadratic forms from solutions to these equations are σs 2 ˆb û 1 û 2 û s û ig i û i, where and for the residual variance, where û = G i Z ipy, ê ê, ê = y Xˆb Zû

30 30 CHAPTER 2 BUILDING BLOCKS 24 Unbiased Estimation of Variances Unbiased estimation is no longer practiced in animal breeding, but the history of the development from unbiased to likelihood based methods is important to understand The best way to describe unbiased methods of estimation is to give a small example with only three observations Assume that all G i are equal to I for this example, so that Z i G i Z i simplifies to Z i Z i Let X = Z 2 = 1 1 1, Z 1 = , and y =, , Then and and V 0 = I V 1 = Z 1 Z 1 = V 2 = Z 2 Z 2 = , In this example, there are 3 unknown variances to be estimated, and consequently, at least three quadratic forms are needed in order to estimate the variances The Q- matrices are the weights of the observations in the quadratic forms These matrices differ depending on the method of estimation that is chosen Below are three arbitrary Q-matrices that were chosen such that Q k X = 0 They do not necessarily correspond to any known method of estimation, but are for illustration of the calculations Let Q 1 = 1 2 1, Q 2 = and Q 3 = The numeric values of the quadratic forms are y Q 1 y = 657,,

31 25 VARIANCES OF QUADRATIC FORMS 31 y Q 2 y = 306, and y Q 3 y = 882 For example, ( y Q 1 y = ) = 657 The expectations of the quadratic forms are E(y Q 1 y) = trq 1 V 0 σ trq 1 V 1 σ trq 1 V 2 σ 2 2 = 4σ σ σ2 2 E(y Q 2 y) = 4σ σ σ2, 2 E(y Q 3 y) = 6σ σ σ 2 2 Now equate the values of the quadratic forms to their corresponding expectations, which gives a system of equations to be solved, such as Fσ = w In this case, the equations would be which gives the solution as ˆσ = F 1 w, or ˆσ 2 0 ˆσ 1 2 = ˆσ 2 2 σ 2 0 σ 2 1 σ 2 2 = Note that one of the estimates is negative, which is not appropriate for a variance component The estimate is said to be out of the parameter space, 25 Variances of Quadratic Forms The variance of a quadratic form is given by V ar(y Qy) = 2trQVQV + 4b X QVQXb Only translation invariant quadratic forms are typically considered in variance component estimation, that means b X QVQXb = 0 Thus, only 2trQVQV needs to be calculated Remember that V can be written as the sum of s + 1 matrices, V i σ 2 i, then trqvqv = s s trq V i σi 2 Q V j σj 2 i=o j=o = s i=o s j=o tr QV i QV j σi 2 σj 2

32 32 CHAPTER 2 BUILDING BLOCKS For example, if s = 2, then trqvqv = trqv 0 QV 0 σ trQV 0 QV 1 σ 2 0σ trQV 0 QV 2 σ 2 0σ trqv 1 QV 1 σ trQV 1 QV 2 σ 2 1σ trqv 2 QV 2 σ 4 2 The exact sampling variances require the true, unknown components of variance The magnitude of the sampling variances depends on 1 The true magnitude of the individual components, 2 The matrix Q which depends on the method of estimation and the model, and 3 The structure and amount of the data through X and Z Normally, the variance-covariance matrix of the estimates, commonly known as the sampling variances of the estimates, were never actually computed during the days of unbiased methods due to their computational complexity However, with today s computers their calculation can still be very challenging and usually impossible For small examples, the calculations can be easily demonstrated In this case, V ar(f 1 w) = F 1 V ar(w)f 1, a function of the variance-covariance matrix of the quadratic forms Using the small example of the previous section, the V ar(w) is a 3x3 matrix The (1,1) element is the variance of y Q 1 y which is V ar(y Q 1 y) = 2trQ 1 VQ 1 V = 2trQ 1 V 0 Q 1 V 0 σ trQ 1 V 0 Q 1 V 1 σ 2 0σ trQ 1 V 0 Q 1 V 2 σ 2 0σ trQ 1 V 1 Q 1 V 1 σ trQ 1 V 1 Q 1 V 2 σ 2 1σ trQ 1 V 2 Q 1 V 2 σ 4 2 = 20σ σ 2 0σ σ 2 0σ σ σ 2 1σ σ 4 2 The (1,2) element is the covariance between the first and second quadratic forms, Cov(y Q 1 y, y Q 2 y) = 2trQ 1 VQ 2 V, and similarly for the other terms All of the results are summarized in the table below Forms σ0 4 σ0 2σ2 1 σ0 2σ2 2 σ1 4 σ1 2σ2 2 σ2 4 V ar(w 1 ) Cov(w 1, w 2 ) Cov(w 1, w 3 ) V ar(w 2 ) Cov(w 2, w 3 ) V ar(w 3 )

33 26 VARIANCE OF A RATIO OF VARIANCE ESTIMATES 33 To get numeric values for these variances, the true components need to be known Assume that the true values are σ 2 0 = 250, σ2 1 = 10, and σ2 2 = 80, then the variance of w 1 is V ar(w 1 ) = 20(250) (250)(10) + 16(250)(80) +8(10) 2 + 0(10)(80) + 8(80) 2 = 1, 662, 000 The complete variance- covariance matrix of the quadratic forms is V ar w 1 w 2 w 3 = 1, 662, 000 1, 147, 800 2, 144, 000 1, 147, 800 1, 757, 200 2, 218, 400 2, 144, 000 2, 218, 400 3, 550, 800 The variance-covariance matrix of the estimated variances (assuming the above true values) would be V ar(ˆσ) = F 1 V ar(w)f 1 = 405, , , , , , , , , 500 = C 26 Variance of A Ratio of Variance Estimates Often estimates of ratios of functions of the variances are needed for animal breeding work, such as heritabilities, repeatabilities, and variance ratios Let such a ratio be denoted as a/c where a = ˆσ 2 2 = (0 0 1)ˆσ = 72 and c = ˆσ ˆσ ˆσ 2 2 = (1 1 1)ˆσ = 288 (NOTE: the negative estimate for ˆσ 2 1 was set to zero before calculating c From Osborne and Patterson (1952) and Rao (1968) an approximation to the variance of a ratio is given by Now note that V ar(a/c) = (c 2 V ar(a) + a 2 V ar(c) 2ac Cov(a, c))/c 4 V ar(a) = (0 0 1)C(0 0 1) = 293, 500, V ar(c) = (1 1 1)C(1 1 1) = 231, 200, Cov(a, c, ) = (0 0 1)C(1 1 1) = 194, 750

34 34 CHAPTER 2 BUILDING BLOCKS Then V ar(a/c) = [(288) 2 (293, 500) + (72) 2 (231, 200) 2(72)(288)(194, 750)]/(288) 4 = This result is very large, but could be expected from only 3 observations Thus, (a/c) = 25 with a standard deviation of Another approximation method assumes that the denominator has been estimated accurately, so that it is considered to be a constant, such as the estimate of σ 2 e Then, For the example problem, this gives V ar(a/c) = V ar(a)/c 2 V ar(a/c) = 293, 500/(288) 2 = , which is slightly larger than the previous approximation The second approximation would not be suitable for a ratio of the residual variance to the variance of one of the other components Suppose a = ˆσ 0 2 = 216, and c = ˆσ2 2 = 72, then (a/c) = 30, and with the first method, and V ar(a/c) = [(72) 2 (405, 700) + (216) 2 (293, 500) 2(72)(216)( 240, 700)]/(72) 4 = , V ar(a/c) = 405, 700/(72) 2 = 7826, with the second method The first method is probably more realistic in this situation, but both are very large

35 Chapter 3 Likelihood Methods 31 The Likelihood Function The normal distribution likelihood function is commonly assumed in animal breeding, and is used in REML and related methods of estimation Use will be made of the building blocks of the previous chapter The multivariate normal distribution likelihood function is L(y) = (2π) 5N V 5 exp( 5(y Xb) V 1 (y Xb)) The log of the likelihood, say L 1 is L 1 = 05[N ln(2π) + ln V +(y Xb) V 1 (y Xb)] The term N ln(2π) is a constant that does not involve any of the unknown variances or effects in the model, and therefore, it is commonly omitted during maximization computations Maximizing the log likelihood maximizes the original likelihood function In the previous chapter, V = R Z R 1 Z + G 1 G, and therefore, ln V = ln R + ln G + ln Z R 1 Z + G 1 If R = Iσ 2 0, then ln R = ln Iσ0 2 = ln(σ0) 2 N I = N ln σ0(1) 2 35

36 36 CHAPTER 3 LIKELIHOOD METHODS Similarly, if G = + Iσi 2, where i = 1 to s, then ln G = = s ln Iσi 2 i=1 s q i ln σi 2 i=1 Except, that in animal models one of the G i is equal to Aσi 2 In that case, ln Aσ 2 i = ln(σ 2 i ) q i A which is ln Aσ 2 i = q i ln σ 2 i A = q i ln σ 2 i + ln A and Recall that so that ( ) X C = R 1 X X R 1 Z Z R 1 X Z R 1 Z + G 1, C = Z R 1 Z + G 1 X V 1 X ln C = ln Z R 1 Z + G 1 + ln X V 1 X 32 Maximum Likelihood Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components Let L 2 be equivalent to L 1 except for the constant involving π L 2 = 05[ln V +(y Xb) V 1 (y Xb)] The derivatives of L 2 with respect to b and to σ 2 i for i = 0, 1, s are and L 2 b = X V 1 Xb X V 1 y L 2 σ 2 i = 5 tr[v 1 ( V/ σ 2 i )] + 5(y Xb) V 1 ( V/ σ 2 i )V 1 (y Xb) = 5 tr[v 1 V i ] + 5(y Xb) V 1 V i V 1 (y Xb) Equating the derivatives to zero gives ˆb = (X V 1 X) X V 1 y,

37 32 MAXIMUM LIKELIHOOD 37 and Recall that tr[v 1 V i ] = (y Xˆb) V 1 V i V 1 (y Xˆb) Py = V 1 (y Xˆb), where P is the projection matrix, and that V i = Z i Z i, then tr[v 1 Z i Z i] = y PV i Py In usual mixed model theory, the solution vector for a random factor may be written as so that Also, Let û i = G i Z ipy, y PV i Py = y PZ i G i G 2 i G i Z ipy = û ig 2 i û i = û iûi/σi 4 tr[v 1 V i ] = tr[(r 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 )Z i Z i] T = (Z R 1 Z + G 1 ) 1, R = Iσ 2 0, and G = + Iσ 2 i, then tr[v 1 V i ] = tr(z iz i )σ 2 0 tr(z iztz Z i )σ 4 0 If T can be partitioned into submatrices for each random factor, then and Tσ 2 0 (Z Z + + Iαi ) = I, which yields Finally, + TZ Zσ0 2 = I T( Iσ 2 i ), TZ Z i σ 2 0 = I T ii σ 2 i, tr(z iztz Z i )σ0 4 = tr(z iz i )σ0 2 tr(i T ii σ 2 i )σi 2 tr[v 1 V i ] = tr(i T ii σi 2 )σi 2 = triσ 2 i = q i σ 2 i trt ii σ 4 i trt ii σ 4 i

38 38 CHAPTER 3 LIKELIHOOD METHODS Combining results gives ˆσ i 2 = (û iûi + trt iiˆσ 0)/q 2 i for i = 1, 2,, s, and for i = 0 gives ˆσ 0 2 = (y y ˆb X y û Z y)/ N 321 The EM Algorithm EM stands for Expectation Maximization From Searle, Casella, and McCulloch (1992) the following explanation is given The procedure alternates between calculating conditional expected values and maximizing simplified likelihoods The actual data y are called the incomplete data in the EM algorithm, and the complete data are considered to be y and the unobservable random effects, u i If the realized values of the unobservable random effects were known, then their variance would be the average of their squared values, ie, ˆσ 2 i = u iu i /q i However, in real life the realized values of the random effects are unknown The steps of the EM algorithm are as follows: Step 0 Decide on starting values for the variances and set m = 0 Step 1(E-step) Calculate the conditional expectation of the sufficient statistics, conditional on the incomplete data E(u iu i y) = σ 4(m) i y P (m) Z i Z ip (m) y +tr(σ 2(m) i I σ 4(m) i Z i(v (m) ) 1 Z i ) = ˆt (m) i Step 2(M-step) Maximize the likelihood of the complete data, σ 2(m+1) i = ˆt (m) i /q i, i = 0, 1, 2,, s Step 3 If convergence is reached, set ˆσ = σ (m+1), otherwise increase m by one and return to Step 1 This is equivalent to constructing and solving the mixed model equations with a given set of variances, σ (m), and then σ 2(m+1) 0 = (y y ˆb X y û Z y)/n, and σ 2(m+1) i = (û iûi + σ 2(m+1) 0 trt ii )/q i

39 33 RESTRICTED MAXIMUM LIKELIHOOD Restricted Maximum Likelihood Restricted (or Residual) maximum likelihood (REML), was first suggested by Thompson (1962), and was described formally by Patterson and Thompson (1971) The procedure requires that y have a multivariate normal distribution The method is translation invariant The maximum likelihood approach automatically keeps the estimator within the allowable parameter space(ie zero to plus infinity), and therefore, REML is a biased procedure REML was proposed as an improvement to ML in order to account for the degrees of freedom lost in estimating fixed effects The likelihood function used in REML is that for a set of error contrasts (ie residuals) that are assumed to have a multivariate normal distribution The multivariate normal distribution likelihood function for the residual contrasts, K y, where K X = 0, and K has rank equal to N r(x), is L(K y) = (2π) 5(N r(x)) K VK 5 exp( 5(K y) (K VK) 1 (K y)) The natural log of the likelihood function is L 3 = 5(N r(x)) ln(2π) 5 ln K VK 5y K(K VK) 1 K y Notice that 5(N r(x)) ln(2π) is a constant that does not depend on the unknown variance components or factors in the model, and therefore, can be ignored to give L 4 Searle (1979) showed that and ln K VK = ln V + ln X V 1 X y K(K VK) 1 K y = y Py = (y Xˆb) V 1 (y Xˆb) for any K such that K X = 0 Hence, L 4 can be written as L 4 = 5 ln V 5 ln X V 1 X 5(y Xˆb) V 1 (y Xˆb) REML can be calculated a number of different ways 1 Derivative Free approach is a search technique to find the parameters that maximize the log likelihood function Two techniques will be described here 2 First Derivatives and EM is where the first derivatives of the log likelihood are determined and set to zero in order to maximize the likelihood function Solutions need to be obtained by iteration because the resulting equations are non linear 3 Second Derivatives are generally more computationally demanding Gradient methods are used to find the parameters that make the first derivatives equal to zero Newton-Raphson (involves the observed information matrix) and Fishers Method of Scoring (involves the expected information matrix) have been used Lately, the average information algorithm (averages the observed and expected information matrices) has been used to reduce the computational time

40 40 CHAPTER 3 LIKELIHOOD METHODS All of the approaches attempt to maximize the log likelihood function of the error contrasts To illustrate the methods, consider a single trait model with three factors (F, A, B), of which A and B are random factors There were a total of 90 observations, and the total sum of squares was 356,000 The least squares equations for this small example are shown below F 1 F 2 A 1 A 2 A 3 B 1 B 2 B 3 B 4 = Derivative Free REML Derivative Free REML was proposed by Smith and Graser(1986) and Graser, Smith and Tier(1987) and has been expanded upon by Meyer (1987,91) who has developed a set of programs for computing estimates of variance components for a whole range of univariate and multivariate models The description given below is a very simplified version of the method for basic understanding of the technique Imagine an s dimensional array containing the values of the likelihood function for every possible set of values of the ratios of the components to the residual variance The technique is to search this array and find the set of ratios for which the likelihood function is maximized There is more than one way to conduct this search Care must be taken to find the global maximum rather than one of possibly many local maxima At the same time the number of likelihood evaluations to be computed must also be minimized Various alternative forms of L 4 can be derived Note that and that and that combining these results gives Now note that ln V = ln R + ln G + ln G 1 + Z R 1 Z ln X V 1 X = ln C ln Z R 1 Z + G 1 L 4 = 5 ln R 5 ln G 5 ln C 5y Py ln R = ln Iσ 2 0

41 33 RESTRICTED MAXIMUM LIKELIHOOD 41 Then = N ln σ0, 2 ln G = s q i ln σi 2, i=1 and ln C = ln X R 1 X + ln Z SZ + G 1 where ln X R 1 X = ln X Xσ0 2 = ln(σ0 2 X X = ln X X r(x) ln σ0, 2 and Z SZ + G 1 = σ0 2 MZ + G 1 = σ 2 0 (Z MZ + G 1 σ 2 0) ln C = ln X X r(x) ln σ 2 0 q ln σ ln Z MZ + G 1 σ 2 0, and finally, the log-likelihood function becomes L 4 = s 5(N r(x) q) ln σ0 2 5 q i ln σi 2 i=1 5 ln C 5y Py, where Note that C = ( X X X Z Z X Z Z + G 1 σ 2 0 ) q i ln σi 2 = q i ln σ0/α 2 i = q i (ln σ0 2 ln α i ) so that L 4 = s 5[(N r(x)) ln σ0 2 q i ln α i + ln C +y Py] i=1 The quantity y Py is y (y Xˆb Zû)/σ0 2 The computations are achieved by constructing the following matrix, X X X Z X y Z X Z Z + G 1 σ0 2 Z y y X y Z y y = ( ) C W y y W y, y then by Gaussian elimination of one row at a time, the sum of the log of the non-zero pivots (using the same ordering for each evaluation of the likelihood) gives log C and y (y Xˆb Zû) Gaussian elimination, using sparse matrix techniques, requires less computing time than inverting the coefficient matrix of the mixed model equations The ordering of factors within the equations could be critical to the computational process and some experimentation may be necessary to determine the best ordering The likelihood

42 42 CHAPTER 3 LIKELIHOOD METHODS function can be evaluated without the calculation of solutions to the mixed model equations, without inverting the coefficient matrix of the mixed model equations, and without computing any of the σi 2 The formulations for more general models and multiple trait models are more complex, but follow the same ideas Searching the array of likelihood values for various values of α i can be done in several different ways One method is to fix the values of all but one of the s α i, and then evaluate L 2 for four or more different values of the α i that were not fixed Then one can use a quadratic regression analysis to determine the value of that one ratio which maximizes L 2 given that the other ratios are fixed This is repeated for each of the s ratios, and the process is repeated until a maximum likelihood is obtained The calculations are demonstrated in the example that follows Begin by fixing the value of α B = 10 and letting the value of α A take on the values of (5, 10, 20, 30, 40) Using L 4 to evaluate the likelihood, then the results were as follows: α A L For example, the likelihood value for α A = 40, would be L 4 = 1 2 [(N r(x)) ln σ2 0 q A ln α A q B ln α B + ln C +y (y Xˆb Zû)/σ 2 0] where ln C = , y Py = /σ0 2 = 88, q A ln α A = , q B ln α B = , σ0 2 = , ln σ0 2 = , (N r(x)) = 88, then L 4 = 05[88(45685) ( /963997)] =

43 33 RESTRICTED MAXIMUM LIKELIHOOD 43 then To find the value of α A that maximizes L 4 for α B = 10, let Q = ˆβ = (Q Q) 1 Q Y = and Y = From this a prediction equation for L 4 can be written as L 4 = α A α 2 A This equation can be differentiated with respect to α A and then equated to zero to find the value of the ratio that maximizes the prediction equation This gives α A = 04489/(2(000698)) = Now keep α A = and try a number of values of α B from 2 to 10, which give the following results α B L Applying the quadratic regression to these points gives α B = The next step would be to fix α B = and to try new values for α A, such as 25 to 40 by units of 1 The range of values becomes finer and finer To insure that one has found the global maximum, the entire process could be started with vastly different starting values for the ratios, such as α B = 50 and let values for α A be 40, 50, 60, and 70 The more components there are to estimate, the more evaluations of the likelihood

44 44 CHAPTER 3 LIKELIHOOD METHODS that are going to be needed, and the more probable that convergence might be to a local maximum rather than to the global maximum Please refer to the literature for specification of the log likelihood function for particular models and situations Also, refer to work by Boldman and Van Vleck (1993) which found a simplification of Meyer s algorithms which reduced computational time by several orders of magnitude Even so, DFREML has been applied to fairly small data sets and can take considerable time to find estimates for these The available software may not be able to handle particular models, and so the user should be aware of these possible problems The Simplex Method The Simplex Method (Nelder and Mead, 1965) is a procedure for finding the minimum of a function (ie the minimum of 2L 4 or the maximum of L 4 ) with respect to the unknown variances and covariances The best way to describe the method is using the example data from the previous sections Begin by constructing a set of points for which L 4 is to be evaluated A point is a vector of values for the unknowns ( α A, α B ), for example, ( ) θ 1 = , then form two more points by changing one unknown at a time Let the three points be as shown in the following table No α A α B Now calculate L 4 for each point and arrange from largest to lowest value No α A α B L The idea now is to find another point to replace the last one(lowest L 4 ) This is done by a process called reflection Compute the mean of all points excluding the one with the lowest L 4 ( ) θ m = , then the reflection step is θ 4 = θ m + r (θ m θ last ),

45 33 RESTRICTED MAXIMUM LIKELIHOOD 45 where r is recommended by Nelder and Mead (1965) to be 1, giving ( ) θ 4 = The corresponding L 4 for this point was Compared to those in the table it has the largest value, and therefore, is a better point than the other three No α A α B L Given this success, the Simplex method calls for an expansion step, ie bigger change Thus, θ 5 = θ m + E (θ 4 θ m ), where E is suggested to be equal to 2 Hence ( θ 5 = ) to make a Then L 4 = , and the expanded point is better yet Now drop θ 1 from the table and put θ 5 at the top No α A α B L This completes one iteration Begin the next iteration by computing the mean of all points excluding the point with the lowest L 4 ( ) θ m = Another reflection step gives θ 6 = θ m + r (θ m θ last ), ( ) = However, this gives L 4 = , which is between θ 2 and θ 4, and can push out θ 2 from the table No α A α B L

46 46 CHAPTER 3 LIKELIHOOD METHODS Instead of an expansion step, a contraction step is needed because θ 6 did not give a greater L 4 than the first two Thus, θ 7 = θ m + c (θ 6 θ m ), where c = 05 is recommended Hence, ( θ 7 = ) Then L 4 = is better than that given by θ 4, but not by θ 5, thus the new table becomes as follows: No α A α B L The following steps were taken in the next iteration 1 The mean of the top two L 4 is ( θ m = ) 2 A reflection step gives θ 8 = θ m + r (θ m θ last ), ( ) = , which gave L 4 = , which is better than θ 7 3 Add θ 8 to the table and drop θ 4 No α A α B L Because L 4 for θ 8 was not larger than L 4 for θ 5 or smaller than L 4 for θ 7, then no expansion or contraction step is necessary Begin the next iteration The Simplex method continues in this manner until all point entries in the table are equal The constants recommended by Nelder and Mead (1965) for reflection, expansion, and contraction could be adjusted for a particular data set This method may converge to a local maximum, and so different starting values are needed to see if it converges to the same point The Simplex method does not work well with a large number of parameters to be estimated

Estimation of Variances and Covariances

Estimation of Variances and Covariances Estimation of Variances and Covariances Variables and Distributions Random variables are samples from a population with a given set of population parameters Random variables can be discrete, having a limited

More information

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb

More information

Lecture Notes. Introduction

Lecture Notes. Introduction 5/3/016 Lecture Notes R. Rekaya June 1-10, 016 Introduction Variance components play major role in animal breeding and genetic (estimation of BVs) It has been an active area of research since early 1950

More information

Chapter 5 Prediction of Random Variables

Chapter 5 Prediction of Random Variables Chapter 5 Prediction of Random Variables C R Henderson 1984 - Guelph We have discussed estimation of β, regarded as fixed Now we shall consider a rather different problem, prediction of random variables,

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information

Chapter 11 MIVQUE of Variances and Covariances

Chapter 11 MIVQUE of Variances and Covariances Chapter 11 MIVQUE of Variances and Covariances C R Henderson 1984 - Guelph The methods described in Chapter 10 for estimation of variances are quadratic, translation invariant, and unbiased For the balanced

More information

Animal Models. Sheep are scanned at maturity by ultrasound(us) to determine the amount of fat surrounding the muscle. A model (equation) might be

Animal Models. Sheep are scanned at maturity by ultrasound(us) to determine the amount of fat surrounding the muscle. A model (equation) might be Animal Models 1 Introduction An animal model is one in which there are one or more observations per animal, and all factors affecting those observations are described including an animal additive genetic

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Maternal Genetic Models

Maternal Genetic Models Maternal Genetic Models In mammalian species of livestock such as beef cattle sheep or swine the female provides an environment for its offspring to survive and grow in terms of protection and nourishment

More information

Chapter 3 Best Linear Unbiased Estimation

Chapter 3 Best Linear Unbiased Estimation Chapter 3 Best Linear Unbiased Estimation C R Henderson 1984 - Guelph In Chapter 2 we discussed linear unbiased estimation of k β, having determined that it is estimable Let the estimate be a y, and if

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
