Estimation of Variance Components in Animal Breeding

Size: px
Start display at page:

Download "Estimation of Variance Components in Animal Breeding"

Transcription

1 Estimation of Variance Components in Animal Breeding L R Schaeffer July 19-23, 2004 Iowa State University Short Course

2 2

3 Contents 1 Distributions 9 11 Random Variables 9 12 Discrete Random Variables Binomial Distribution Poisson Distribution General Matrix Results Expectations Variance-Covariance Matrices Continuous Distributions Uniform Distribution Normal Distribution Beta Distribution Gamma Distribution Chi-Square Distribution The Wishart Distribution The t-distribution 22 3

4 4 CONTENTS 148 The F-distribution Quadratic Forms 23 2 Building Blocks Basic Blocks Basic Model Mixed Model Equations Unbiased Estimation of Variances Variances of Quadratic Forms Variance of A Ratio of Variance Estimates 33 3 Likelihood Methods The Likelihood Function Maximum Likelihood The EM Algorithm Restricted Maximum Likelihood Derivative Free REML First Derivatives and EM Algorithm Second Derivatives, Average Information Animal Models 51 4 Bayesian Methods The Joint Posterior Distribution Fully Conditional Posterior Distributions 57

5 CONTENTS 5 43 Computational Scheme Round Round Burn-In Periods and Estimates Influence of the Priors Long Chain or Many Chains? 64 5 Data Readiness Pedigree Files Achieving Chronological Order Inbreeding Calculations Data File Control Files 72 6 Software Packages DMU-Jensen and Madsen Control File Comments on DMU VCE - Groeneveld pfile COMMENT DATA MODEL COVARIANCE 84

6 6 CONTENTS 626 SYSTEM OUTPUT END MTDFREML MTDFNRM MTDFPREP MTDFRUN 89 7 Maternal Effects Models Introduction Simulation of Records Mixed Model Equations Estimation of Covariances EM-REML Bayesian Estimation Warnings 99 8 Random Regression Models Introduction Multiple Trait Approach Covariance Functions and Orthogonal Polynomials Reduced Orders of Fit Basic Structure of RRM Example Data Analysis By RRM 112

7 CONTENTS 7 85 Estimation of Parameters EM-REML Bayesian Estimation Multiple Trait Models Introduction Models Example Data The MME Estimation of Covariance Matrices Sampling Solutions to MME Sampling New Covariance Matrices Non-Additive Genetic Models Introduction The Model Example Data Genetic Relationships HMME Estimation of Variances 139

8 8 CONTENTS

9 Chapter 1 Distributions 11 Random Variables A random variable is a real-valued function which exists within the domain of a defined sample space A random variable is conventionally designated by a capital letter, say Y, and the value of Y is denoted by a small letter, say y The sample space is the range of values that y can be assigned The small letter y is used in these notes for Y and y Random variables can be either discrete or continuous A discrete random variable can assume only a finite number of distinct values, such as zero or one, for example A continuous random variable can assume any value within the range of the sample space Random variables usually follow a distribution function or probability density which can be described mathematically in most cases The distribution could have a number of parameters associated with it, such as a mean and variance, and all parameters will generally be designated collectively as a vector θ The goal of statistical analysis is usually to estimate elements of θ from the observed random variables, y If y represents a random variable from some distribution, then the expectation of y is denoted by E(y) = µ where E() means expected value The expected value of y depends on its distribution and range of allowable values The expected value is known as the mean, or the first moment of the distribution Also of importance is the variance of y that could be expected with that distribution The variance of a scalar random variable, y, is defined as V ar(y) = E(y 2 ) E(y)E(y) = E(y E(y)) 2 and is commonly represented as σ 2 y Variances are known as the second moment of the distribution Variances are always greater than zero 9

10 10 CHAPTER 1 DISTRIBUTIONS 12 Discrete Random Variables In the general discrete case, the probability that Y takes the value y, is defined as the sum of the probabilities of all sample points that are assigned the value y That is, P (Y = y) = p(y) The probability distribution of Y lists the probabilities for each value of y Suppose Y can take on four values with the following probabilities: y p(y) 0 1/8 1 1/4 2 1/4 3 3/8 Any other values of y are assumed to have zero probability, and the sum of all probabilities is 1, as required of a valid distribution function Various kinds of probabilities may be calculated from this table P r(y = 0) = p(0) = 1 8 P r(y = 0 or Y = 2) = p(0) + p(2) = 3 8 P r(y = 1 or Y = 2) = p(1) + p(2) = 1 2 The cumulative distribution function is F (y) = P r(y y), for example, F (0) = p(0), F (1) = p(0) + p(1) = 3 8, F (2) = p(0) + p(1) + p(2) = 5 8, F (3) = p(0) + p(1) + p(2) + p(3) = 1 Also, if y 1 y 2, then F (y 1 ) F (y 2 ) Finally, P r(y 1 < Y y 2 ) = P r(y y 2 ) P r(y y 1 ) = F (y 2 ) F (y 1 )

11 12 DISCRETE RANDOM VARIABLES 11 The expected value of a discrete random variable is defined as E(y) = y y p(y) For the example above, E(y) = (0(1/8) + 1(1/4) + 2(1/4) + 3(3/8)) = 1875 Similarly, the expected value of a function of Y, say g(y ) is given by E(g(y)) = y g(y) p(y) Suppose g(y) = y 2, then E(y 2 ) = (0(1/8) + 1(1/4) + 4(1/4) + 9(3/8)) = 4625 The variance of discrete random variable Y is For the example, V ar(y) = E(y E(y)) 2 = E(y 2 ) [E(y)] 2 V ar(y) = ( 1875) 2 (1/8) + ( 875) 2 (1/4) + (125) 2 (1/4) + (1125) 2 (3/8) = 4625 (1875) 2 = Binomial Distribution A common discrete distribution is the binomial distribution A binomial event can take on only two possible outcomes, success or failure, zero or one, heads or tails, diseased or not diseased, and so on The probability of one outcome is q and the probability of the other outcome is 1 q Trials, or a succession of binomial events, are assumed to be independent The random variable Y is the number of successes The probability distribution is given by ( ) n p(y) = q y (1 q) n y, y for y = 0, 1, 2,, n and 0 q 1 The number of trials is n The expected value and variance of the binomial distribution are E(y) = n q V ar(y) = n q (1 q)

12 12 CHAPTER 1 DISTRIBUTIONS 122 Poisson Distribution A Poisson probability distribution provides a good model for the probability distribution of the number Y of rare events that occur in a given space, time, volume, or any other dimension, and λ is the average value of Y In dairy cattle breeding, for example, the number of quality embryos produced by a cow during superovulation can range from 0 to 20 (or more), but the average might be only 3 or 4 The Poisson probability distribution is given by p(y) = λy exp λ, y! for y = 0, 1, 2, and λ > 0 Also, The mean and the variance are equal E(y) = λ V ar(y) = λ 13 General Matrix Results Extending results from scalar random variables to vectors of random variables, also called a random vector variable, then the following general results apply Vectors are denoted by boldfaced small letters 131 Expectations Let y 1 and y 2 be two random vector variables, then for i = 1 or 2, then µ i1 µ i2 E(y i ) = µ i =, µ in for a vector of length n If c is a scalar constant, then Similarly, if C is a matrix of constants, then Finally, E(cy i ) = cµ i E(Cy i ) = Cµ i E(y 1 + y 2 ) = E(y 1 ) + E(y 2 ) = µ 1 + µ 2

13 14 CONTINUOUS DISTRIBUTIONS Variance-Covariance Matrices Let y be a random vector variable of length n, then the variance-covariance matrix of y is as follows: V ar(y) = E(yy ) E(y)E(y ) σy 2 1 σ y1 y 2 σ y1 y n σ y1 y = 2 σy 2 2 σ y2 y n = V σ y1 y n σ y2 y n σ 2 y n A variance-covariance (VCV) matrix of a random vector contains variances on the diagonals and covariances in the off-diagonals A VCV matrix is square, symmetric and should always be positive definite or positive semi-definite, ie all of the eigenvalues must be positive Another name for VCV matrix is dispersion matrix or (co)variance matrix Let C be a matrix of constants conformable for multiplication with the vector y, then V ar(cy) = E(Cyy C ) E(Cy)E(y C ) = CE(yy )C CE(y)E(y )C = C ( E(yy ) E(y)E(y ) ) C = CV ar(y)c = CVC If there are two sets of functions of y, say C 1 y and C 2 y, then Cov(C 1 y, C 2 y) = C 1 VC 2 Similarly, if y and z represent two different random vectors, possibly of different orders, and if the (co)variance matrix between these two vectors is W, then Cov(C 1 y, C 2 z) = C 1 WC 2 14 Continuous Distributions Consider measuring the amount of milk given by a dairy cow at a particular milking Even if a machine of perfect accuracy was used, the amount of milk would be a unique point on a continuum of possible values, such as kg of milk As such it is mathematically impossible to assign a nonzero probability to all of the infinite possible points in the continuum Thus, a different method of describing a probability distribution of a continuous random variable must be used The sum of the probabilities (if they

14 14 CHAPTER 1 DISTRIBUTIONS could be assigned) through the continuum is still assumed to sum to 1 The cumulative distribution function of a random variable is F (y) = P (Y y), for < y < As y approaches, then F (y) approaches 0 As y approaches, then F (y) approaches 1 Thus, F (y) is said to be a nondecreasing function of y If a < b, then F (a) < F (b) If F (y) is the cumulative distribution function of Y, then the probability density function of Y is given by p(y) = F (y) = F (y), y wherever the derivative exists Always for p(y) being a probability density function, Conversely, F (y) = p(y) y = 1 y p(t) t The expected value of a continuous random variable y is E(y) = y p(y) y provided that the integral exists If g(y) is a function of y, then E(g(y)) = provided that the integral exists Finally, g(y) p(y) y V ar(y) = E(y 2 ) [E(y)] Uniform Distribution The Uniform Distribution is one of the basic distributions in statistics The primary application is in the generation of random numbers from other distributions y U(a, b) p(y) = U(y a, b) where b is greater than a In a uniform distribution, every value between a and b has an equal probability of existing, ie p(y) = 1/(b a) Usually b = 1 and a = 0 so that

15 14 CONTINUOUS DISTRIBUTIONS 15 the observed y are between 0 and 1, representing probabilities The expected value and variance of this distribution are E(y) = = E(y 2 ) = = b a b a y p(y) y y/(b a) y = 1 2 y2 /(b a) b a = 1 2 b2 /(b a) 1 2 a2 /(b a) = 1 2 (b2 a 2 )/(b a) = (a + b)/2, b a b a y 2 p(y) y y 2 /(b a) y = 1 3 y3 /(b a) b a = 1 3 (b3 a 3 )/(b a) = 1 3 [(a + b)2 ab] Then V ar(y) = E(y 2 ) E(y) 2 = (4[(a + b) 2 ab] 3(a + b) 2 )/12 = (b a) 2 /12 Uniform Random Number Generators George Marsaglia of Florida State University developed a uniform random number generator that has passed 18 different tests, and which has a long cycle time (ie the number of calls to the subroutine before the sequence of random numbers begins to repeat itself) The strategy utilized is a multiply-with-carry scheme To give an idea of how this scheme works, start with x n 1 = , which is called the seed The next number would be generated by This would be followed by x n = 672 [x n 1 (x n 1 /1000) 1000)] + x n 1 /1000 = 672 [456] = x n = =

16 16 CHAPTER 1 DISTRIBUTIONS The number 672 is carefully chosen, and there is a process for doing this The cycle time, or period, of the above generator is 336,000 Thus, after 336,000 operations, the numbers begin again in the same sequence This is an example of how multiply-with-carry works Marsaglia s proposed generator makes use of 8 numbers at a time rather than just the previous random number, as well as two sequences of numbers rather than just one The period for this generator is which is a very, very large number The two sequences are x n = 12013x n x n x n x n x n x n x n x n 1 + and carry mod 2 16, x n = 9272x n x n x n x n x n x n x n x n 1 + carry mod 2 16 The coefficients were carefully chosen Each sequence provides a 16 bit integer number, and therefore, combining the two (by concatenation) gives a 32-bit random integer Example Usage The uniform distribution random number generator can be used to simulate a discrete random variable (such as that given earlier - Section 12) The steps of a simple program to assign y a value based upon the probabilities given earlier, would be as follows: call uniform(p) y = 0 if(pgt0125)y=1 if(pgt0375)y=2 if(pgt0625)y=3 142 Normal Distribution The normal distribution has been the most commonly assumed distribution in animal breeding and statistical genetics The properties of the distribution are well known; computations for estimators are relatively easy; and normal distributions suffice for a majority of situations A scalar random variable y has a normal probability distribution if and only if p(y) = (2π) 5 σ 2 exp( 5(y µ) 2 σ 2 )

17 14 CONTINUOUS DISTRIBUTIONS 17 for < x < +, where σ 2 is the variance of y and µ is the expected value of y For a random vector variable, y, the multivariate normal density function is p(y) = (2π) 5n V 5 exp( 5(y µ) V 1 (y µ)) denoted as y N(µ, V) where V is the variance-covariance matrix of y Note that the determinant of V must be positive, otherwise the density function is undefined Random Normal Deviates There have been several ways to generate random variables from a normal distribution Let r 1 and r 2 be values from a uniform distribution generator, between 0 and 1, then two random normal deviates can be computed as and y 1 = ( 2 ln r 1 ) 5 cos 2πr 2, y 2 = ( 2 ln r 1 ) 5 sin 2πr 2 The quality of the results depend on the quality of the uniform distribution generator Another algorithm again uses two values from a uniform distribution generator, say r 1 and r 2, then v 1 = r 1 + r 1 10, and v 2 = r 2 + r 2 10, if q is less than 1 and greater than 0, then q = v 1 v 1 + v 2 v 2, f = ( 2 ln q/q) 5 y 1 = v 1 f, y 2 = v 2 f 143 Beta Distribution Beta distributions may be assumed for random variables such as heritability, probability, or gene frequency all of which are limited to a value between zero and 1 p(y a, b) = Cy a 1 (1 y) b 1, for y [0, 1], where the constant of integration is C = Γ(a + b)/(γ(a)γ(b)),

18 18 CHAPTER 1 DISTRIBUTIONS and a and b are greater than 0 The gamma function, Γ( ) is Γ(α) = 0 y α 1 exp y y, for α > 0 When α = 1, then Γ(1) = 1 For α being an integer and greater than 1, then The expectation of a beta variable is Γ(α) = (α 1)Γ(α 1) = (α 1)! E(y a, b) = C 1 0 [ y y a 1 (1 y) b 1] y, which simplifies to The variance of a beta variable is E(y a, b) = a a + b V ar(y a, b) = ab (a + b) 2 (a + b + 1) 144 Gamma Distribution The Gamma Distribution has the following form; p(y a, b) = Cy a 1 exp by, for y being greater than 0 Variance components are always supposed to be greater than 0, thus a gamma distribution may be appropriate The constant of integration is C = ba Γ(a), and both a and b are greater than zero The mean of the distribution is and the variance of the distribution is E(y a, b) = a b, V ar(y a, b) = a b 2 The coefficient of variation (standard deviation divided by the mean) is equal to a 5 The gamma distribution is the parent distribution for two special cases When a = 1, then the gamma distribution becomes the Exponential Distribution, ie, p(y b) = b exp by,

19 14 CONTINUOUS DISTRIBUTIONS 19 for y greater than zero The other case is when a = v/2 and b = 05, where v is commonly known as the degrees of freedom and is greater than zero Then and This is the Central Chi-square distribution p(y v) = Cy (v/2) 1 exp y/2, C = (1/2)v/2 Γ(v/2) 145 Chi-Square Distribution In the estimation of variance components, quadratic forms of y are needed If y N(0, I), then y y χ 2 n, where χ 2 n is a central chi-square distribution with n degrees of freedom and n is the length of the random vector variable y The mean of the central chi-square distribution is n, and the variance is 2n The probability distribution function of s = y y is p(s n) = (s) (n/2) 1 exp 05s/[2 05n Γ(05n)], for s > 0 If y N(µ, I), then y y χ 2 n,λ where λ is the noncentrality parameter which is equal to 5µ µ The mean of a noncentral chi-square distribution is n + 2λ and the variance is 2n + 8λ If y N(µ, V), then y Qy has a noncentral chi-square distribution only if QV is idempotent, ie QVQV = QV The noncentrality parameter is λ = 5µ QVQµ and the mean and variance of the distribution are tr(qv) + 2λ and 2tr(QV) + 8λ, respectively If there are two quadratic forms of y, say y Qy and y Py, and both quadratic forms have chi-square distributions, then the two quadratic forms are independent if QVP = 0 Independence of quadratic forms is necessary for the construction of valid tests of hypotheses This property is not required for estimation of variances and covariances Random Chi-Square Generator One way to generate a central Chi-square variate with n degrees of freedom is to generate a vector of length n of random normal deviates, then sum the squares of these deviates Thus, a very good random normal deviate generator would be necessary Also, the computing time to generate one Chi-square variate would depend on n, the number of random normal deviates to be generated In order to save time for large n, a Chi-square variate is generated using a random gamma distribution variate, which requires two uniform variates, r 1 and r 2, then

20 20 CHAPTER 1 DISTRIBUTIONS 1 call uniform(r1) call uniform(r2) V1 = r1 + r1-1 V2 = r2 + r2-1 f = V1*V1 + V2*V2 if ( fgt10)go to 1 if (V1eq00)then c=00 else c=v2/v1 endif anm = n/2-1 s = dsqrt(anm+anm+10) y = s*c + anm if(yle00)go to 1 temp = anm*dlog(y/anm)-s*c if(dabs(temp)gt85)then go to 1 else e = (10+c*c)*dexp(temp) call uniform(r3) if(r3gte)go to 1 endif c at this point y is the gamma variate return The Chi-square variate with n degrees of freedom is then derived as If(n is even) then call gamma(iseed,n,gam) chi = gam + gam else (n is odd) m = n-1 call gamma(iseed,m,gam) chi = gam + gam call normal(znorm) chi = chi + znorm*znorm endif Chi-square variates are needed in Gibbs sampling to obtain new sample values of a variance component

21 14 CONTINUOUS DISTRIBUTIONS The Wishart Distribution The Wishart Distribution is akin to a multivariate Chi-square Distribution An entire matrix is envisioned of which the diagonals have a Chi-square distribution, and the offdiagonals have a built-in correlation structure The resulting matrix is positive definite 1 To generate a matrix having a Wishart distribution, let U equal a matrix with q rows and m columns (and q > m), then W = U U is an m by m positive definite matrix 2 Perform a Cholesky decomposition of W, so that and T is a lower triangular matrix W = TT, 3 Fill an m by m matrix, Z, with random normal deviates, and fill a vector, V, of length m with the square roots of random Chi-square variates, such that the first element has q 1 degrees of freedom, the second has q 2 df, and so on, and the last element has q m df 4 Now form a matrix B such that the diagonals are B i,i = v i v i, for i = 1 and for i > 1 The offdiagonals of B are i 1 B i,i = v i v i + z i,j z i,j j=1 B i,j = B j,i = z i,j v i, for j = 2 to m and i = 1, and for i > 1 i 1 B i,j = B j,i = z i,j v i + z k,i z k,j, for j = 2 to m and i = 2, (j 1) 5 Finally, calculate S = TBT k=1 The matrix S is a random Wishart matrix based upon the relationships in W Note that the off-diagonals in B should be close to zero, on average

22 22 CHAPTER 1 DISTRIBUTIONS 147 The t-distribution The t-distribution is based on the ratio of two independent random variables The first random variable follows a univariate normal distribution, and the second random variable follows a central chi-square distribution Let y N(0, 1) and s χ 2 n with y and s being independent, then y (s/n) 5 t n The mean of a t-distribution is the mean of the y variable, and the variance is n/(n 2), and n is the degrees of freedom of the distribution As n becomes larger, the t-distribution becomes very similar to the normal distribution in shape A t-distribution with small degrees of freedom would allow more observations to occur in the tails of the distribution, and therefore would look like a squashed normal distribution 148 The F-distribution A common distribution used in the testing of hypotheses is the F-distribution A central F-distribution is based on the ratio of two independent central chi-square variables Let s χ 2 n and v χ 2 m with s and v being independent, then (s/n) (v/m) F n,m The mean of the F-distribution is m/(m 2) and the variance is 2m 2 (n + m 2) n(m 2) 2 (m 4) Tables of F-values have been constructed for various probability levels as criteria to test if the numerator chi-square variable has a noncentral chi-square distribution If the calculated F-value is greater than the value in the tables, then s is implied to have a noncentral chi-square distribution, otherwise s has a central chi-square distribution The square of a t-distribution variable gives a variable that has an F-distribution with 1 and n degrees of freedom Noncentral F-distributions exist depending on whether the numerator or denominator variables have noncentral chi-square distributions Tables for noncentral F-distributions generally do not exist because of the difficulty in predicting the noncentrality parameters However, using random chi-square generators it is possible to numerically calculate an expected noncentral F value for specific situations When both the numerator and denominator chi-square variables are from noncentral distributions, then their ratio follows a doubly noncentral F-distribution

23 15 QUADRATIC FORMS Quadratic Forms A quadratic form is a weighted sum of squares of elements of a random vector variable The general form is y Qy, where y is a random vector variable, and Q is a regulator matrix The regulator matrix can take on various forms and values depending on the situation Usually Q is a symmetric matrix, but not necessarily positive definite Examples of different Q matrices are as follows: 1 Q = I, then y Qy = y y which is a total sum of squares of the elements in y 2 Q = J(1/n), then y Qy = y Jy(1/n) where n is the length of y Note that J = 11, so that y Jy = (y 1)(1 y) and (1 y) is the sum of the elements in y 3 Q = (I J(1/n)) /(n 1), then y Qy gives the variance of the elements in y, σ 2 y The expected value of a quadratic form is However, so that then E(y Qy) = E(tr(y Qy)) = E(tr(Qyy )) = tr(qe(yy )) If we let V ar(y) = V and E(y) = µ, then V ar(y) = E(yy ) E(y)E(y ) E(yy ) = V ar(y) + E(y)E(y ), E(y Qy) = tr(q(v ar(y) + E(y)E(y ))) E(y Qy) = tr(q(v + µµ )) = tr(qv) + tr(qµµ ) = tr(qv) + µ Qµ The expectation of a quadratic form does not depend on the distribution of y However, the variance of a quadratic form requires that y follows a multivariate normal distribution Without showing the derivation, the variance of a quadratic form, assuming y has a multivariate normal distribution, is V ar(y Qy) = 2tr(QVQV) + 4µ QVQµ The quadratic form, y Qy, has a chi-square distribution if tr(qvqv) = tr(qv), and µ QVQµ = µ Qµ, or the single condition that QV is idempotent Then if m = tr(qv) and λ = 5µ Qµ,

24 24 CHAPTER 1 DISTRIBUTIONS the expected value of y Qy is m + 2λ and the variance is 2m + 8λ, which are the usual results for a noncentral chi-square variable The covariance between two quadratic forms, say y Qy and y Py, is Cov(y Qy, y Py) = 2tr(QVPV) + 4µ QVPµ The covariance is zero if QVP = 0, then the two quadratic forms are said to be independent

25 Chapter 2 Building Blocks 21 Basic Blocks In order to derive Maximum Likelihood and Residual Maximum Likelihood, a number of results about derivatives of determinants and other quantities needs to be reviewed These will be called Building Blocks or BB for short, because they are not theorems or conjectures BB-1 The (co)variance matrix of y is V = s Z i G i Z iσi 2 + Rσ0 2 i=1 = ZGZ + R Usually, each G i is assumed to be I for most random factors, but for animal models G i might be equal to A, the additive genetic relationship matrix Thus, G i does not always have to be diagonal, and will not be an identity in animal model analyses BB-2 The inverse of V is V 1 = R 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 To prove, show that VV 1 = I Let T = Z R 1 Z + G 1, then VV 1 = (ZGZ + R)[R 1 R 1 ZT 1 Z R 1 ] = ZGZ R 1 ZGZ R 1 ZT 1 Z R 1 +I ZT 1 Z R 1 25

26 26 CHAPTER 2 BUILDING BLOCKS = I + [ZGT ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [ZG(Z R 1 Z + G 1 ) ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [ZGZ R 1 Z + Z ZGZ R 1 Z Z](T 1 Z R 1 ) = I + [0](T 1 Z R 1 ) = I BB-3 If k is a scalar constant and A is any square matrix of order m, then Ak = k m A BB-4 For general square matrices, say M and U, of the same order then MU = M U BB-5 For the general matrix below with A and D being square and non-singular (ie the inverse of each exists), then A B Q D = A D + QA 1 B = D A + BD 1 Q Then if A = I and D = I, then I = 1, so that I + QB = I + BQ = I + B Q = I + Q B BB-6 Using the results in (BB-4) and (BB-5), then V = R + ZGZ = R(I + R 1 ZGZ ) = R I + R 1 ZGZ = R I + Z R 1 ZG = R (G 1 + Z R 1 Z)G = R G 1 + Z R 1 Z G

27 21 BASIC BLOCKS 27 BB-7 The mixed model coefficient matrix of Henderson can be denoted by ( ) X C = R 1 X X R 1 Z Z R 1 X Z R 1 Z + G 1 then the determinant of C can be derived as C = X R 1 X G 1 + Z (R 1 R 1 X(X R 1 X) X R 1 )Z = Z R 1 Z + G 1 X (R 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 )X Now let S = R 1 R 1 X(X R 1 X) X R 1 then C = X R 1 X G 1 + Z SZ = Z R 1 Z + G 1 X V 1 X BB-8 A projection matrix, P, is defined as Properties of P: Therefore, P = V 1 V 1 X(X V 1 X) X V 1 PX = 0, Py = V 1 (y Xˆb), where ˆb = (X V 1 X) X V 1 y y PZ i G i Z ipy = (y Xˆb) V 1 Z i G i Z iv 1 (y Xˆb) BB-9 Derivative of V 1 is V 1 σ 2 i = V 1 V σi 2 V 1 BB-10 Derivative of ln V is ln V σ 2 i = tr ( V ) 1 V σi 2

28 28 CHAPTER 2 BUILDING BLOCKS BB-11 Derivative of P is P σ 2 i = P V σi 2 P BB-12 Derivative of V is V σ 2 i = Z i G i Z i BB-13 Derivative of ln X V 1 X is ln X V 1 X σ 2 i = tr(x V 1 X) X V 1 V σi 2 V 1 X 22 Basic Model The following simple model will be assumed The general linear model is described as y = Xb + Zu + e, where E(y) = Xb, E(u) = 0, and E(e) = 0 Often u is partitioned into s factors as u = (u 1 u 2 u s ) The (co)variance matrices are defined as G = V ar(u) = V ar u 1 u 2 u s G 1 σ G 2 σ G s σs 2 and Then R = V ar(e) = Iσ 2 0 V ar(y) = V = ZGZ + R,

29 23 MIXED MODEL EQUATIONS 29 and if Z is partitioned corresponding to u, as Z = [ Z 1 Z 2 Z s ], then s ZGZ = Z i G i Z iσi 2 i=1 Let V i = Z i G i Z i and V 0 = I, then V = s V i σi 2 i=o Covariances between random factors in the model will be allowed later, such as in a maternal effects model where there are covariances between direct and maternal effects, or random regression models with covariances between the random regression coefficients, or multiple trait models 23 Mixed Model Equations Henderson s mixed model equations (MME) are written as X R 1 X X R 1 Z 1 X R 1 Z 2 X R 1 Z s Z 1R 1 X Z 1R 1 Z 1 + G 1 1 σ 2 1 Z 1R 1 Z 2 Z 1R 1 Z s Z 2R 1 X Z 2R 1 Z 1 Z 2R 1 Z 2 + G 1 2 σ 2 2 Z 2R 1 Z s Z sr 1 X Z sr 1 Z 1 Z sr 1 Z 2 Z sr 1 Z s + G 1 s = X R 1 y Z 1R 1 y Z 2R 1 y Z sr 1 y Quadratic forms from solutions to these equations are σs 2 ˆb û 1 û 2 û s û ig i û i, where and for the residual variance, where û = G i Z ipy, ê ê, ê = y Xˆb Zû

30 30 CHAPTER 2 BUILDING BLOCKS 24 Unbiased Estimation of Variances Unbiased estimation is no longer practiced in animal breeding, but the history of the development from unbiased to likelihood based methods is important to understand The best way to describe unbiased methods of estimation is to give a small example with only three observations Assume that all G i are equal to I for this example, so that Z i G i Z i simplifies to Z i Z i Let X = Z 2 = 1 1 1, Z 1 = , and y =, , Then and and V 0 = I V 1 = Z 1 Z 1 = V 2 = Z 2 Z 2 = , In this example, there are 3 unknown variances to be estimated, and consequently, at least three quadratic forms are needed in order to estimate the variances The Q- matrices are the weights of the observations in the quadratic forms These matrices differ depending on the method of estimation that is chosen Below are three arbitrary Q-matrices that were chosen such that Q k X = 0 They do not necessarily correspond to any known method of estimation, but are for illustration of the calculations Let Q 1 = 1 2 1, Q 2 = and Q 3 = The numeric values of the quadratic forms are y Q 1 y = 657,,

31 25 VARIANCES OF QUADRATIC FORMS 31 y Q 2 y = 306, and y Q 3 y = 882 For example, ( y Q 1 y = ) = 657 The expectations of the quadratic forms are E(y Q 1 y) = trq 1 V 0 σ trq 1 V 1 σ trq 1 V 2 σ 2 2 = 4σ σ σ2 2 E(y Q 2 y) = 4σ σ σ2, 2 E(y Q 3 y) = 6σ σ σ 2 2 Now equate the values of the quadratic forms to their corresponding expectations, which gives a system of equations to be solved, such as Fσ = w In this case, the equations would be which gives the solution as ˆσ = F 1 w, or ˆσ 2 0 ˆσ 1 2 = ˆσ 2 2 σ 2 0 σ 2 1 σ 2 2 = Note that one of the estimates is negative, which is not appropriate for a variance component The estimate is said to be out of the parameter space, 25 Variances of Quadratic Forms The variance of a quadratic form is given by V ar(y Qy) = 2trQVQV + 4b X QVQXb Only translation invariant quadratic forms are typically considered in variance component estimation, that means b X QVQXb = 0 Thus, only 2trQVQV needs to be calculated Remember that V can be written as the sum of s + 1 matrices, V i σ 2 i, then trqvqv = s s trq V i σi 2 Q V j σj 2 i=o j=o = s i=o s j=o tr QV i QV j σi 2 σj 2

32 32 CHAPTER 2 BUILDING BLOCKS For example, if s = 2, then trqvqv = trqv 0 QV 0 σ trQV 0 QV 1 σ 2 0σ trQV 0 QV 2 σ 2 0σ trqv 1 QV 1 σ trQV 1 QV 2 σ 2 1σ trqv 2 QV 2 σ 4 2 The exact sampling variances require the true, unknown components of variance The magnitude of the sampling variances depends on 1 The true magnitude of the individual components, 2 The matrix Q which depends on the method of estimation and the model, and 3 The structure and amount of the data through X and Z Normally, the variance-covariance matrix of the estimates, commonly known as the sampling variances of the estimates, were never actually computed during the days of unbiased methods due to their computational complexity However, with today s computers their calculation can still be very challenging and usually impossible For small examples, the calculations can be easily demonstrated In this case, V ar(f 1 w) = F 1 V ar(w)f 1, a function of the variance-covariance matrix of the quadratic forms Using the small example of the previous section, the V ar(w) is a 3x3 matrix The (1,1) element is the variance of y Q 1 y which is V ar(y Q 1 y) = 2trQ 1 VQ 1 V = 2trQ 1 V 0 Q 1 V 0 σ trQ 1 V 0 Q 1 V 1 σ 2 0σ trQ 1 V 0 Q 1 V 2 σ 2 0σ trQ 1 V 1 Q 1 V 1 σ trQ 1 V 1 Q 1 V 2 σ 2 1σ trQ 1 V 2 Q 1 V 2 σ 4 2 = 20σ σ 2 0σ σ 2 0σ σ σ 2 1σ σ 4 2 The (1,2) element is the covariance between the first and second quadratic forms, Cov(y Q 1 y, y Q 2 y) = 2trQ 1 VQ 2 V, and similarly for the other terms All of the results are summarized in the table below Forms σ0 4 σ0 2σ2 1 σ0 2σ2 2 σ1 4 σ1 2σ2 2 σ2 4 V ar(w 1 ) Cov(w 1, w 2 ) Cov(w 1, w 3 ) V ar(w 2 ) Cov(w 2, w 3 ) V ar(w 3 )

33 26 VARIANCE OF A RATIO OF VARIANCE ESTIMATES 33 To get numeric values for these variances, the true components need to be known Assume that the true values are σ 2 0 = 250, σ2 1 = 10, and σ2 2 = 80, then the variance of w 1 is V ar(w 1 ) = 20(250) (250)(10) + 16(250)(80) +8(10) 2 + 0(10)(80) + 8(80) 2 = 1, 662, 000 The complete variance- covariance matrix of the quadratic forms is V ar w 1 w 2 w 3 = 1, 662, 000 1, 147, 800 2, 144, 000 1, 147, 800 1, 757, 200 2, 218, 400 2, 144, 000 2, 218, 400 3, 550, 800 The variance-covariance matrix of the estimated variances (assuming the above true values) would be V ar(ˆσ) = F 1 V ar(w)f 1 = 405, , , , , , , , , 500 = C 26 Variance of A Ratio of Variance Estimates Often estimates of ratios of functions of the variances are needed for animal breeding work, such as heritabilities, repeatabilities, and variance ratios Let such a ratio be denoted as a/c where a = ˆσ 2 2 = (0 0 1)ˆσ = 72 and c = ˆσ ˆσ ˆσ 2 2 = (1 1 1)ˆσ = 288 (NOTE: the negative estimate for ˆσ 2 1 was set to zero before calculating c From Osborne and Patterson (1952) and Rao (1968) an approximation to the variance of a ratio is given by Now note that V ar(a/c) = (c 2 V ar(a) + a 2 V ar(c) 2ac Cov(a, c))/c 4 V ar(a) = (0 0 1)C(0 0 1) = 293, 500, V ar(c) = (1 1 1)C(1 1 1) = 231, 200, Cov(a, c, ) = (0 0 1)C(1 1 1) = 194, 750

34 34 CHAPTER 2 BUILDING BLOCKS Then V ar(a/c) = [(288) 2 (293, 500) + (72) 2 (231, 200) 2(72)(288)(194, 750)]/(288) 4 = This result is very large, but could be expected from only 3 observations Thus, (a/c) = 25 with a standard deviation of Another approximation method assumes that the denominator has been estimated accurately, so that it is considered to be a constant, such as the estimate of σ 2 e Then, For the example problem, this gives V ar(a/c) = V ar(a)/c 2 V ar(a/c) = 293, 500/(288) 2 = , which is slightly larger than the previous approximation The second approximation would not be suitable for a ratio of the residual variance to the variance of one of the other components Suppose a = ˆσ 0 2 = 216, and c = ˆσ2 2 = 72, then (a/c) = 30, and with the first method, and V ar(a/c) = [(72) 2 (405, 700) + (216) 2 (293, 500) 2(72)(216)( 240, 700)]/(72) 4 = , V ar(a/c) = 405, 700/(72) 2 = 7826, with the second method The first method is probably more realistic in this situation, but both are very large

35 Chapter 3 Likelihood Methods 31 The Likelihood Function The normal distribution likelihood function is commonly assumed in animal breeding, and is used in REML and related methods of estimation Use will be made of the building blocks of the previous chapter The multivariate normal distribution likelihood function is L(y) = (2π) 5N V 5 exp( 5(y Xb) V 1 (y Xb)) The log of the likelihood, say L 1 is L 1 = 05[N ln(2π) + ln V +(y Xb) V 1 (y Xb)] The term N ln(2π) is a constant that does not involve any of the unknown variances or effects in the model, and therefore, it is commonly omitted during maximization computations Maximizing the log likelihood maximizes the original likelihood function In the previous chapter, V = R Z R 1 Z + G 1 G, and therefore, ln V = ln R + ln G + ln Z R 1 Z + G 1 If R = Iσ 2 0, then ln R = ln Iσ0 2 = ln(σ0) 2 N I = N ln σ0(1) 2 35

36 36 CHAPTER 3 LIKELIHOOD METHODS Similarly, if G = + Iσi 2, where i = 1 to s, then ln G = = s ln Iσi 2 i=1 s q i ln σi 2 i=1 Except, that in animal models one of the G i is equal to Aσi 2 In that case, ln Aσ 2 i = ln(σ 2 i ) q i A which is ln Aσ 2 i = q i ln σ 2 i A = q i ln σ 2 i + ln A and Recall that so that ( ) X C = R 1 X X R 1 Z Z R 1 X Z R 1 Z + G 1, C = Z R 1 Z + G 1 X V 1 X ln C = ln Z R 1 Z + G 1 + ln X V 1 X 32 Maximum Likelihood Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components Let L 2 be equivalent to L 1 except for the constant involving π L 2 = 05[ln V +(y Xb) V 1 (y Xb)] The derivatives of L 2 with respect to b and to σ 2 i for i = 0, 1, s are and L 2 b = X V 1 Xb X V 1 y L 2 σ 2 i = 5 tr[v 1 ( V/ σ 2 i )] + 5(y Xb) V 1 ( V/ σ 2 i )V 1 (y Xb) = 5 tr[v 1 V i ] + 5(y Xb) V 1 V i V 1 (y Xb) Equating the derivatives to zero gives ˆb = (X V 1 X) X V 1 y,

37 32 MAXIMUM LIKELIHOOD 37 and Recall that tr[v 1 V i ] = (y Xˆb) V 1 V i V 1 (y Xˆb) Py = V 1 (y Xˆb), where P is the projection matrix, and that V i = Z i Z i, then tr[v 1 Z i Z i] = y PV i Py In usual mixed model theory, the solution vector for a random factor may be written as so that Also, Let û i = G i Z ipy, y PV i Py = y PZ i G i G 2 i G i Z ipy = û ig 2 i û i = û iûi/σi 4 tr[v 1 V i ] = tr[(r 1 R 1 Z(Z R 1 Z + G 1 ) 1 Z R 1 )Z i Z i] T = (Z R 1 Z + G 1 ) 1, R = Iσ 2 0, and G = + Iσ 2 i, then tr[v 1 V i ] = tr(z iz i )σ 2 0 tr(z iztz Z i )σ 4 0 If T can be partitioned into submatrices for each random factor, then and Tσ 2 0 (Z Z + + Iαi ) = I, which yields Finally, + TZ Zσ0 2 = I T( Iσ 2 i ), TZ Z i σ 2 0 = I T ii σ 2 i, tr(z iztz Z i )σ0 4 = tr(z iz i )σ0 2 tr(i T ii σ 2 i )σi 2 tr[v 1 V i ] = tr(i T ii σi 2 )σi 2 = triσ 2 i = q i σ 2 i trt ii σ 4 i trt ii σ 4 i

38 38 CHAPTER 3 LIKELIHOOD METHODS Combining results gives ˆσ i 2 = (û iûi + trt iiˆσ 0)/q 2 i for i = 1, 2,, s, and for i = 0 gives ˆσ 0 2 = (y y ˆb X y û Z y)/ N 321 The EM Algorithm EM stands for Expectation Maximization From Searle, Casella, and McCulloch (1992) the following explanation is given The procedure alternates between calculating conditional expected values and maximizing simplified likelihoods The actual data y are called the incomplete data in the EM algorithm, and the complete data are considered to be y and the unobservable random effects, u i If the realized values of the unobservable random effects were known, then their variance would be the average of their squared values, ie, ˆσ 2 i = u iu i /q i However, in real life the realized values of the random effects are unknown The steps of the EM algorithm are as follows: Step 0 Decide on starting values for the variances and set m = 0 Step 1(E-step) Calculate the conditional expectation of the sufficient statistics, conditional on the incomplete data E(u iu i y) = σ 4(m) i y P (m) Z i Z ip (m) y +tr(σ 2(m) i I σ 4(m) i Z i(v (m) ) 1 Z i ) = ˆt (m) i Step 2(M-step) Maximize the likelihood of the complete data, σ 2(m+1) i = ˆt (m) i /q i, i = 0, 1, 2,, s Step 3 If convergence is reached, set ˆσ = σ (m+1), otherwise increase m by one and return to Step 1 This is equivalent to constructing and solving the mixed model equations with a given set of variances, σ (m), and then σ 2(m+1) 0 = (y y ˆb X y û Z y)/n, and σ 2(m+1) i = (û iûi + σ 2(m+1) 0 trt ii )/q i

39 33 RESTRICTED MAXIMUM LIKELIHOOD Restricted Maximum Likelihood Restricted (or Residual) maximum likelihood (REML), was first suggested by Thompson (1962), and was described formally by Patterson and Thompson (1971) The procedure requires that y have a multivariate normal distribution The method is translation invariant The maximum likelihood approach automatically keeps the estimator within the allowable parameter space(ie zero to plus infinity), and therefore, REML is a biased procedure REML was proposed as an improvement to ML in order to account for the degrees of freedom lost in estimating fixed effects The likelihood function used in REML is that for a set of error contrasts (ie residuals) that are assumed to have a multivariate normal distribution The multivariate normal distribution likelihood function for the residual contrasts, K y, where K X = 0, and K has rank equal to N r(x), is L(K y) = (2π) 5(N r(x)) K VK 5 exp( 5(K y) (K VK) 1 (K y)) The natural log of the likelihood function is L 3 = 5(N r(x)) ln(2π) 5 ln K VK 5y K(K VK) 1 K y Notice that 5(N r(x)) ln(2π) is a constant that does not depend on the unknown variance components or factors in the model, and therefore, can be ignored to give L 4 Searle (1979) showed that and ln K VK = ln V + ln X V 1 X y K(K VK) 1 K y = y Py = (y Xˆb) V 1 (y Xˆb) for any K such that K X = 0 Hence, L 4 can be written as L 4 = 5 ln V 5 ln X V 1 X 5(y Xˆb) V 1 (y Xˆb) REML can be calculated a number of different ways 1 Derivative Free approach is a search technique to find the parameters that maximize the log likelihood function Two techniques will be described here 2 First Derivatives and EM is where the first derivatives of the log likelihood are determined and set to zero in order to maximize the likelihood function Solutions need to be obtained by iteration because the resulting equations are non linear 3 Second Derivatives are generally more computationally demanding Gradient methods are used to find the parameters that make the first derivatives equal to zero Newton-Raphson (involves the observed information matrix) and Fishers Method of Scoring (involves the expected information matrix) have been used Lately, the average information algorithm (averages the observed and expected information matrices) has been used to reduce the computational time

40 40 CHAPTER 3 LIKELIHOOD METHODS All of the approaches attempt to maximize the log likelihood function of the error contrasts To illustrate the methods, consider a single trait model with three factors (F, A, B), of which A and B are random factors There were a total of 90 observations, and the total sum of squares was 356,000 The least squares equations for this small example are shown below F 1 F 2 A 1 A 2 A 3 B 1 B 2 B 3 B 4 = Derivative Free REML Derivative Free REML was proposed by Smith and Graser(1986) and Graser, Smith and Tier(1987) and has been expanded upon by Meyer (1987,91) who has developed a set of programs for computing estimates of variance components for a whole range of univariate and multivariate models The description given below is a very simplified version of the method for basic understanding of the technique Imagine an s dimensional array containing the values of the likelihood function for every possible set of values of the ratios of the components to the residual variance The technique is to search this array and find the set of ratios for which the likelihood function is maximized There is more than one way to conduct this search Care must be taken to find the global maximum rather than one of possibly many local maxima At the same time the number of likelihood evaluations to be computed must also be minimized Various alternative forms of L 4 can be derived Note that and that and that combining these results gives Now note that ln V = ln R + ln G + ln G 1 + Z R 1 Z ln X V 1 X = ln C ln Z R 1 Z + G 1 L 4 = 5 ln R 5 ln G 5 ln C 5y Py ln R = ln Iσ 2 0

41 33 RESTRICTED MAXIMUM LIKELIHOOD 41 Then = N ln σ0, 2 ln G = s q i ln σi 2, i=1 and ln C = ln X R 1 X + ln Z SZ + G 1 where ln X R 1 X = ln X Xσ0 2 = ln(σ0 2 X X = ln X X r(x) ln σ0, 2 and Z SZ + G 1 = σ0 2 MZ + G 1 = σ 2 0 (Z MZ + G 1 σ 2 0) ln C = ln X X r(x) ln σ 2 0 q ln σ ln Z MZ + G 1 σ 2 0, and finally, the log-likelihood function becomes L 4 = s 5(N r(x) q) ln σ0 2 5 q i ln σi 2 i=1 5 ln C 5y Py, where Note that C = ( X X X Z Z X Z Z + G 1 σ 2 0 ) q i ln σi 2 = q i ln σ0/α 2 i = q i (ln σ0 2 ln α i ) so that L 4 = s 5[(N r(x)) ln σ0 2 q i ln α i + ln C +y Py] i=1 The quantity y Py is y (y Xˆb Zû)/σ0 2 The computations are achieved by constructing the following matrix, X X X Z X y Z X Z Z + G 1 σ0 2 Z y y X y Z y y = ( ) C W y y W y, y then by Gaussian elimination of one row at a time, the sum of the log of the non-zero pivots (using the same ordering for each evaluation of the likelihood) gives log C and y (y Xˆb Zû) Gaussian elimination, using sparse matrix techniques, requires less computing time than inverting the coefficient matrix of the mixed model equations The ordering of factors within the equations could be critical to the computational process and some experimentation may be necessary to determine the best ordering The likelihood

42 42 CHAPTER 3 LIKELIHOOD METHODS function can be evaluated without the calculation of solutions to the mixed model equations, without inverting the coefficient matrix of the mixed model equations, and without computing any of the σi 2 The formulations for more general models and multiple trait models are more complex, but follow the same ideas Searching the array of likelihood values for various values of α i can be done in several different ways One method is to fix the values of all but one of the s α i, and then evaluate L 2 for four or more different values of the α i that were not fixed Then one can use a quadratic regression analysis to determine the value of that one ratio which maximizes L 2 given that the other ratios are fixed This is repeated for each of the s ratios, and the process is repeated until a maximum likelihood is obtained The calculations are demonstrated in the example that follows Begin by fixing the value of α B = 10 and letting the value of α A take on the values of (5, 10, 20, 30, 40) Using L 4 to evaluate the likelihood, then the results were as follows: α A L For example, the likelihood value for α A = 40, would be L 4 = 1 2 [(N r(x)) ln σ2 0 q A ln α A q B ln α B + ln C +y (y Xˆb Zû)/σ 2 0] where ln C = , y Py = /σ0 2 = 88, q A ln α A = , q B ln α B = , σ0 2 = , ln σ0 2 = , (N r(x)) = 88, then L 4 = 05[88(45685) ( /963997)] =

43 33 RESTRICTED MAXIMUM LIKELIHOOD 43 then To find the value of α A that maximizes L 4 for α B = 10, let Q = ˆβ = (Q Q) 1 Q Y = and Y = From this a prediction equation for L 4 can be written as L 4 = α A α 2 A This equation can be differentiated with respect to α A and then equated to zero to find the value of the ratio that maximizes the prediction equation This gives α A = 04489/(2(000698)) = Now keep α A = and try a number of values of α B from 2 to 10, which give the following results α B L Applying the quadratic regression to these points gives α B = The next step would be to fix α B = and to try new values for α A, such as 25 to 40 by units of 1 The range of values becomes finer and finer To insure that one has found the global maximum, the entire process could be started with vastly different starting values for the ratios, such as α B = 50 and let values for α A be 40, 50, 60, and 70 The more components there are to estimate, the more evaluations of the likelihood

44 44 CHAPTER 3 LIKELIHOOD METHODS that are going to be needed, and the more probable that convergence might be to a local maximum rather than to the global maximum Please refer to the literature for specification of the log likelihood function for particular models and situations Also, refer to work by Boldman and Van Vleck (1993) which found a simplification of Meyer s algorithms which reduced computational time by several orders of magnitude Even so, DFREML has been applied to fairly small data sets and can take considerable time to find estimates for these The available software may not be able to handle particular models, and so the user should be aware of these possible problems The Simplex Method The Simplex Method (Nelder and Mead, 1965) is a procedure for finding the minimum of a function (ie the minimum of 2L 4 or the maximum of L 4 ) with respect to the unknown variances and covariances The best way to describe the method is using the example data from the previous sections Begin by constructing a set of points for which L 4 is to be evaluated A point is a vector of values for the unknowns ( α A, α B ), for example, ( ) θ 1 = , then form two more points by changing one unknown at a time Let the three points be as shown in the following table No α A α B Now calculate L 4 for each point and arrange from largest to lowest value No α A α B L The idea now is to find another point to replace the last one(lowest L 4 ) This is done by a process called reflection Compute the mean of all points excluding the one with the lowest L 4 ( ) θ m = , then the reflection step is θ 4 = θ m + r (θ m θ last ),

45 33 RESTRICTED MAXIMUM LIKELIHOOD 45 where r is recommended by Nelder and Mead (1965) to be 1, giving ( ) θ 4 = The corresponding L 4 for this point was Compared to those in the table it has the largest value, and therefore, is a better point than the other three No α A α B L Given this success, the Simplex method calls for an expansion step, ie bigger change Thus, θ 5 = θ m + E (θ 4 θ m ), where E is suggested to be equal to 2 Hence ( θ 5 = ) to make a Then L 4 = , and the expanded point is better yet Now drop θ 1 from the table and put θ 5 at the top No α A α B L This completes one iteration Begin the next iteration by computing the mean of all points excluding the point with the lowest L 4 ( ) θ m = Another reflection step gives θ 6 = θ m + r (θ m θ last ), ( ) = However, this gives L 4 = , which is between θ 2 and θ 4, and can push out θ 2 from the table No α A α B L

46 46 CHAPTER 3 LIKELIHOOD METHODS Instead of an expansion step, a contraction step is needed because θ 6 did not give a greater L 4 than the first two Thus, θ 7 = θ m + c (θ 6 θ m ), where c = 05 is recommended Hence, ( θ 7 = ) Then L 4 = is better than that given by θ 4, but not by θ 5, thus the new table becomes as follows: No α A α B L The following steps were taken in the next iteration 1 The mean of the top two L 4 is ( θ m = ) 2 A reflection step gives θ 8 = θ m + r (θ m θ last ), ( ) = , which gave L 4 = , which is better than θ 7 3 Add θ 8 to the table and drop θ 4 No α A α B L Because L 4 for θ 8 was not larger than L 4 for θ 5 or smaller than L 4 for θ 7, then no expansion or contraction step is necessary Begin the next iteration The Simplex method continues in this manner until all point entries in the table are equal The constants recommended by Nelder and Mead (1965) for reflection, expansion, and contraction could be adjusted for a particular data set This method may converge to a local maximum, and so different starting values are needed to see if it converges to the same point The Simplex method does not work well with a large number of parameters to be estimated

Estimation of Variances and Covariances

Estimation of Variances and Covariances Estimation of Variances and Covariances Variables and Distributions Random variables are samples from a population with a given set of population parameters Random variables can be discrete, having a limited

More information

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb

More information

Lecture Notes. Introduction

Lecture Notes. Introduction 5/3/016 Lecture Notes R. Rekaya June 1-10, 016 Introduction Variance components play major role in animal breeding and genetic (estimation of BVs) It has been an active area of research since early 1950

More information

Chapter 5 Prediction of Random Variables

Chapter 5 Prediction of Random Variables Chapter 5 Prediction of Random Variables C R Henderson 1984 - Guelph We have discussed estimation of β, regarded as fixed Now we shall consider a rather different problem, prediction of random variables,

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information

Chapter 11 MIVQUE of Variances and Covariances

Chapter 11 MIVQUE of Variances and Covariances Chapter 11 MIVQUE of Variances and Covariances C R Henderson 1984 - Guelph The methods described in Chapter 10 for estimation of variances are quadratic, translation invariant, and unbiased For the balanced

More information

Animal Models. Sheep are scanned at maturity by ultrasound(us) to determine the amount of fat surrounding the muscle. A model (equation) might be

Animal Models. Sheep are scanned at maturity by ultrasound(us) to determine the amount of fat surrounding the muscle. A model (equation) might be Animal Models 1 Introduction An animal model is one in which there are one or more observations per animal, and all factors affecting those observations are described including an animal additive genetic

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Maternal Genetic Models

Maternal Genetic Models Maternal Genetic Models In mammalian species of livestock such as beef cattle sheep or swine the female provides an environment for its offspring to survive and grow in terms of protection and nourishment

More information

Chapter 3 Best Linear Unbiased Estimation

Chapter 3 Best Linear Unbiased Estimation Chapter 3 Best Linear Unbiased Estimation C R Henderson 1984 - Guelph In Chapter 2 we discussed linear unbiased estimation of k β, having determined that it is estimable Let the estimate be a y, and if

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
