VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP)


V.K. Bhatia
I.A.S.R.I., Library Avenue, New Delhi

Introduction

Variance components are commonly used in formulating appropriate designs, establishing quality control procedures and, in statistical genetics, estimating heritabilities and genetic correlations. Traditionally the estimators used most often have been the analysis of variance (ANOVA) estimators, which are obtained by equating observed and expected mean squares from an analysis of variance and solving the resulting equations. If the data are balanced, the ANOVA estimators have many appealing properties. In unbalanced situations these properties rarely hold true, which creates a number of problems in arriving at correct decisions. Since in practice variance components are mostly estimated from unbalanced data, these problems are the rule rather than the exception. For unbalanced data two general classes of estimators have sparked considerable interest: maximum likelihood and restricted maximum likelihood (ML and REML), and minimum norm and minimum variance quadratic unbiased estimation (MINQUE and MIVQUE). The links between these methods are also important. In addition to the estimation problems of the unbalanced case, the notion of robust estimation, which takes care of the influence of outliers and of departures from the underlying statistical assumptions, is also of interest.

The classical least squares model contains only one random element, the random error; all other effects are assumed to be fixed constants. For this class of models, the assumption of independence of the $e_i$ implies independence of the $y_i$; that is, if $\operatorname{Var}(e) = I\sigma^2$, then $\operatorname{Var}(y) = I\sigma^2$ also. Such models are called fixed effects models or, more simply, fixed models. There are situations where there is more than one random term. The classical variance components problem, in which the purpose is to estimate components of variance rather than specific treatment effects, is one example. In these cases the treatment effects are assumed to be a random sample from a population of such effects, and the goal of the study is to estimate the variance among these effects in the population. The individual effects that happen to be observed in the study are not of any particular interest except for the information they provide on the variance component. Models in which all effects are assumed to be random are called random models. Models that contain both fixed and random effects are called mixed models.
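As a minimal SAS illustration of the distinction (the data set name oneway and the variables a and y are hypothetical), the same one-way layout can be analysed under a fixed model, where the individual effects of a are estimated, or under a random model, where only their variance is estimated:

proc glm data=oneway;                 /* fixed model: estimates the A effects */
  class a;
  model y = a;
run;

proc varcomp data=oneway method=reml; /* random model: estimates Var(A) */
  class a;
  model y = a;
run;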

Analysis of Variance Approach

The conventional least squares approach to the mixed model, sometimes called the analysis of variance approach, is to assume initially that all effects, other than the term that assigns a unique random element to each observation, are fixed effects. Least squares is applied to this fixed model to obtain the relevant partitions of the sums of squares. Then the model containing the random effects is reinstated and expectations of the mean squares are derived. The mean square expectations determine how tests of significance are to be made and how variance components are to be estimated. Adjustments to tests of significance are made by constructing an error mean square that has the proper expectation with respect to the random elements. This requires the expectations of the mean squares under the random model. For balanced data the mean square expectations are easily obtained. Each expectation is expressed as a linear function of the variance components for the random effects plus a general statement of the classes of fixed effects involved in the quadratic function.

Henderson's Methods I, II & III

Henderson described three methods of estimating variance components that are just three different ways of using the general ANOVA method. They differ only in the quadratics (not always sums of squares) used for the vector of linearly independent quadratic forms of the observations. All three also suffer from the demerits of the general ANOVA method: for unbalanced data no optimal application of the method is known, the methods can yield negative estimates, and the distributional properties of the estimators are not known.

Method I

In Method I the quadratics used are analogous to the sums of squares used for balanced data, the analogy being such that certain sums of squares in balanced data become, for unbalanced data, quadratic forms that are not non-negative definite, and so are not sums of squares. Thus, for example, for the 2-way crossed classification with $n$ observations per cell, the sum of squares

$$bn\sum_i(\bar y_{i..}-\bar y_{...})^2 = bn\sum_i \bar y_{i..}^2 - abn\,\bar y_{...}^2$$

becomes, for unbalanced data,

$$\sum_i n_{i.}(\bar y_{i..}-\bar y_{...})^2 = \sum_i n_{i.}\bar y_{i..}^2 - n_{..}\,\bar y_{...}^2.$$

This method is easy to compute, even for large data sets, and for random models it yields estimators that are unbiased. It cannot be used for mixed models. It can only be adapted to a mixed model by altering that model and treating the fixed effects either as non-existent or as random, in which case the estimated variance components for the true random effects will be biased.
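For illustration, in the balanced one-way random model $y_{ij} = \mu + \alpha_i + e_{ij}$, with $i = 1,\dots,a$ classes and $j = 1,\dots,n$ observations per class, equating the observed mean squares to their expectations gives the ANOVA estimators directly:

$$E(MSA) = \sigma_e^2 + n\sigma_\alpha^2,\qquad E(MSE) = \sigma_e^2 \quad\Longrightarrow\quad \hat\sigma_e^2 = MSE,\qquad \hat\sigma_\alpha^2 = \frac{MSA - MSE}{n}.$$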

Method II

This is designed to capitalize on the easy computability of Method I and to broaden its use by removing the limitation that Method I cannot be used for mixed models. The method has two parts. First, make the temporary assumption that the random effects are fixed and, for the model $y = X\beta + Zu + e$, solve the normal equations

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}\begin{bmatrix} \beta^0 \\ u^0 \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}.$$

Then consider the vector of data adjusted for $\beta^0$, namely $z = y - X\beta^0$; the model for $z$ is

$$z = 1\mu + Zu + Ke,$$

where $\mu$ differs from $\beta$ and where $K$ is known. This can then easily be analysed by Method I. Method II is relatively easy to compute, especially when the number of fixed effects is not too large. And although it can be used for a wide variety of mixed models, it cannot be used for those mixed models that have interactions between fixed and random factors, whether those interactions are defined as random effects (the usual case) or as fixed effects.

Method III

This uses sums of squares that arise in fitting an overparameterised model and submodels thereof. It can be used for any mixed model and yields estimators that are unbiased. Although the method uses sums of squares that are known (at least in some cases) to be useful in certain fixed effects models, no analytic evidence is available that these sums of squares have optimal properties for estimating variances. The main disadvantage of the method is that, with its confinement to sums of squares from fitting overparameterised models, too many sums of squares are available. For example, consider the 2-way crossed classification overparameterised model with equation

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}.$$

Suppose all effects are random. There are then four variance components to estimate: $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$ and $\sigma_e^2$. But for that model there are five different reductions in sums of squares, $R(\alpha\,|\,\mu)$, $R(\beta\,|\,\mu)$, $R(\alpha\,|\,\mu,\beta)$, $R(\beta\,|\,\mu,\alpha)$ and $R(\gamma\,|\,\mu,\alpha,\beta)$, as well as SSE, which can be used. From these, at least three sets suggest themselves as possible candidates for Method III estimation:

(a) $R(\alpha\,|\,\mu)$, $R(\beta\,|\,\mu,\alpha)$, $R(\gamma\,|\,\mu,\alpha,\beta)$, SSE
(b) $R(\beta\,|\,\mu)$, $R(\alpha\,|\,\mu,\beta)$, $R(\gamma\,|\,\mu,\alpha,\beta)$, SSE
(c) $R(\alpha\,|\,\mu,\beta)$, $R(\beta\,|\,\mu,\alpha)$, $R(\gamma\,|\,\mu,\alpha,\beta)$, SSE

All three sets yield the same estimators of $\sigma_\gamma^2$ and $\sigma_e^2$. Two different estimators each of $\sigma_\alpha^2$ and $\sigma_\beta^2$ arise, and it is difficult to conclude which set of estimators is to be preferred.

ML (Maximum Likelihood)

The method is based on maximizing the likelihood function. For the mixed model, under the assumption of normality of the error terms and random effects, we have

$$y = X\beta + Zu + e \sim N(X\beta,\, V)\qquad\text{with}\qquad V = \sum_{i=1}^{r}\sigma_i^2 Z_i Z_i' + \sigma_e^2 I_N = \sum_{i=0}^{r}\sigma_i^2 Z_i Z_i',$$

where $Z_0 = I_N$ and $\sigma_0^2 = \sigma_e^2$. The likelihood function is then

$$L = (2\pi)^{-N/2}\,|V|^{-1/2}\exp\{-\tfrac12 (y - X\beta)' V^{-1}(y - X\beta)\}.$$

Maximizing $L$ with respect to the elements of $\beta$ and the variance components (the $\sigma_i^2$'s that occur in $V$) leads to equations that have to be solved to yield ML estimators of $\beta$ and of the $\sigma_i^2$. These equations can be written in a variety of ways and can be solved iteratively. Despite the numerical difficulties involved in solving these equations, ML is preferred over the ANOVA method: it is well defined, and the resulting estimators have attractive, well-known large-sample properties, being normally distributed with known sampling variances.
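As a sketch of what is being maximized, the following PROC IML module evaluates $-2\log L$ for trial values of the variance components in a model with a single random factor. The matrices y, X and Z are assumed to have been defined already (all names here are illustrative), and $\beta$ is profiled out via its GLS estimate for the given $V$:

proc iml;
  /* -2 log-likelihood of y = X*beta + Z*u + e for trial values s2u, s2e;
     beta replaced by its GLS estimate; y, X, Z assumed already defined */
  start neg2logL(s2u, s2e) global(y, X, Z);
    n = nrow(y);
    V = s2u*(Z*Z`) + s2e*i(n);                  /* V = s2u ZZ' + s2e I   */
    b = solve(X`*solve(V, X), X`*solve(V, y));  /* GLS estimate of beta  */
    r = y - X*b;
    return( n*log(2*constant('pi')) + log(det(V)) + r`*solve(V, r) );
  finish;
  /* after defining y, X, Z:  val = neg2logL(1.0, 2.0);  */
quit;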

REML (Restricted Maximum Likelihood)

REML estimators are obtained by maximizing that part of the likelihood which is invariant to the location parameter; i.e., in terms of the mixed model $y = X\beta + Zu + e$, invariant to $X\beta$. Another way of looking at it is that REML maximizes the likelihood of a vector of linear combinations of the observations that are invariant to $X\beta$. Suppose $L'y$ is such a vector. Then

$$L'y = L'X\beta + L'Zu + L'e$$

is invariant to $X\beta$ if and only if $L'X = 0$. The computational problems of obtaining solutions are the same as for ML. The REML procedure does not, however, include estimating $\beta$. On the other hand, with balanced data the REML equations provide solutions that are identical to the ANOVA estimators, which are unbiased and have attractive minimum variance properties. In this sense REML is said to take account of the degrees of freedom involved in estimating the fixed effects, whereas ML does not. The easiest example is a simple sample of $n$ observations from a $N(\mu, \sigma^2)$ distribution. The two estimators of $\sigma^2$ are

$$\hat\sigma^2_{ML} = \sum_i (x_i - \bar x)^2 / n \qquad\text{and}\qquad \hat\sigma^2_{REML} = \sum_i (x_i - \bar x)^2 /(n-1).$$

The REML estimator has taken account of the one degree of freedom required for estimating $\mu$, whereas the ML estimator has not. The REML estimator is also unbiased, but the ML estimator is not. In the general case of unbalanced data, neither the ML nor the REML estimators are unbiased.
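A minimal SAS check of the two divisors, using a small hypothetical sample (the data values below are illustrative only); CSS computes the corrected sum of squares $\sum_i (x_i - \bar x)^2$:

/* hypothetical sample: ML uses divisor n, REML uses n-1 */
data demo;
  input x @@;
  datalines;
52 49 55 50 48 53
;

proc sql;
  select css(x)/count(x)     as ml_variance,
         css(x)/(count(x)-1) as reml_variance
  from demo;
quit;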

MINQUE (Minimum Norm Quadratic Unbiased Estimation)

The method is based on the requirement that the estimator minimize a (Euclidean) norm, be a quadratic form of the observations and be unbiased. Its development involves extensive algebra. More importantly, its concept demands the use of pre-assigned weights that effectively play the part of a priori values for the unknown variance components. The method has two advantages: unlike ML and REML, it involves no normality assumptions, and the equations that yield the estimator do not have to be solved iteratively. The solution, however, depends on the pre-assigned values; different pre-assigned values can give different estimators from the same data set. One must therefore talk about a MINQUE estimator and not the MINQUE estimator. This appears to be a troublesome feature of the MINQUE procedure. Also, its estimators can be negative, and they are unbiased only if the pre-assigned value is indeed the true, unknown value of $\sigma^2$. There is also a close relationship between REML and MINQUE: a MINQUE solution is a first iterate of REML.

MIVQUE (Minimum Variance Quadratic Unbiased Estimation)

MINQUE demands no assumptions about the form of the distribution of $y$. But if the usual normality assumptions are invoked, the MINQUE solution has the property of being that unbiased quadratic form of the observations which has minimum variance; i.e., it is a minimum variance quadratic unbiased estimator, MIVQUE.

I-MINQUE (Iterative MINQUE)

As already pointed out, the MINQUE procedure demands a vector of pre-assigned values for $\sigma^2$; no iteration is involved. But having obtained a solution, $\hat\sigma^2_{(1)}$ say, its existence prompts the idea of using it as a new pre-assigned value for getting a new estimate of $\sigma^2$, say $\hat\sigma^2_{(2)}$. This leads to using the MINQUE equations iteratively to yield iterative MINQUE, or I-MINQUE, estimators. If one iterates to convergence, these are the same as REML estimators; hence I-MINQUE = REML. Even in the absence of normality assumptions on $y$, the I-MINQUE solutions have large-sample normality properties.

Negative Variance Component Estimates

Variance components should always be positive because they represent the variance of a random variable. But some of the existing methods, such as ANOVA and MIVQUE, do give rise to negative estimates. These negative estimates may arise for a variety of reasons. The variability in the data may be large enough to produce a negative estimate even though the true value of the variance component is positive. The data may contain outliers that exhibit unusually large variability. A different model for interpreting the data may be appropriate; under some statistical models for variance components analysis, negative estimates are an indication that observations in the data are negatively correlated.

Robust Estimation

Outliers may occur with respect to any of the random components in a mixed-model analysis of variance. There is an extensive literature on robust estimation in the case of a single error component; there is, however, only a small body of literature on robust estimation in the variance-components model.

Computational Problems

The computational problems of estimating variance components involve the application of iterative procedures such as the Newton-Raphson and Marquardt methods, the method of scoring, quasi-Newton methods, the EM algorithm and the method of successive approximations.

Evaluation of Algorithms

Several recent research papers evaluate algorithms for variance component estimation. While there is no consensus on the best method, some general conclusions are as follows (a sketch of the EM iteration for the simplest case follows the list):

1. The Newton-Raphson method often converges in the fewest iterations, followed by the scoring method and the EM algorithm. In some cases the EM algorithm requires a very large number of iterations. The individual iterations tend to be slightly shorter for the EM algorithm, but this depends greatly on the details of the programming.
2. The robustness of the methods to their starting values (the ability to converge given poor starting values) is the reverse of the rate of convergence: the EM algorithm is better than Newton-Raphson.
3. The EM algorithm automatically takes care of the inequality constraints imposed by the parameter space. Other algorithms need specialized programming to incorporate constraints.
4. Newton-Raphson and scoring generate an estimated asymptotic variance-covariance matrix for the estimates as part of their calculations. At the end of the EM iterations, special programming (perhaps a single step of Newton-Raphson) is needed to calculate asymptotic standard errors.
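The following PROC IML sketch shows the EM iteration for the simplest case, ML estimation in the balanced one-way random model $y_{ij} = \mu + a_i + e_{ij}$. The 4 x 3 data matrix is purely hypothetical, a fixed number of iterations is used and no convergence test is included:

proc iml;
  /* EM for ML in the balanced one-way random model (hypothetical data) */
  y = {10 12 11, 14 13 15,  9  8 10, 12 12 13};   /* q groups x n obs   */
  q = nrow(y);  n = ncol(y);  N = q*n;
  mu = y[:];  s2a = 1;  s2e = 1;                  /* starting values    */
  do iter = 1 to 200;
    v  = s2a*s2e/(n*s2a + s2e);                   /* Var(a_i | y)       */
    a  = (n*s2a/(n*s2a + s2e)) * (y[,:] - mu);    /* E(a_i | y)         */
    mu = (y - a*j(1, n, 1))[:];                   /* M-step: update mu  */
    r  = y - mu - a*j(1, n, 1);                   /* residuals          */
    s2a = a[##]/q + v;                            /* M-step: sigma_a^2  */
    s2e = (r[##] + N*v)/N;                        /* M-step: sigma_e^2  */
  end;
  print mu s2a s2e;
quit;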

Computational Methods Available in SAS

Four methods are available in SAS PROC VARCOMP, selected with the METHOD= option.

The Type 1 Method
This method (METHOD=TYPE1) computes the Type 1 sum of squares for each effect, equates each mean square involving only random effects to its expected value and solves the resulting system of equations.

The MIVQUE0 Method
The MIVQUE0 method (METHOD=MIVQUE0) produces unbiased estimates that are invariant with respect to the fixed effects of the model and are locally best quadratic unbiased estimates given that the true ratio of each component to the residual error component is zero. The technique is similar to Type 1 except that the random effects are adjusted only for the fixed effects. This is the default method in PROC VARCOMP.

The Maximum Likelihood Method
The ML method (METHOD=ML) computes maximum likelihood estimates of the variance components.

The Restricted Maximum Likelihood Method
The restricted maximum likelihood method (METHOD=REML) is similar to the ML method, but it first separates the likelihood into two parts, one that contains the fixed effects and one that does not. It is an iterated version of MIVQUE0.

Specification for using PROC VARCOMP in SAS

The following statements are used in the VARCOMP procedure.

Required, in this order:
PROC VARCOMP <options>;
CLASS variables;
MODEL dependents = effects </option>;

Optional:
BY variables;

Only one MODEL statement is allowed. The BY, CLASS and MODEL statements are described after the PROC VARCOMP statement.

PROC VARCOMP statement
PROC VARCOMP <options>;
DATA = SAS-data-set (if omitted, the most recently created SAS data set is used)
EPSILON = number (convergence criterion; default 1E-8)
MAXITER = number (maximum number of iterations; default 50)
METHOD = TYPE1 | MIVQUE0 | ML | REML (default MIVQUE0)
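Putting the statements and options together, a complete call might look as follows (the data set trial and the variables block, treat and y are hypothetical):

proc varcomp data=trial method=reml maxiter=100 epsilon=1e-8;
  class block treat;
  model y = block treat / fixed=1;   /* BLOCK is fixed; TREAT is random */
run;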

BY statement
BY variables;
A BY statement can be used with PROC VARCOMP to obtain separate analyses on groups of observations determined by the BY variables.

CLASS statement
The CLASS statement specifies the classification variables to be used in the analysis.

MODEL statement
MODEL dependents = effects </option>;
The MODEL statement names the dependent variables and the independent effects. If more than one dependent variable is specified, a separate analysis is performed for each one. Only one MODEL statement is allowed. Only one option is available in the MODEL statement:

FIXED = n
Tells VARCOMP that the first n effects in the MODEL statement are fixed effects; the remaining effects are assumed to be random. By default PROC VARCOMP assumes that all effects are random. For example, in MODEL Y = A|B / FIXED=1 (i.e., effects A, B and A*B), A is fixed while B and A*B are treated as random effects.

Example: In this example, A and B are classification variables and Y is the dependent variable. A is declared fixed, and B and A*B are random.

data a;
  input a b y;
  cards;
  (data lines omitted)
;

proc varcomp method=type1;
  class a b;
  model y = a|b / fixed=1;
run;

proc varcomp method=mivque0;
  class a b;
  model y = a|b / fixed=1;
run;

proc varcomp method=ml;
  class a b;
  model y = a|b / fixed=1;
run;

proc varcomp method=reml;
  class a b;
  model y = a|b / fixed=1;
run;

Exercise: The data given below are the first-month milk yields of 28 daughters of 4 sires in 3 herds. (The herd-by-sire table of yields is not recoverable from this copy.)

Case (i): Assume herd and sire as random components.
Case (ii): Assume only sire as a random component.
Obtain the different variance components by all four methods.
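A sketch of the runs for the exercise, assuming the data have been read into a data set milk with variables herd, sire and yield (repeat each case with method=type1, mivque0, ml and reml to cover all four methods):

/* Case (i): herd and sire both random (all effects random by default) */
proc varcomp data=milk method=reml;
  class herd sire;
  model yield = herd sire;
run;

/* Case (ii): herd fixed, sire random */
proc varcomp data=milk method=reml;
  class herd sire;
  model yield = herd sire / fixed=1;
run;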

Best Linear Unbiased Prediction (BLUP)

A problem that occurs frequently in animal and plant breeding applications, and probably in many other fields as well, is that, given a sample data vector from a mixed model, the experimenter wishes to predict some set of linear functions of an unobserved random vector. This problem of predicting a random vector in a mixed linear model takes different forms under different situations:

(a) Best Prediction (BP)
(i) The form of the joint distribution of the records and of the random vector to be predicted is known.
(ii) The parameters of the distribution are known.
(iii) It has been proved that the conditional mean of the genetic values given the records has optimum properties.

(b) Best Linear Prediction (BLP)
(i) The form of the distribution is not known, or certain parameters are not known.
(ii) We do know the means of the records and of the genetic values, and the variances and covariances (second moments) are known.
(iii) This involves finding the linear function of the records which minimizes the average of squared errors of prediction.
(iv) In the case of the normal distribution, BLP is BP.

(c) Best Linear Unbiased Prediction (BLUP)
(i) The problem is the same as for BLP, but now we do not know the means.
(ii) Only the variances and covariances of the random vectors are known.
(iii) We find the linear function of the records which has the same expectation as the genetic values to be predicted and which, in the class of such functions, minimizes the average of squared errors.

(d) Neither first nor second moments are known, and it is still desired to use linear prediction methods
(i) We never really know parameters, but we may have good prior estimates of them, and the procedure will be (1) BP when we have good estimates of all parameters, (2) BLP when we have good estimates of the first and second moments, and (3) BLUP when we have good estimates of the second central moments.
(ii) If we have no prior estimates of either the first or the second moments, we need to estimate them from the same data that are used for prediction.
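For reference, in matrix notation (anticipating the mixed model defined below), with $C = \operatorname{Cov}(y, u')$ and $V = \operatorname{Var}(y)$, BLP is the familiar selection-index regression, and BLUP replaces the unknown mean of $y$ by its generalized least squares estimate:

$$\hat u_{BLP} = E(u) + C'V^{-1}\bigl(y - E(y)\bigr),\qquad \hat u_{BLUP} = C'V^{-1}\bigl(y - X\hat\beta\bigr),\quad \hat\beta = (X'V^{-1}X)^{-}X'V^{-1}y.$$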

In practical situations the problems are mostly of the type in which we assume that the variance-covariance matrix of the random variables is known and, further, that the records follow a mixed model. Two methods have been used most frequently. In the first, a regular least squares solution is obtained by treating all random variables, except an error vector with variance $I\sigma^2$, as fixed; the predictor is then a linear function of the least squares solution. In the second method, estimates of the fixed effects of the model are obtained by some method, possibly by regular least squares as in the first method; the data are adjusted for the fixed effects; and selection index methods are then applied to the adjusted data as though the adjustment had been made with known values of the fixed effects. Henderson (1963) suggested a combination of these methods and described a mixed model method which yields simultaneously the best linear unbiased estimators of estimable linear functions of the fixed elements of the model and the best linear unbiased predictors of the random elements of the model.

Consider the general linear model

$$y = X\beta + Zu + e,$$

where $y$ is an $n\times 1$ vector of observations, $X$ is a known $n\times p$ matrix, $\beta$ is a $p\times 1$ vector of fixed effects, $u$ is a $q\times 1$ non-observable vector of random effects and $e$ is an $n\times 1$ vector of random errors, with

$$E\begin{bmatrix} y\\ u\\ e\end{bmatrix}=\begin{bmatrix} X\beta\\ 0\\ 0\end{bmatrix}\qquad\text{and}\qquad \operatorname{Var}\begin{bmatrix} y\\ u\\ e\end{bmatrix}=\begin{bmatrix} V & ZG & R\\ GZ' & G & 0\\ R & 0 & R\end{bmatrix},\qquad V = ZGZ' + R.$$

No assumptions are made concerning the distribution of the random variables; however, $G$ and $R$ are assumed known without error and nonsingular. The general problem to be solved is to predict a function $K'\beta + M'u$ ($\beta$ generally fixed, $u$ generally random), the predictand, by a linear function of the observations, $L'y$, the predictor, such that the prediction error variances for the predictors of each element of $K'\beta + M'u$ are minimized and such that the expected value of the predictor equals the expected value of the predictand. The function $K'\beta$ must be an estimable function.

The prediction error is $K'\beta + M'u - L'y$, and the variance-covariance matrix of this function is the matrix of interest, since we wish to minimize each individual diagonal element. To define this matrix algebraically, note that $V(K'\beta) = 0$ and all covariances involving $K'\beta$ are zero, so that

$$V(K'\beta + M'u - L'y) = V(M'u) + V(L'y) - \operatorname{Cov}(M'u, y'L) - \operatorname{Cov}(L'y, u'M) = M'GM + L'VL - M'GZ'L - L'ZGM.$$

To ensure that the predictor is unbiased, i.e. has the same expected value as the predictand, we add a Lagrange multiplier term to the variance-covariance matrix of prediction errors prior to minimizing the function.

We know that

$$E(K'\beta + M'u) = K'\beta \qquad\text{and}\qquad E(L'y) = L'X\beta.$$

Thus, in order that $L'X\beta = K'\beta$ for all possible vectors $\beta$, we must have $L'X - K' = 0$. Hence the Lagrange multiplier term is $(L'X - K')\theta$. Adding it to $V(K'\beta + M'u - L'y)$ gives the function $F$:

$$F = M'GM + L'VL - M'GZ'L - L'ZGM + (L'X - K')\theta.$$

The function $F$ is differentiated with respect to the unknowns, $L$ and $\theta$, and the derivatives are equated to zero (null matrices):

$$\frac{\partial F}{\partial L} = 2VL - 2ZGM + X\theta = 0 \qquad\text{and}\qquad \frac{\partial F}{\partial \theta} = X'L - K = 0.$$

Note that the second derivative provides the condition which must hold for the prediction to be unbiased. These results can be rearranged in matrix form. Recalling that $V = ZGZ' + R$ and letting $\phi = \tfrac12\theta$,

$$\begin{bmatrix} ZGZ' + R & X\\ X' & 0\end{bmatrix}\begin{bmatrix} L\\ \phi\end{bmatrix} = \begin{bmatrix} ZGM\\ K\end{bmatrix}.$$

From the first line, $RL + ZG(Z'L - M) + X\phi = 0$. Let $S = G(Z'L - M)$ and note that

$$G^{-1}S = Z'L - M \qquad\text{and}\qquad M = Z'L - G^{-1}S.$$

Now we can write the following equations:

$$\begin{bmatrix} R & Z & X\\ Z' & -G^{-1} & 0\\ X' & 0 & 0\end{bmatrix}\begin{bmatrix} L\\ S\\ \phi\end{bmatrix} = \begin{bmatrix} 0\\ M\\ K\end{bmatrix}.$$

Absorbing the $L$ equation into the other two gives

$$-\begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X\\ X'R^{-1}Z & X'R^{-1}X\end{bmatrix}\begin{bmatrix} S\\ \phi\end{bmatrix} = \begin{bmatrix} M\\ K\end{bmatrix}.$$

Multiply both sides by $-1$ and let

$$\begin{bmatrix} C_{11} & C_{12}\\ C_{12}' & C_{22}\end{bmatrix} = \begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X\\ X'R^{-1}Z & X'R^{-1}X\end{bmatrix}^{-1}.$$

Then

$$\begin{bmatrix} S\\ \phi\end{bmatrix} = -\begin{bmatrix} C_{11} & C_{12}\\ C_{12}' & C_{22}\end{bmatrix}\begin{bmatrix} M\\ K\end{bmatrix}$$

and, from $RL = -ZS - X\phi$,

$$L = R^{-1}\begin{bmatrix} Z & X\end{bmatrix}\begin{bmatrix} C_{11} & C_{12}\\ C_{12}' & C_{22}\end{bmatrix}\begin{bmatrix} M\\ K\end{bmatrix},$$

so that

$$L'y = \begin{bmatrix} M' & K'\end{bmatrix}\begin{bmatrix} C_{11} & C_{12}\\ C_{12}' & C_{22}\end{bmatrix}\begin{bmatrix} Z'R^{-1}y\\ X'R^{-1}y\end{bmatrix} = M'\hat u + K'\hat\beta,$$

where

$$\begin{bmatrix} \hat u\\ \hat\beta\end{bmatrix} = \begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X\\ X'R^{-1}Z & X'R^{-1}X\end{bmatrix}^{-1}\begin{bmatrix} Z'R^{-1}y\\ X'R^{-1}y\end{bmatrix}$$

or, rearranging rows and columns,

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1}\end{bmatrix}\begin{bmatrix} \hat\beta\\ \hat u\end{bmatrix} = \begin{bmatrix} X'R^{-1}y\\ Z'R^{-1}y\end{bmatrix}.$$

These equations are commonly referred to as Henderson's mixed model equations, and they provide predictors with the smallest prediction error variances among all linear unbiased predictors. This methodology can be extended to various situations, such as the individual animal model and the model for related sires described below.
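A minimal PROC IML sketch of forming and solving the mixed model equations; y, X, Z, G and R are assumed to have been defined already, and X is assumed to have full column rank (otherwise a generalized inverse such as ginv would be used in place of solve):

proc iml;
  /* Henderson's mixed model equations, assuming y, X, Z, G, R defined */
  Ri  = inv(R);
  lhs = (X`*Ri*X || X`*Ri*Z) //
        (Z`*Ri*X || Z`*Ri*Z + inv(G));
  rhs = (X`*Ri*y) //
        (Z`*Ri*y);
  sol = solve(lhs, rhs);   /* first ncol(X) rows: beta-hat; rest: u-hat */
  print sol;
quit;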

Animal Additive Genetic Model

The model for an individual record is

$$y_i = x_i'\beta + z_i'u + a_i + e_i,$$

where $\beta$ represents fixed effects, with $x_i$ relating the record on the $i$-th animal to this vector; $u$ represents random effects other than breeding values, with $z_i$ relating this vector to $y_i$; $a_i$ is the additive genetic value of the $i$-th animal; and $e_i$ is a random error associated with the individual record. The vector representation of the entire set of records is

$$y = X\beta + Zu + Z_a a + e.$$

If $a$ represents only those animals with records, $Z_a = I$; otherwise it is an identity matrix with the rows deleted that correspond to animals without records. Further,

$$\operatorname{Var}(u) = G,\qquad \operatorname{Var}(a) = A\sigma_a^2,\qquad \operatorname{Var}(e) = R\ (\text{usually } I\sigma_e^2),$$
$$\operatorname{Cov}(u, a') = 0,\qquad \operatorname{Cov}(u, e') = 0,\qquad \operatorname{Cov}(a, e') = 0.$$

If $Z_a \neq I$, the mixed model equations are

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z & X'R^{-1}Z_a\\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & Z'R^{-1}Z_a\\ Z_a'R^{-1}X & Z_a'R^{-1}Z & Z_a'R^{-1}Z_a + A^{-1}/\sigma_a^2\end{bmatrix}\begin{bmatrix} \beta^o\\ \hat u\\ \hat a\end{bmatrix} = \begin{bmatrix} X'R^{-1}y\\ Z'R^{-1}y\\ Z_a'R^{-1}y\end{bmatrix}.$$

If $Z_a = I$, this simplifies to

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z & X'R^{-1}\\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & Z'R^{-1}\\ R^{-1}X & R^{-1}Z & R^{-1} + A^{-1}/\sigma_a^2\end{bmatrix}\begin{bmatrix} \beta^o\\ \hat u\\ \hat a\end{bmatrix} = \begin{bmatrix} X'R^{-1}y\\ Z'R^{-1}y\\ R^{-1}y\end{bmatrix}.$$

If $R = I\sigma_e^2$, it simplifies further to

$$\begin{bmatrix} X'X & X'Z & X'\\ Z'X & Z'Z + G^{-1}\sigma_e^2 & Z'\\ X & Z & I + A^{-1}\sigma_e^2/\sigma_a^2\end{bmatrix}\begin{bmatrix} \beta^o\\ \hat u\\ \hat a\end{bmatrix} = \begin{bmatrix} X'y\\ Z'y\\ y\end{bmatrix}.$$

If the number of animals is large one should, of course, use Henderson's method (1976) for computing $A^{-1}$. Since this method requires a base population of non-inbred, unrelated animals, some of these animals probably do not have records; we may also wish to evaluate progeny that have not yet made a record. Both of these circumstances result in $Z_a \neq I$, but $\hat a$ will then contain predicted breeding values of the animals without records.
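The following PROC IML sketch applies Henderson's rules for building $A^{-1}$ directly from a pedigree with no inbreeding; the five-animal pedigree is hypothetical (columns: animal, sire, dam, with 0 denoting an unknown parent, and parents coded before their offspring):

proc iml;
  /* Henderson's (1976) rules for A-inverse, no inbreeding assumed */
  ped = {1 0 0, 2 0 0, 3 1 2, 4 1 0, 5 4 2};   /* hypothetical pedigree */
  n = nrow(ped);
  Ainv = j(n, n, 0);
  do k = 1 to n;
    i = ped[k,1]; s = ped[k,2]; d = ped[k,3];
    if s > 0 & d > 0 then do;                  /* both parents known */
      Ainv[i,i] = Ainv[i,i] + 2;
      Ainv[i,s] = Ainv[i,s] - 1;  Ainv[s,i] = Ainv[s,i] - 1;
      Ainv[i,d] = Ainv[i,d] - 1;  Ainv[d,i] = Ainv[d,i] - 1;
      Ainv[s,s] = Ainv[s,s] + 0.5;  Ainv[d,d] = Ainv[d,d] + 0.5;
      Ainv[s,d] = Ainv[s,d] + 0.5;  Ainv[d,s] = Ainv[d,s] + 0.5;
    end;
    else if s > 0 | d > 0 then do;             /* one parent known */
      p = max(s, d);
      Ainv[i,i] = Ainv[i,i] + 4/3;
      Ainv[i,p] = Ainv[i,p] - 2/3;  Ainv[p,i] = Ainv[p,i] - 2/3;
      Ainv[p,p] = Ainv[p,p] + 1/3;
    end;
    else Ainv[i,i] = Ainv[i,i] + 1;            /* no parents known */
  end;
  print Ainv;
  A = inv(Ainv);   /* for this small example, reproduces the A matrix */
  print A;
quit;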

Sire Model with Additive Genetic Effects

The model in which related sires are mated to a random sample of unrelated dams, no dam has more than one progeny with a record, and each progeny produces one record, is

$$y_{ij} = x_{ij}'\beta + s_i + z_{ij}'u + e_{ij},$$

where $\beta$ represents fixed effects, with $x_{ij}$ relating the $j$-th progeny of the $i$-th sire to these effects; $s_i$ is the sire effect on the progeny record; $u$ represents other random factors, with $z_{ij}$ relating these to the $ij$-th progeny record; and $e_{ij}$ is a random error. The vector representation is

$$y = X\beta + Z_s s + Zu + e,$$

with $\operatorname{Var}(s) = A\sigma_s^2$, where $A$ is the numerator relationship matrix of the sires and $\sigma_s^2$ is the sire variance in the base population. If the sires comprise a random sample from this population, $\sigma_s^2 = \tfrac14$ of the additive genetic variance. Some columns of $Z_s$ will be null if $s$ contains sires with no progeny, as will usually be the case if the simple method for computing $A^{-1}$, which requires the base population animals, is used. Further, $\operatorname{Var}(u) = G$, $\operatorname{Cov}(s, u') = 0$, $\operatorname{Var}(e) = R$ (usually $I\sigma_e^2$), $\operatorname{Cov}(s, e') = 0$ and $\operatorname{Cov}(u, e') = 0$. If sires and dams are truly random,

$$I\sigma_e^2 = 0.75\,I\,(\text{additive genetic variance}) + I\,(\text{environmental variance}).$$

With this model the mixed model equations are

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z_s & X'R^{-1}Z\\ Z_s'R^{-1}X & Z_s'R^{-1}Z_s + A^{-1}/\sigma_s^2 & Z_s'R^{-1}Z\\ Z'R^{-1}X & Z'R^{-1}Z_s & Z'R^{-1}Z + G^{-1}\end{bmatrix}\begin{bmatrix} \beta^o\\ \hat s\\ \hat u\end{bmatrix} = \begin{bmatrix} X'R^{-1}y\\ Z_s'R^{-1}y\\ Z'R^{-1}y\end{bmatrix}.$$

If $R = I\sigma_e^2$, this simplifies to

$$\begin{bmatrix} X'X & X'Z_s & X'Z\\ Z_s'X & Z_s'Z_s + A^{-1}\sigma_e^2/\sigma_s^2 & Z_s'Z\\ Z'X & Z'Z_s & Z'Z + \sigma_e^2 G^{-1}\end{bmatrix}\begin{bmatrix} \beta^o\\ \hat s\\ \hat u\end{bmatrix} = \begin{bmatrix} X'y\\ Z_s'y\\ Z'y\end{bmatrix}.$$

Illustration: Suppose we consider seven sires, S0 to S6, with known relationships. (The pedigree diagram of the original is not recoverable from this copy; from the discussion below, sire 3 is a son of sire 1, and sire 1 is a son of the base sire S0.) There are no progeny on S0. Each sire has two progeny in each of two contemporary groups that differ by 100 kg, and the calves of each sire within a contemporary group differ by 100 kg. There are a total of 24 calves. Results obtained under the different models are as under:

Table 1: Summary of solutions from different models
(The numerical solutions for the group effects g1 and g2 and the sires S0 to S6 under the four models are not recoverable from this copy.)

Table 2: Rank of sires under different models

Model                            Rank
Sires with progeny data          3, 5, 4-1, 6, 2
Groups + sires with progeny      3, 5, 4, 6, 1, 2
Sires related                    3, 1, 5, 4, 0, 6, 2
Groups + sires related           3, 1, 5, 0, 4, 6, 2

The changes in the sire evaluations are best described by the changes in rank shown in Table 2. Based on progeny data alone, no distinction can be made between sires 1 and 4. The addition of groups changes the ranks of 1, 4 and 6. The addition of relationships creates more rank changes. Clearly the relationships are the leading contributing factor to the correct ranking, since sire 3 is the best bull under all models, and he is the son of sire 1, who is in turn the son of the base sire.

The addition of the relationship matrix does two things for the prediction procedure:

1) It provides relationship ties among the animals in different contemporary groups. Relationship ties do the same thing as a reference sire having progeny in many different contemporary groups. This is an important aspect of including $A^{-1}$.

2) It also gives predictions that include the parental half-sib information that is available. The lower the heritability of the trait, the more important this aspect of including the relationship inverse becomes. This second aspect is equivalent to the selection index theory approach, which combines sources of information into one predicted value.

References

Dempfle, L. (1977). Comparison of several sire evaluation methods in dairy cattle breeding. Livestock Production Science.

Goldberger, A.S. (1962). Best linear unbiased prediction in the generalised linear regression model. JASA, 57.

Harville, D.A. (1976). Extension of the Gauss-Markov theorem to include the estimation of random effects. Ann. Statist., 4.

Harville, D.A. (1990). BLUP (Best Linear Unbiased Prediction) and beyond. In Advances in Statistical Methods for Genetic Improvement of Livestock (D. Gianola and K. Hammond, eds.). Springer, New York.

Henderson, C.R. (1963). Selection index and expected genetic advance. In Statistical Genetics and Plant Breeding. Nat. Acad. Sci., Nat. Res. Council Publication 982, Washington, DC.

Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31.

Henderson, C.R. (1976). A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics, 32.

Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of Guelph.

Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion). JRSS, Ser. B, 34.

Robinson, G.K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science, 6(1).


Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

ASPECTS OF SELECTION FOR PERFORMANCE IN SEVERAL ENVIRONMENTS WITH HETEROGENEOUS VARIANCES

ASPECTS OF SELECTION FOR PERFORMANCE IN SEVERAL ENVIRONMENTS WITH HETEROGENEOUS VARIANCES University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Faculty Papers and Publications in Animal Science Animal Science Department 2-3-1987 ASPECTS OF SELECTION FOR PERFORMANCE

More information

Hypothesis Testing for Var-Cov Components

Hypothesis Testing for Var-Cov Components Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output

More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

Lecture 7 Correlated Characters

Lecture 7 Correlated Characters Lecture 7 Correlated Characters Bruce Walsh. Sept 2007. Summer Institute on Statistical Genetics, Liège Genetic and Environmental Correlations Many characters are positively or negatively correlated at

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Non-polynomial Least-squares fitting

Non-polynomial Least-squares fitting Applied Math 205 Last time: piecewise polynomial interpolation, least-squares fitting Today: underdetermined least squares, nonlinear least squares Homework 1 (and subsequent homeworks) have several parts

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Estimating Breeding Values

Estimating Breeding Values Estimating Breeding Values Principle how is it estimated? Properties Accuracy Variance Prediction Error Selection Response select on EBV GENE422/522 Lecture 2 Observed Phen. Dev. Genetic Value Env. Effects

More information

AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS. Stephen D. Kachman Department of Biometry, University of Nebraska Lincoln

AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS. Stephen D. Kachman Department of Biometry, University of Nebraska Lincoln AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS Stephen D. Kachman Department of Biometry, University of Nebraska Lincoln Abstract Linear mixed models provide a powerful means of predicting breeding

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information