Genetics: Early Online, published on May 5, 2017 as /genetics

GENETICS INVESTIGATION

Accounting for Sampling Error in Genetic Eigenvalues using Random Matrix Theory

Jacqueline L. Sztepanacz,1 and Mark W. Blows
School of Biological Sciences, University of Queensland

ABSTRACT The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be over-dispersed by sampling error: large eigenvalues are biased upwards, and small eigenvalues are biased downwards. The overdispersion of the leading eigenvalues of sample covariance matrices has been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using REML in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not to be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error for genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW scale, the critical value of the TW statistic can be used to determine whether the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix.
KEYWORDS genetic variance; eigenvalues; random matrix theory; Tracy-Widom distribution; REML-MVN

Copyright 2017 by the Genetics Society of America. doi: /genetics.XXX.XXXXXX. Manuscript compiled: Thursday 4th May, 2017.
1 School of Biological Sciences, University of Queensland, St. Lucia, Queensland, Australia. jsztepanacz@gmail.com

Introduction

As a consequence of pleiotropy, the biological conclusions drawn from the distribution of genetic variance across phenotypic space are often in sharp contrast to the magnitude of genetic variance typically observed for individual traits (Dickerson 1955; Blows and Hoffmann 2005). Most individual traits display substantial levels of genetic variance; however, most of this genetic variance is often confined to fewer multivariate trait combinations than the number of individual traits that are measured (Hine and Blows 2006; Kirkpatrick 2009; Walsh and Blows 2009). As applications of quantitative genetics to human health, agriculture and evolutionary biology move towards adopting multivariate analyses of genetic variance, it will be important to develop analytical approaches that accommodate the complexity of higher-dimensional genetic information. The fundamental tool for understanding how pleiotropic covariance restricts the genetic variance in individual traits to multivariate combinations of these traits is the genetic covariance (G) matrix, a symmetric variance-component matrix whose diagonal elements represent the genetic variances of individual traits and whose off-diagonal elements represent the covariances between them. The multivariate distribution of genetic variance is then determined by the empirical spectral distribution of the eigenvalues of G, in which each eigenvalue explains a portion of the total genetic variance underlying the phenotypic space (Dickerson 1955; Hill and Thompson 1978; Lande 1979; Pease and Bull 1988; Blows 2007).
Although analyses of the spectral distribution of G are relatively uncommon, an exponential decline in eigenvalues is typically observed, with some eigenvalues approaching zero (Pitchers et al. 2014; Kirkpatrick 2009). In evolutionary genetics the relative sizes of the eigenvalues of G are of particular interest, as they may determine how populations respond to selection in directions of phenotypic space. For example, trait combinations with small genetic eigenvalues that have low levels of genetic variance form a nearly-null genetic subspace (Gomulkiewicz and Houle 2009; Houle and Fierst 2013; Hine et al. 2014), which may or may not include a true null subspace where genetic variance is zero (Mezey and Houle 2005). The response to selection in these directions of phenotypic space may be severely slowed (Kirkpatrick et al. 1990), and biased towards trait combinations with higher levels of genetic variance (Chenoweth et al. 2010). If population sizes are small, failure to respond to selection in these regions of phenotypic space is also a real possibility (Gomulkiewicz and Houle 2009; Hine et al. 2014). Consequently, determining whether the eigenvalues of variance-component matrices that represent levels of genetic variance are significantly different from zero is important for understanding the evolution of multivariate phenotypes. Although the pattern of decay in genetic eigenvalues from multivariate genetic analyses may reflect the biological covariance among traits, it is known from random matrix theory (RMT) that a similar pattern is also the expected outcome of random sampling alone (Johnstone 2006). RMT provides a framework for understanding the behaviour of eigenvalues of symmetric matrices with elements drawn randomly from a wide array of statistical distributions. For sample covariance matrices in particular, the behaviour of such eigenvalues has been the subject of intense interest (Bai and Silverstein 2010; Wigner 1955). In a genetics context, sample covariance matrices describe among-individual covariance structure, for example SNPs represented as genomic relatedness matrices (Patterson et al. 2006), and phenotypic covariance matrices. To illustrate two key results from RMT for sample covariance matrices, consider the phenotypic covariance (P) matrix of a set of p = 5 quantitative traits sampled, for n = 250 individuals, from a multivariate normal distribution with a mean of 0 and covariance matrix equal to the identity matrix.
While the naïve expectation would be that all eigenvalues of this matrix are equal to 1, sampling error causes the eigenvalues to be over-dispersed, with some eigenvalues much larger or smaller than 1. This spectral distribution of eigenvalues can be characterized in two ways. First, the bulk of the eigenvalues are known to conform to the Marchenko-Pastur (MP) distribution (Johnstone 2006), which defines the interval in which the bulk of the eigenvalues will fall and the shape of their distribution (Figure 1A). Second, each individual eigenvalue in the spectrum is known to conform to a variant of the Tracy-Widom (TW) distribution, the moments of which differ for each eigenvalue (Tracy and Widom 1996; Johnstone 2001) (Figure 1B). Utilizing this property of individual eigenvalues, one application of RMT in genetics has been to test for cryptic population structure in genome-wide association studies. When p SNPs are used to estimate the relatedness between n individuals, the eigenvalues of the n × n sample covariance matrix (the genomic relatedness matrix) can be tested against the TW distribution to determine whether cryptic population structure is present in the sample (Bryc et al. 2013; Patterson et al. 2006). In contrast, a G matrix is partitioned from P using a multivariate random effects model, and is therefore not a sample covariance matrix but a variance-component matrix. The theoretical limiting distributions of the eigenvalues of variance-component matrices, like G, were until recently unexplored. Hill and Thompson (1978) first noted that genetic eigenvalues tended to be over-dispersed as a consequence of sampling: large eigenvalues tended to be overestimated, and small eigenvalues tended to be underestimated.
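The worked P-matrix example above can be reproduced numerically. The sketch below is our own illustration (the seed and use of np.cov are not from the paper): it draws n = 250 observations on p = 5 independent standard-normal traits and compares the sample eigenvalues of P with the Marchenko-Pastur support.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 250, 5  # individuals and traits, as in the worked example

# True covariance is the identity, so every population eigenvalue equals 1.
X = rng.standard_normal((n, p))
P = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(P))[::-1]

# Marchenko-Pastur support (1 ± sqrt(p/n))^2 bounds the bulk of the spectrum.
gamma = p / n
mp_lower = (1 - gamma ** 0.5) ** 2
mp_upper = (1 + gamma ** 0.5) ** 2

print(eigvals)             # over-dispersed around 1
print(mp_lower, mp_upper)  # ~0.74 and ~1.30 for p/n = 0.02
```

Even at this modest p/n ratio, the leading sample eigenvalue sits well above 1 and the smallest well below it, despite every population eigenvalue being exactly 1.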
Blows and McGuigan (2015) showed that the bulk of the genetic eigenvalues follow a MP-like distribution that describes this overdispersion in the context of RMT, and it has subsequently been confirmed that the MP distribution can be generalized to describe the bulk distribution of eigenvalues of variance-component matrices (Fan and Johnstone 2016). In addition to the bulk distribution, the leading eigenvalue of G has been shown to conform to the TW distribution using an empirical scaling approach (Saccenti et al. 2011; Blows and McGuigan 2015), highlighting that hypothesis testing of genetic eigenvalues can take advantage of the TW distribution to form the appropriate null distribution against which observed genetic eigenvalues can be tested. Here, we show how RMT approaches can be applied in quantitative genetic analyses to account for the sampling error that is concentrated in the leading eigenvalues of G matrices. We demonstrate how the use of confidence intervals on the eigenvalues of G estimated from restricted maximum likelihood (REML) and Bayesian genetic analyses can result in erroneous conclusions concerning the presence of genetic variance when examined in the absence of the appropriate null distribution. We integrate the use of the TW distribution into the analyses of genetic eigenvalues from these approaches to enable a test of whether a genetic eigenvalue significantly differs from the null distribution.

Methods

Simulation of Random Data

To illustrate the behaviour of the sampling variance in genetic eigenvalues we generated simulated data sets with a simple experimental design of 50 lines, with 5 individuals sampled from each line (representing, for example, 5 individuals sampled from each of 50 inbred lines). For each data set, the five traits were sampled from a multivariate normal distribution with a mean of 0 and the identity matrix as the covariance matrix, giving rise to no within- or among-line covariance.
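This design can be sketched numerically. The following illustration is ours, not the authors' SAS analysis: it substitutes a simple ANOVA (method-of-moments) estimator of the among-line covariance matrix for REML. With pure-noise data the true among-line covariance is zero, so the estimated eigenvalues scatter around zero, the leading ones positive and the trailing ones negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_lines, n_per_line, p = 50, 5, 5

# Pure noise: no true within- or among-line covariance beyond the identity.
X = rng.standard_normal((n_lines, n_per_line, p))

# ANOVA (method-of-moments) estimator of the among-line covariance matrix:
# G_hat = (MS_between - MS_within) / n_per_line.
line_means = X.mean(axis=1)                    # (50, 5) line means
grand_mean = X.reshape(-1, p).mean(axis=0)
dev_b = line_means - grand_mean
ms_between = n_per_line * dev_b.T @ dev_b / (n_lines - 1)
dev_w = (X - line_means[:, None, :]).reshape(-1, p)
ms_within = dev_w.T @ dev_w / (n_lines * (n_per_line - 1))
G_hat = (ms_between - ms_within) / n_per_line

eigvals = np.sort(np.linalg.eigvalsh(G_hat))[::-1]
print(eigvals)  # scatter around 0: positive leading, negative trailing
```

The negative trailing eigenvalues are exactly the behaviour the unconstrained covariance structure preserves, and that boundary-constrained estimators suppress.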
Therefore, any covariance among the five traits within each data set was solely a consequence of sampling. Estimation of (co)variance components at the among-line (genetic) level then represents the behaviour of the sampling error in the genetic variance of individual traits and the multivariate genetic eigenvectors, in the absence of genetic information in the data. This simulation closely matches that used in McGuigan and Blows (2015), where the relevance of the MP and TW distributions for genetic eigenvalues was first explored using 10 simulated traits. In the current work we use only five traits to reduce computational times. This also represents the modal number of traits currently used in empirical studies that have estimated G matrices (Pitchers et al. 2014). While RMT was developed for very high dimensional problems, the TW distribution has been shown to fit well empirically with p = 5 traits in a number of contexts (Ma 2012), including for variance-component matrices (Blows and McGuigan 2015).

Statistical Analysis

The statistical analysis of multivariate genetic data typically employs either restricted maximum likelihood (REML) or Bayesian approaches to estimate the genetic covariance matrix. In these analyses, an important choice in the multivariate modeling is the type of covariance structure to be imposed on the data at the genetic level (in this case the among-line level of our simulation). The simplest covariance structure is an unconstrained covariance matrix that permits genetic correlations to exceed the theoretical boundaries (−1 and 1), often resulting in negative eigenvalues of G.

Figure 1 The limiting eigenvalue distributions for a P-matrix of five traits from random matrix theory. For each of 1000 simulated data sets, n = 250 samples for p = 5 traits were drawn from a multivariate normal distribution with a mean of 0 and covariance matrix equal to the identity matrix, such that any covariance among the five traits within each data set was only a consequence of sampling. (A) A histogram representing the bulk distribution of the 5000 simulated phenotypic eigenvalues. The solid curve is the Marchenko-Pastur distribution, scaled and centered using p/n = 0.02. (B) Each histogram represents the TW statistic for each of the 5 eigenvalues of P calculated from the 1000 simulated datasets, in order from the 1st (far right) to the 5th (far left) eigenvalue. The solid curves represent the theoretical Tracy-Widom distributions for each eigenvalue, approximated by a gamma distribution (Chiani 2014).

The probability that an estimated G-matrix will be non-positive definite is very high with the sample sizes commonly employed in evolutionary studies (Hill and Thompson 1978). While negative eigenvalues seem counterintuitive, they are an important component of the behaviour of the sampling error in variance-component matrices, and the unconstrained covariance structure is therefore most compatible with RMT (Blows and McGuigan 2015). An alternative choice that is used extensively in quantitative genetics is a constrained covariance structure that requires the estimate of G to stay within the parameter space. Within REML-based analyses, this can be achieved by specifying a factor-analytic covariance structure that fits a number of orthogonal trait combinations, up to the number of traits. In this case, G is constrained to be positive semi-definite, so that all eigenvalues are ≥ 0.
Similarly, Bayesian analyses that utilize Markov chain Monte Carlo (MCMC) typically sample from a sums-of-squares-and-cross-products matrix, and consequently estimates of G from these analyses are also constrained to be positive semi-definite (Sorensen and Gianola 2010; Hazelton and Gurrin 2003). Although a constrained covariance structure results in biologically sensible estimates of G, the leading genetic eigenvalues remain subject to the sampling error predicted from RMT, as we show below. Here, we use both REML and MCMC approaches to analyze random data, in order to determine whether the sampling variance concentrated in the leading eigenvalues from the respective analyses conforms to the TW distribution. We then demonstrate how the TW distribution can be used to determine whether an observed genetic eigenvalue significantly differs from zero, and explore the utility of this method for three experimental designs that represent different levels of power. First, using REML, we analyze the simulated datasets employing both unconstrained and factor-analytic covariance structures to determine whether the sampling variance of random data generated by each of these models is TW distributed. Second, using an MCMC approach, we analyzed single random datasets using two commonly employed priors. We sampled times from each of the posterior distributions in order to determine whether the posterior distributions of eigenvalues from MCMC models are TW distributed. Third, we use a recently suggested REML-MVN sampling approach that approximates the sampling error of genetic (co)variances (Houle and Meyer 2015; Meyer and Houle 2013) to generate samples from the REML analysis of single random datasets, to determine whether the eigenvalues generated by REML-MVN sampling of random data are TW distributed.
Finally, we simulated data with genetic variance for three experimental designs that represent different levels of power, and we demonstrate how REML-MVN sampling can be used to test whether a genetic eigenvalue represents a significant level of genetic variance using the TW distribution.

REML estimation of G using unstructured and factor analytic models and the TW distribution

We performed two separate analyses of the simulated data sets, first employing an unconstrained covariance structure, and second using a factor-analytic covariance structure. For both analyses, G was estimated using the following multivariate linear model:

Y = μ + Z_l θ_G + Iε (1)

where Y denotes a stacked vector of multivariate observations, Z_l is the among-line design matrix relating observations to the vector of unknown random effects, θ_G is the vector of unknown random genetic effects, and ε is a vector of residual errors. The random effects (and residuals) were assumed to be normally distributed, and elements of θ_G were further assumed to be drawn from θ_G ∼ N(0, G ⊗ Z_l), where G is the genetic covariance matrix and Z_l is the genetic relationship matrix. In the case of the unconstrained analysis, G was modeled as an unstructured covariance matrix, and in the case of the factor-analytic analysis G was modeled using a full-rank factor-analytic structure:

G = ΛΛ^T (2)

where Λ is a lower triangular matrix of factor loadings. Models were run using REML implemented in the MIXED procedure in SAS (version 9.4; SAS Institute Inc.). Both analyses returned G matrices from which the spectral decomposition provided samples for each of the five eigenvalues of G. As any covariance in G estimated from these models is simply a consequence of sampling, these eigenvalues form the null distribution of sampling variance that can be compared against the theoretical TW distribution. To determine whether the eigenvalues of G obtained from (1) conform to the TW distribution specific to each eigenvalue, they first needed to be scaled and centered. The scaling and centering parameters for the theoretical limiting distributions of variance-component matrices such as G are currently unknown (I. Johnstone, personal communication); however, an empirical scaling approach can be utilized (Saccenti et al. 2011). Rescaling of each observed genetic eigenvalue λ_o was accomplished using the approach of Saccenti et al. (2011):

TW_G = μ_i + (σ_i/σ_o)(λ_o − μ_o) (3)

where μ_i and σ_i are the mean and standard deviation of the theoretical TW distribution for the i-th eigenvalue, and μ_o and σ_o are the mean and standard deviation of the corresponding observed genetic eigenvalues of random data λ_o, respectively.
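The empirical scaling of equation (3), and the gamma approximation of Chiani (2014) used for the theoretical TW distributions, are both simple to implement. In the sketch below the moments supplied are the standard values for the Tracy-Widom distribution of the leading eigenvalue (β = 1), rounded to four decimals; the sub-leading eigenvalues require their own moments, which are not reproduced here.

```python
import math

mu_tw, var_tw, skew_tw = -1.2065, 1.6078, 0.2935  # TW (beta = 1) moments for lambda_1

def tw_gamma_approx(mu, var, skew):
    # Chiani (2014): TW ~ Gamma(k, theta) - alpha, matching the first three moments.
    k = 4.0 / skew ** 2
    theta = math.sqrt(var) * skew / 2.0
    alpha = k * theta - mu
    return k, theta, alpha

def tw_scale(lam_obs, mu_o, sd_o, mu_i, sd_i):
    # Eq. (3): empirical centering and scaling of an observed eigenvalue
    # onto the scale of the theoretical TW distribution.
    return mu_i + (sd_i / sd_o) * (lam_obs - mu_o)

k, theta, alpha = tw_gamma_approx(mu_tw, var_tw, skew_tw)
# The shifted gamma reproduces the target mean and variance by construction:
print(k * theta - alpha, k * theta ** 2)  # = mu_tw, var_tw
```

The gamma approximation matches the mean, variance and skewness of the TW distribution exactly, which is what makes it a convenient stand-in for generating the theoretical null observations.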
The rescaling procedure followed here differs from that given in Box 2 of Blows and McGuigan (2015) by the removal of a redundant step, an initial rescaling using the equation TW_w = p^{2/3}(λ_i − 2) before applying (3). To obtain the theoretical distributions of each eigenvalue against which to compare the TW_G statistic, we generated observations from each theoretical TW distribution using its gamma approximation. For each theoretical TW distribution with mean (μ_i), standard deviation (σ_i), and skewness (ρ_i), the shape (k_i), scale (θ_i) and constant (α_i) of the matching gamma distribution (Γ[k_i, θ_i] − α_i) are given by (Chiani 2014):

k_i = 4/ρ_i^2; θ_i = σ_i ρ_i/2; α_i = k_i θ_i − μ_i. (4)

The distributions of the TW_G statistics for the observed genetic eigenvalues were then compared to the appropriate theoretical TW distribution for that particular eigenvalue using QQ plots.

MCMC estimation of G and the TW distribution

MCMC is an analytical approach that is often used to place confidence on genetic parameters by sampling from the MCMC chain and using this posterior distribution of the (co)variance components to generate confidence intervals (Hadfield et al. 2010; Sorensen 2008; O'Hara et al. 2008). The covariance components in such models are constrained to fall within the parameter space, and hence the estimated G matrices will be positive-definite (O'Hara et al. 2008; Hadfield 2010; Hazelton and Gurrin 2003). Therefore, the posterior distribution cannot be used to test whether genetic variances themselves differ from zero. Consequently, a randomization of the data across the pedigree, or with respect to the genetic groups, such that any estimated covariance is a consequence of sampling and not genetic variance, has been used to provide a posterior null distribution against which the observed distribution is compared (e.g. Aguirre et al. 2014; McGuigan et al. 2015; Hadfield 2010).
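The randomization idea can be sketched in a few lines. This is our own crude illustration using a line-means proxy for the among-line covariance, not the MCMC model itself: shuffling observations across line labels destroys any genuine among-line covariance, so the eigenvalues re-estimated from the permuted data form a null distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
n_lines, n_per_line, p = 50, 5, 5
X = rng.standard_normal((n_lines * n_per_line, p))  # stacked phenotypes
line = np.repeat(np.arange(n_lines), n_per_line)    # line labels

# Permute phenotypes across lines: any among-line covariance estimated
# from the shuffled data is sampling error alone.
X_perm = X[rng.permutation(len(X))]
line_means = np.array([X_perm[line == g].mean(axis=0) for g in range(n_lines)])
G_null_proxy = np.cov(line_means, rowvar=False)
null_eigs = np.sort(np.linalg.eigvalsh(G_null_proxy))[::-1]
print(null_eigs)  # null spectrum against which observed eigenvalues are judged
```

Repeating the permutation many times builds up the full null distribution for each eigenvalue.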
To determine whether the eigenvalues of the posterior null distribution generated by MCMC analyses conformed to the TW distribution, we analyzed a single random data set (the first of our simulated datasets from above) using the MCMCglmm package (Hadfield 2010) in R. We sampled from the posterior distribution of the MCMC chain times, in order to obtain G matrices (and consequently samples of each eigenvalue). The analysis was repeated 10 times (for 10 different random data sets), in order to determine the consistency of results across the 10 data sets. The data were analyzed in accordance with the multivariate linear model described in (1). We performed two separate analyses in which the posterior chain was sampled times in each, the two analyses differing in the prior distribution specified. In practice, the goal of many quantitative genetic studies is to choose a prior that is non-informative with respect to the (co)variance components, having little influence on the posterior distribution. However, choosing uninformative priors is a non-trivial task; presumably uninformative flat prior distributions have been shown to strongly influence (some) model parameters, and specifying uninformative priors at all scales of the analysis may be difficult (Van Dongen 2006; O'Hara et al. 2008). Here we used two priors that are often employed in quantitative genetic studies: an inverse-Wishart distribution where the phenotypic variance is partitioned equally among random effects and the degree of belief is weak (e.g. Wilson et al. 2010), and a half-Cauchy parameter-expanded prior that may perform better when variances are close to zero (e.g. Hadfield 2009). Preliminary analyses in which we used an improper prior deviated substantially from the TW distribution and often suffered from numerical issues, and therefore we do not consider this option further.
In both cases, the priors for the location parameters were normally distributed and diffuse about a mean of zero and a variance of . For the variance components, we first analyzed the data using an inverse-Wishart prior in which the scale parameter was defined by a diagonal matrix containing values of one-half of the phenotypic variance, and the degree of belief was set at 5.002, slightly more than the dimensions of the matrix. Next, we used a half-Cauchy parameter-expanded prior with a scale parameter equal to . For both priors, the joint posterior distribution was estimated from MCMC iterations sampled at 300-iteration intervals after an initial burn-in period of iterations. Overall, model convergence diagnostics indicated that the MCMC chain sampled the parameter space adequately, and the autocorrelation between successive samples was typically well below 0.1 for analyses using both priors. For each of the samples of the posterior distribution, we performed an eigenanalysis of G, obtaining samples for each eigenvalue. We subsequently scaled the eigenvalues as above, according to equation (3), and the TW_G statistics for the genetic eigenvalues were then compared to the appropriate gamma approximation of the theoretical TW distribution for that particular eigenvalue.

REML estimation of G with MVN sampling and the TW distribution

Conceptually similar to the confidence intervals placed on variance-component estimates from MCMC using the posterior distribution, Houle and Meyer (2015, 2013) have recently shown how the inverse of the Fisher information matrix of covariance parameters (H(θ_G)^{-1}) from multivariate REML models can be used to generate confidence intervals on REML estimates of variance components. This approach is appealing compared to MCMC methods both from a computational perspective and because no prior specification is required. To determine whether the sampling distribution of the eigenvalues generated by REML-MVN sampling conformed to the TW distribution, we analyzed a single random data set (the first of the simulated datasets from above), fitting an unconstrained REML analysis in accordance with (1), and then obtained REML-MVN samples from the inverse of the Fisher information matrix (H(θ̂_G)^{-1}) of this analysis. Again, we repeated the analysis 10 times (for 10 different random data sets), in order to determine the consistency of results across data sets. REML estimates of the p(p+1)/2 (co)variance components were obtained by fitting (1) with an unconstrained covariance structure that allowed negative eigenvalues. Following estimation of the (co)variance components, we directly sampled the elements of G times, from the distribution N(θ̂_G, H(θ̂_G)^{-1}), where θ̂_G is the vector of parameter estimates of G and H(θ̂_G)^{-1} is the inverse of the Fisher information matrix from the unconstrained analysis (Houle and Meyer 2015; Meyer and Houle 2013). This is considered sampling on the G-scale of an unconstrained analysis. Alternatively, one can sample on the L-scale, where the elements of L (with vector of estimates θ_L) are the Cholesky factors from a factor-analytic model.
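The G-scale sampling step can be sketched as follows. Both the parameter vector θ̂_G and the positive-definite stand-in for H(θ̂_G)^{-1} below are made up for illustration; a real analysis would take both from the REML output.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5
n_cov = p * (p + 1) // 2  # number of (co)variance parameters, p(p+1)/2

# Hypothetical REML output: theta_hat holds the half-vectorisation of G_hat,
# and H_inv stands in for the inverse Fisher information (any PD matrix).
theta_hat = rng.normal(scale=0.05, size=n_cov)
A = rng.normal(scale=0.05, size=(n_cov, n_cov))
H_inv = A @ A.T + 0.01 * np.eye(n_cov)

def vech_to_sym(v, p):
    """Rebuild a symmetric p x p matrix from its half-vectorisation."""
    G = np.zeros((p, p))
    G[np.tril_indices(p)] = v
    return G + np.tril(G, -1).T

# REML-MVN: draw (co)variance vectors from N(theta_hat, H_inv), rebuild G,
# and collect the spectrum of every draw.
samples = rng.multivariate_normal(theta_hat, H_inv, size=1000)
eig_samples = np.array([np.sort(np.linalg.eigvalsh(vech_to_sym(s, p)))[::-1]
                        for s in samples])
print(eig_samples.shape)  # (1000, 5): 1000 draws of the 5 eigenvalues
```

Because the draws are unconstrained multivariate-normal perturbations of the (co)variances, negative eigenvalues appear naturally in the sampled spectra, mirroring the unconstrained REML behaviour.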
However, REML-MVN sampling requires that the Fisher information matrix be safely positive-definite. The Fisher information matrix from the analysis of these data that employed a factor-analytic covariance structure was highly ill-conditioned, and consequently did not meet this requirement. Therefore, we only sample on the G-scale of unconstrained analyses here. For each of the samples from the distribution N(θ̂_G, H(θ̂_G)^{-1}), we performed an eigenanalysis of G, obtaining estimates of each eigenvalue. We subsequently scaled the eigenvalues as above, according to equation (3), and the TW_G statistics for the genetic eigenvalues were then compared to the appropriate gamma approximation of the theoretical TW distribution for that particular eigenvalue.

Using REML-MVN to test observed genetic eigenvalues against the appropriate null

To illustrate how the REML-MVN approach can be used to test whether genetic eigenvalues significantly differ from the null distribution, we simulated two data structures that incorporated known genetic variance for each of three experimental designs representing different levels of power. Incorporating known genetic variance into the simulation also allowed us to highlight the potential problems that arise when REML-MVN sampling is used to generate confidence intervals on eigenvalues without reference to the TW distribution. The three experimental designs were: 50 lines with 5 individuals sampled per line (the same design presented above for random data), 500 lines with 10 individuals sampled per line, and a half-sibling design with 200 sires, 5 dams per sire, and 5 individuals per full-sib family. These experimental designs represent, for example, 50 (500) inbred lines with 5 (10) individuals sampled per line, and a standard half-sibling breeding design used in many quantitative genetic studies.
We modified the data simulation from above so that either a single genetic dimension was present in the data, or five genetic dimensions were present (i.e., the genetic covariance matrix was full-rank). The among-line (or among-sire) covariance matrix G_1 for the data simulated to have a single genetic dimension was modeled using the factor-analytic structure Λ_1 = , together with an unstructured dam (in the case of the half-sib design) and residual (for all cases) covariance matrix, which resulted in an expected eigenvalue λ_1 of . The among-line covariance matrix G_5 for the data simulated to have five genetic dimensions was modeled using the factor-analytic structure Λ_5 = , together with an unstructured dam (in the case of the half-sib design) and residual (for all cases) covariance matrix. The expected eigenvalues of the five genetic factors were 0.69, 0.57, 0.32, 0.08, and 0.01, respectively. For each of the three experimental designs we also simulated random data, using the identity matrix to sample from a multivariate normal distribution with a mean of 0, giving rise to no within- or among-line covariance. This provided a baseline against which the performance of REML-MVN in conjunction with the TW distribution could be compared for experimental designs with varying levels of power. In total there were 3 experimental designs and 3 simulated data structures per design (two with genetic variance and one random), giving 6 data sets with genetic variance and 3 data sets without genetic variance. To obtain the parameter estimates of the observed G for each experimental design/data set combination that had genetic variance, each of the six data sets was analyzed using model (1) with an unconstrained covariance structure. To generate their respective null distributions, the data were randomized across the lines (or sires in the case of the half-sib design) and again analyzed using model (1) with an unconstrained covariance structure.
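The rank-one construction can be illustrated with hypothetical loadings (the published values of Λ_1 did not survive extraction, so the numbers below are placeholders): G_1 = Λ_1 Λ_1^T has a single non-zero eigenvalue equal to the sum of the squared loadings.

```python
import numpy as np

rng = np.random.default_rng(4)
n_lines, n_per_line, p = 50, 5, 5

# Hypothetical rank-one factor loadings; G1 = Lambda1 @ Lambda1.T then has
# exactly one non-zero eigenvalue (here 0.42 = sum of squared loadings).
Lambda1 = np.array([[0.4], [0.3], [0.3], [0.2], [0.2]])
G1 = Lambda1 @ Lambda1.T

# Line effects drawn from N(0, G1), plus identity residual noise per individual.
line_effects = rng.multivariate_normal(np.zeros(p), G1, size=n_lines)
X = line_effects[:, None, :] + rng.standard_normal((n_lines, n_per_line, p))

eigvals = np.sort(np.linalg.eigvalsh(G1))[::-1]
print(eigvals)  # one positive eigenvalue, the rest (numerically) zero
```

The same construction with a full 5 × 5 lower-triangular Λ yields a full-rank G, as in the five-dimension simulation.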
We then sampled the (co)variance elements of the randomized G times, from the distribution N(θ̂_G, H(θ̂_G)^{-1}) (Houle and Meyer 2015; Meyer and Houle 2013). The genetic eigenvalues that formed the null distributions were scaled using (3). To compare the eigenvalues resulting from the observed G to the null distribution, they also needed to be placed on the appropriate scale. For each observed eigenvalue, the TW_G statistic was calculated according to (3), where μ_o and σ_o were the mean and standard deviation of the eigenvalues from the null distribution of randomized data, substituting each of the five observed genetic eigenvalues λ_i for the observed distribution with genetic variance. The sampling distributions of the simulated random data for each of the three experimental designs were generated as described above.

Results

REML estimation of G using unstructured and factor analytic models and the TW distribution

For the random datasets, the magnitudes of the leading eigenvalues of G were biased upwards by the concentration of sampling error under both the unconstrained and constrained models. Since these data were simulated randomly with respect to genetic line, the genetic variance in each trait, and consequently in any eigenvector, is expected to be zero. The average eigenvalues for λ_1 of 0.13 and 0.11, respectively, therefore represented spurious genetic variance that was purely a consequence of sampling error. With sample covariance matrices, it is well known that as the ratio of the number of parameters to the number of observations (p/n) increases, the inflation of the leading eigenvalues also increases (Johnstone 2001). Therefore, we expect that the magnitude of the spurious eigenvalue will increase as the ratio of p to the genetic degrees of freedom increases, although this remains to be explored in detail for different experimental designs. Each of the first four eigenvalues estimated from the random datasets using an unconstrained covariance structure that permitted negative eigenvalues fit the TW distribution well (Table 1). However, the last eigenvalue did show some deviation, falling below the 1:1 line on the QQ plot at the tails of the distribution (Figure 2; Table 1). Consequently, the proportion of eigenvalues that exceeded the critical value of the theoretical distribution at α = 0.05 was only (Table 1).
In contrast to the unstructured REML analyses, only the first two genetic eigenvalues from the model fitting the factor-analytic structure appear to conform to the TW distribution (Figure 2; Table 1). The factor-analytic covariance structure constrains the genetic estimates to the parameter space, and therefore the lower bound on the eigenvalues from these models is zero. By the second eigenvalue (λ_2), the effect of the boundary constraint was becoming evident in the deviation of the lower tail from the 1:1 line on the QQ plot (Figure 2B). As the boundary was approached, there was a significant deviation from the TW distribution for the last three eigenvalues, evidenced by both the QQ plot (Figure 2C-E) and the proportion of observations that exceeded the critical value (Table 1).

MCMC estimation of G and the TW distribution

The MCMC analysis differed from the previous REML models in that the sampling variance of each eigenvalue was characterized by samples of the posterior distributions for single data sets. The centering of the eigenvalue distribution along the x-axis is therefore determined by the individual data set that is used. For example, in the first of our simulated data sets, the observed lead eigenvalue estimated from an unconstrained REML analysis of that data set was . Therefore, it would be expected that the samples of the posterior distribution from the MCMC analysis of this data set would center on . However, the MCMC analyses returned posterior distributions of eigenvalues that differed substantially in mode depending on the prior that was used. For the inverse-Wishart prior, the posterior distribution of the leading eigenvalue (λ_1) was centered on , well above the parameter estimate from REML (Table 1). In contrast, the parameter-expanded prior returned a posterior distribution centered well below the REML parameter estimate, at (Table 1).
The posterior distributions of the eigenvalues from the first data set deviated substantially from the TW distribution for both priors, with a larger proportion of eigenvalues exceeding the critical value than expected by chance in most cases (Table 1). This deviation from the theoretical distribution was consistent across all 10 of the random data sets that were analyzed (Figure 3A,B). The posterior distributions of variances from MCMC models are constrained to fall within the parameter space, as they are for factor analytic REML models; both the boundary constraint and the concentration of sampling variance in leading eigenvalues may therefore result in their overestimation and, consequently, deviation from the theoretical distribution. However, considering that the first eigenvalue from the MCMC models already failed to conform well to the TW distribution, despite being well away from the boundary with p = 5 traits, the boundary constraint may not be the only factor contributing to the deviation in this case (Table 1; Figure 3A,B).

REML estimation of G with MVN sampling and the TW distribution

The REML-MVN sampling was carried out using the same 10 data sets analyzed in the MCMC models above. Again, for the first of the simulated data sets, the mean magnitude of the leading eigenvalue from the REML-MVN sampling approach, 0.15, was higher than the REML parameter estimate (Table 1).
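In outline, REML-MVN sampling treats the REML estimates of the distinct elements of G as multivariate normal, with a covariance given by the inverse of the information matrix, and propagates that uncertainty to the eigenvalues. A minimal sketch, using a hypothetical two-trait estimate and an assumed (diagonal) sampling covariance in place of values from a fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)

def vech_to_sym(v, p):
    """Rebuild a symmetric p x p matrix from its lower-triangle vector."""
    G = np.zeros((p, p))
    idx = np.tril_indices(p)
    G[idx] = v
    return G + G.T - np.diag(np.diag(G))

def reml_mvn_eigenvalues(g_hat_vech, sampling_cov, p, n_samples=10_000):
    """Draw covariance-parameter vectors from a multivariate normal centred
    on the REML estimate, and return the sorted eigenvalues of each
    reconstructed G (REML-MVN in the sense of Houle and Meyer 2015)."""
    draws = rng.multivariate_normal(g_hat_vech, sampling_cov, size=n_samples)
    return np.array([np.sort(np.linalg.eigvalsh(vech_to_sym(d, p)))[::-1]
                     for d in draws])  # n_samples x p: lambda_1 ... lambda_p

# Toy example: p = 2 traits, hypothetical REML estimate and an assumed
# sampling covariance of the three distinct elements of G.
p = 2
g_hat = np.array([0.20, 0.05, 0.10])   # vech(G): G11, G21, G22
V = np.diag([0.002, 0.001, 0.002])     # assumed sampling covariance
eigs = reml_mvn_eigenvalues(g_hat, V, p)
print(eigs.mean(axis=0))  # sampling distribution means of (lambda_1, lambda_2)
```

Note that even here the mean of the sampled λ1 sits at or above the leading eigenvalue of the point estimate, because the maximum eigenvalue is a convex function of the matrix elements.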
Despite the difference in mean for the leading eigenvalue λ1, the sampling distribution generated by the REML-MVN approach conformed well to the TW distribution, and in a consistent manner across the 10 data sets (Figure 4A). Similarly, the following two eigenvalues λ2 and λ3 also conformed well to the theoretical distribution (Table 1), at least as well as the eigenvalues estimated from random data sets using an unstructured covariance structure. However, the last eigenvalue λ5 deviated from the distribution, falling below the 1:1 line on the QQ plot at the tails of the distribution (Table 1), in a consistent manner across the 10 data sets (Figure 4B). This bias of the last eigenvalue was similar in magnitude to the bias observed from the analysis of random data sets using an unstructured model (Table 1).

Using REML-MVN to test observed genetic eigenvalues against the appropriate null

It has recently been suggested that the confidence intervals on genetic variances generated using REML-MVN can be used to determine whether eigenvalues contain non-zero levels of genetic variance, by comparing the parameter estimate to the number of standard deviations above zero (Houle and Meyer 2015). For the REML-MVN analysis of the first random data set, the observed mean of the leading eigenvalue was 3.5 times larger than its standard deviation and above zero (Table 1), raising the possibility that sampling error could be misinterpreted as genetic variance in this dimension. Relying solely on the distribution of sampling error around an eigenvalue generated by REML-MVN to infer genetic variance in real data will therefore likely be insufficient to demonstrate the presence of genetic variance. However, the centering and scaling parameters of the TW distributions specific to each eigenvalue can be used to form the appropriate null distribution against which to test observed genetic eigenvalues.

Figure 2 A-E) QQ plots of the TW G statistics for the five observed genetic eigenvalues λ1-λ5, respectively, from simulated data sets, against the approximations of their respective theoretical TW distributions. The solid lines depict the TW G statistics from REML analyses that employed an unconstrained covariance structure, whereas the dashed lines depict the TW G statistics from factor analytic models that constrain estimates of G to the parameter space. The dotted line indicates the 1:1 line.

Table 1 The mean (or posterior mode in the case of MCMC samples), standard deviation, and proportion of samples exceeding the critical value for each eigenvalue (λi), from the analysis of either datasets (Unconstrained and the five-dimension factor analytic model FA(0)5*) or samples from a single dataset (MCMC and REML-MVN). MCMC analyses were conducted using either an inverse Wishart prior (Inv-Wish) or a parameter expanded prior (Par-Exp). Approximate critical values for the TW statistic were determined by sampling the gamma approximation for each eigenvalue (λ1-λ5), and are 0.9748, , , , , respectively. Rows give the mean, sd, and TW proportion for λ1-λ5; columns correspond to the Unconstrained, FA(0)5*, MCMC (Inv.-Wish.), MCMC (Par.-Exp.), and REML-MVN analyses. * 9,618 of the factor analytic models converged.

Table 2 The REML parameter estimates of each eigenvalue λi for simulated data, with the upper and lower confidence intervals (C.I.) determined by REML-MVN sampling. TW0.05 indicates the critical value for the TW statistic determined by sampling the gamma approximation of the theoretical TW distribution for each respective eigenvalue. TWλi represents the scaled and centered TW G statistic for each eigenvalue that is compared against the critical value. If TWλi falls below the critical value, the respective eigenvalue is interpreted to be non-significant; conversely, if TWλi exceeds the critical value, the respective eigenvalue is interpreted to represent a significant level of genetic variance. Approximate critical values for the TW statistic, determined by sampling the gamma approximation for each eigenvalue (λ1-λ5), are 0.9748, , , , , respectively. For each experimental design (No. Sires or lines), columns give results for the Random Data, One Factor, and Full Rank simulations; rows give λ, the lower and upper C.I., and TWλ for each eigenvalue. * factors identified to be significant using the TW distribution. † factors identified to be significant by a log-likelihood ratio test between factor analytic models of rank k and k-1; the χ2 statistic was calculated as -2(logL(k-1) - logL(k)), with degrees of freedom equal to the difference in the number of parameters between the two models.

Figure 3 A) QQ plot of the TW G statistic for the first genetic eigenvalue vs. the theoretical TW distribution. The TW G statistic was calculated according to (3), using the posterior distribution of MCMC analyses that employed an inverse Wishart prior. The 10 lines represent the 10 data sets that were analyzed, and the dashed line indicates the 1:1 line. B) QQ plot of the TW G statistic vs. the theoretical TW distribution as described above, using the posterior distribution of MCMC analyses that employed a parameter expanded prior. The 10 lines represent the 10 data sets that were analyzed, and the dashed line indicates the 1:1 line.

We simulated data to have either one genetic dimension or five genetic dimensions for each of three different experimental designs (50 lines, 500 lines, 200 sires), representing different levels of experimental power, in order to demonstrate how significance testing using REML-MVN sampling combined with the TW distribution performs in these situations. As expected, as the ratio of p to the genetic degrees of freedom decreased, the inflation of the leading eigenvalues of random data simulated with no genetic variance decreased (Table 2). However, for all three experimental designs the lower confidence intervals for the first two eigenvalues of the random data were above 0, despite the lack of simulated genetic variance in these eigenvectors. This potentially indicates the presence of significant genetic variance, when the confidence intervals are interpreted in isolation.
When empirically re-scaled to the TW distribution, however, none of the five eigenvalues were deemed significant by comparison of their TW G statistics to the critical values of the TW distributions specific to each eigenvalue (Table 2), highlighting the value of the TW approach for testing the significance of observed genetic eigenvalues. For the data simulated to have a single genetic eigenvalue, the lower confidence interval of the sampling distribution generated by REML-MVN for λ1 was far above zero for the 50 line, 200 sire, and 500 line cases (Table 2). Therefore, the presence of this genetic dimension was correctly identified by examining the confidence interval for all three experimental designs. However, as observed for the random data, the lower confidence intervals of λ2 were also above 0, incorrectly identifying the presence of a second genetic dimension (Table 2). To illustrate this behaviour, consider the example of the 500 line experimental design. For both λ1 and λ2 the mean un-scaled eigenvalues were above zero, and the 95% confidence intervals did not overlap zero (Figure 5A,C), suggesting the presence of significant genetic variance. However, by empirically re-scaling to the TW distribution, λ1 was demonstrated to be the only significant genetic eigenvalue (Figure 5B,D). Here the appropriate comparison was between the observed genetic eigenvalue in real data and the null distribution generated by REML-MVN sampling of randomized data, after both the observed eigenvalue and the null distribution were centered and scaled according to the theoretical TW distribution specific to that eigenvalue. The observed genetic eigenvalue λ1 fell well outside the null distribution (Figure 5B) and above the critical value (Table 1), indicating significant genetic variance in this genetic dimension.
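A sketch of this test: both the null distribution (REML-MVN samples of randomized data) and the observed eigenvalue are centered and scaled by the null mean and standard deviation, and the resulting statistic is compared to a TW critical value. In this sketch the critical value comes from Chiani's (2014) gamma approximation to the TW (β = 1) distribution for the leading eigenvalue; a separate, eigenvalue-specific approximation is used in the analyses above, and the numerical inputs below are hypothetical:

```python
import numpy as np
from scipy import stats

# Gamma approximation to the Tracy-Widom (beta = 1) distribution,
# constants from Chiani (2014): TW1 + ALPHA ~ Gamma(K, THETA).
K, THETA, ALPHA = 46.446, 0.186054, 9.84801

def tw1_critical(alpha_level=0.05):
    """Approximate upper critical value of the TW1 distribution."""
    return stats.gamma.ppf(1 - alpha_level, K, scale=THETA) - ALPHA

def tw_test(lam_obs, null_eigs, alpha_level=0.05):
    """Centre and scale an observed eigenvalue by the mean and sd of its
    null distribution (REML-MVN samples of randomized data), then compare
    the resulting TW statistic to the approximate TW critical value."""
    mu, sd = null_eigs.mean(), null_eigs.std(ddof=1)
    tw_stat = (lam_obs - mu) / sd
    return tw_stat, tw_stat > tw1_critical(alpha_level)

rng = np.random.default_rng(4)
null_eigs = rng.normal(0.13, 0.04, size=10_000)  # hypothetical null lambda_1
print(tw_test(0.40, null_eigs))  # far above the null: significant
print(tw_test(0.14, null_eigs))  # within sampling error: not significant
```

A comparison based on the raw confidence interval of `null_eigs` alone would flag both observed values, which is exactly the failure mode the TW re-scaling guards against.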
In contrast, the observed eigenvalue λ2 fell well within the null distribution (Figure 5D) and below the critical value, indicating a lack of significant genetic variance in this dimension. Currently, the best method to test whether eigenvalues are significantly different from 0 is to compare the likelihoods of a series of nested reduced-rank factor analytic models. Here, if reducing the rank of the model from k to k-1 dimensions significantly reduces the fit, then the kth eigenvalue is said to be significant (Hine and Blows 2006). For the 50 line, 200 sire, and 500 line simulated data sets with a single genetic eigenvalue, factor analytic modeling identified a single genetic dimension in each (Table 2), consistent with the results of the TW approach presented above. The TW approach therefore performed as well as factor analytic modeling in this case. For the data simulated to be full rank, factor analytic modeling identified the presence of the first four genetic factors (Table 2). The lack of statistical support for λ5 was not surprising considering the low level of genetic variance accounted for by this factor (λ5 = 0.01), and that factor analytic models tend to be conservative. In contrast, the TW testing approach correctly identified the presence of all five genetic eigenvalues (Table 2). However, the statistical significance of λ5 should be interpreted with some caution. We previously demonstrated using REML-MVN sampling of random data (the 50 line experimental design) that the sampling distribution of the last eigenvalue did not conform well to the TW distribution (Figure 4B), with fewer samples exceeding the critical value than expected by chance (Table 1). This may result in anti-conservative significance tests for the last eigenvalue of variance component matrices.

Discussion

Sampling error generates patterns in the empirical spectral distribution of variance component matrices that can look strikingly like the biological patterns interpreted to represent genetic covariance concentrating genetic variance into fewer multivariate dimensions than the number of traits measured (Blows and McGuigan 2015). As a consequence of the magnitude of sampling error in the leading eigenvalues of G, particular care needs to be taken to determine whether a genetic eigenvalue is greater in magnitude than expected by sampling error alone. Random matrix theory provides a generally applicable framework for testing whether the eigenvalues of sample covariance matrices represent significant levels of variance (Johnstone 2001; Tracy and Widom 1996, 2009).
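The nested factor analytic comparison described above reduces to a standard likelihood-ratio test between the rank-k and rank-(k-1) models. A minimal sketch with hypothetical log-likelihoods (for p traits, a rank-k factor analytic genetic model has kp - k(k-1)/2 genetic parameters, so 9 vs. 5 for k = 2 vs. k = 1 with p = 5):

```python
from scipy import stats

def fa_rank_lrt(loglik_k, loglik_km1, n_params_k, n_params_km1):
    """Likelihood-ratio test comparing factor analytic models of rank k
    and k-1: if dropping the k-th dimension significantly worsens the
    fit, the k-th eigenvalue is declared significant."""
    chi2 = -2.0 * (loglik_km1 - loglik_k)
    df = n_params_k - n_params_km1
    p_value = stats.chi2.sf(chi2, df)
    return chi2, df, p_value

# Hypothetical REML log-likelihoods for rank-2 vs. rank-1 genetic models.
chi2, df, p = fa_rank_lrt(loglik_k=-1043.2, loglik_km1=-1051.9,
                          n_params_k=9, n_params_km1=5)
print(round(chi2, 1), df, round(p, 4))  # -> 17.4 4 0.0016
```

Note that because the rank constraint places the null hypothesis on the boundary of the parameter space, the plain chi-squared reference distribution used here tends to be conservative, consistent with the behaviour of the factor analytic tests reported above.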
Here we demonstrate that random matrix theory can similarly be applied to the outcomes of multivariate genetic analyses that take the form of variance-component matrices, in order to test whether leading genetic eigenvalues represent significant levels of genetic variance.

Sampling error in genetic eigenvalues of random data

In general, the genetic eigenvalues from unconstrained REML-based analyses of random data conformed well to the TW distribution (Figure 2), even with the small number of lines (n = 50) and traits (p = 5) used here. While the proportion of leading (λ1) eigenvalues that exceeded the critical value at α = 0.05 was greater than 0.05, as the number of genetic degrees of freedom and p increase, the fit to the theoretical distribution is predicted to improve further. This is illustrated, in part, by the better fit of the sampling error of λ1 to the theoretical distribution in the analysis of 200 random datasets of p = 10 traits, where the proportion of eigenvalues exceeding the critical value was 0.05 (Blows and McGuigan 2015). While sampling error is known to concentrate genetic variance in the leading eigenvalues of G, particularly with the small sample sizes used here, the corollary is that the trailing eigenvalues must be underestimated. In contrast to the leading eigenvalues, the last eigenvalue λ5 did show some deviation from the theoretical TW distribution (Table 1), with a smaller proportion of eigenvalues exceeding the critical value than expected by chance. The behaviour of the last eigenvalue of sample covariance matrices may be better described by a reflected TW distribution (Ma 2012). Whether this may also be the case for variance component matrices such as G remains to be determined.

Figure 4 A) QQ plot of the TW G statistic for the first genetic eigenvalue vs. the theoretical TW distribution. The TW G statistic was calculated according to (3), using the sampling estimates obtained from REML-MVN sampling of a model that employed an unconstrained covariance structure. The 10 lines represent the 10 datasets that were analyzed, and the dashed line indicates the 1:1 line. B) QQ plot of the TW G statistic for the fifth genetic eigenvalue vs. the theoretical TW distribution, calculated in the same manner. The 10 lines represent the 10 datasets that were analyzed, and the dashed line indicates the 1:1 line.


More information

Introduction to Algorithmic Trading Strategies Lecture 10

Introduction to Algorithmic Trading Strategies Lecture 10 Introduction to Algorithmic Trading Strategies Lecture 10 Risk Management Haksun Li haksun.li@numericalmethod.com www.numericalmethod.com Outline Value at Risk (VaR) Extreme Value Theory (EVT) References

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Fundamental Probability and Statistics

Fundamental Probability and Statistics Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood

Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood PRE 906: Structural Equation Modeling Lecture #3 February 4, 2015 PRE 906, SEM: Estimation Today s Class An

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Concentration Inequalities for Random Matrices

Concentration Inequalities for Random Matrices Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

On dealing with spatially correlated residuals in remote sensing and GIS

On dealing with spatially correlated residuals in remote sensing and GIS On dealing with spatially correlated residuals in remote sensing and GIS Nicholas A. S. Hamm 1, Peter M. Atkinson and Edward J. Milton 3 School of Geography University of Southampton Southampton SO17 3AT

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Monte Carlo in Bayesian Statistics

Monte Carlo in Bayesian Statistics Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview

More information

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test L. García Barrado 1 E. Coart 2 T. Burzykowski 1,2 1 Interuniversity Institute for Biostatistics and

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

component risk analysis

component risk analysis 273: Urban Systems Modeling Lec. 3 component risk analysis instructor: Matteo Pozzi 273: Urban Systems Modeling Lec. 3 component reliability outline risk analysis for components uncertain demand and uncertain

More information

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Factor Analysis Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Factor Models The multivariate regression model Y = XB +U expresses each row Y i R p as a linear combination

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Lecture 1 Basic Statistical Machinery

Lecture 1 Basic Statistical Machinery Lecture 1 Basic Statistical Machinery Bruce Walsh. jbwalsh@u.arizona.edu. University of Arizona. ECOL 519A, Jan 2007. University of Arizona Probabilities, Distributions, and Expectations Discrete and Continuous

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

ASA Section on Survey Research Methods

ASA Section on Survey Research Methods REGRESSION-BASED STATISTICAL MATCHING: RECENT DEVELOPMENTS Chris Moriarity, Fritz Scheuren Chris Moriarity, U.S. Government Accountability Office, 411 G Street NW, Washington, DC 20548 KEY WORDS: data

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

3. Properties of the relationship matrix

3. Properties of the relationship matrix 3. Properties of the relationship matrix 3.1 Partitioning of the relationship matrix The additive relationship matrix, A, can be written as the product of a lower triangular matrix, T, a diagonal matrix,

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Large sample covariance matrices and the T 2 statistic

Large sample covariance matrices and the T 2 statistic Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Bayesian Hierarchical Models

Bayesian Hierarchical Models Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

MULTIVARIATE ESTIMATION OF GENETIC PARAMETERS QUO VADIS? Karin Meyer

MULTIVARIATE ESTIMATION OF GENETIC PARAMETERS QUO VADIS? Karin Meyer Proc. Assoc. Advmt. Anim. Breed. Genet. 19:71 78 MULTIVARIATE ESTIMATION OF GENETIC PARAMETERS QUO VADIS? Karin Meyer Animal Genetics and Breeding Unit *, University of New England, Armidale, NSW 2351

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

series. Utilize the methods of calculus to solve applied problems that require computational or algebraic techniques..

series. Utilize the methods of calculus to solve applied problems that require computational or algebraic techniques.. 1 Use computational techniques and algebraic skills essential for success in an academic, personal, or workplace setting. (Computational and Algebraic Skills) MAT 203 MAT 204 MAT 205 MAT 206 Calculus I

More information

More on Estimation. Maximum Likelihood Estimation.

More on Estimation. Maximum Likelihood Estimation. More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular

More information

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition Preface Preface to the First Edition xi xiii 1 Basic Probability Theory 1 1.1 Introduction 1 1.2 Sample Spaces and Events 3 1.3 The Axioms of Probability 7 1.4 Finite Sample Spaces and Combinatorics 15

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17 Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 17 Outline Filters and Rotations Generating co-varying random fields Translating co-varying fields into

More information

Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude

Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude Addy M. Boĺıvar Cimé Centro de Investigación en Matemáticas A.C. May 1, 2010

More information