DOI /sagmb Statistical Applications in Genetics and Molecular Biology 2013; 12(3):


Christina Lehermeier, Valentin Wimmer, Theresa Albrecht a, Hans-Jürgen Auinger, Daniel Gianola, Volker J. Schmid and Chris-Carolin Schön*

Sensitivity to prior specification in Bayesian genome-based prediction models

Abstract: Different statistical models have been proposed for maximizing prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about the sensitivity of these models with respect to prior and hyperparameter specification, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to specification of the prior distribution of marker effects. We used datasets simulated according to a typical maize breeding program differing in the number of markers and the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset, comprising 698 doubled haploid lines, each genotyped with single nucleotide polymorphism markers and phenotyped as testcrosses for the two quantitative traits grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by calculation of the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but with BayesA and BayesB hyperparameter settings had a stronger effect on prediction performance than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a nonoptimal choice of hyperparameters.

Keywords: genome-based prediction; genomic selection; Bayesian learning; shrinkage prior; plant breeding.

a Current address: Institute for Crop Production and Plant Breeding, Bavarian State Research Center for Agriculture, Am Gereuth 6, Freising, Germany
*Corresponding author: Chris-Carolin Schön, Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, Freising, Germany, chris.schoen@tum.de
Christina Lehermeier, Valentin Wimmer, Theresa Albrecht and Hans-Jürgen Auinger: Plant Breeding, Technische Universität München, Emil-Ramann-Straße 4, Freising, Germany
Daniel Gianola: Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; and Institute for Advanced Study, Technische Universität München, Lichtenbergstraße 2a, Garching, Germany
Volker J. Schmid: Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstraße 33, München, Germany

Introduction

The term genomic selection was introduced by Meuwissen et al. (2001) to denote the prediction of breeding values using molecular markers covering the whole genome. Genomic selection is principally based on two steps. In the first step, effects are estimated for all markers simultaneously based on a large phenotyped and genotyped training population, using some appropriate genome-based prediction model. In the second step, genomic estimated breeding values (GEBVs) of selection candidates are calculated based on their genomic information by using marker effects estimated in the first step. Candidates are selected based on their GEBVs and not on their phenotypes (Jannink et al., 2010). First used in animal breeding [e.g., Schaeffer (2006)], genome-based prediction of breeding values has been investigated in plant populations recently (Heffner et al., 2009; Albrecht et al., 2011; Heslot et al., 2012; Riedelsheimer et al., 2012), yielding encouraging results for its application in breeding.

Many statistical models that can handle the problem of having more marker effects (p) than phenotypes (n), and that differ with respect to the assumed distribution of marker effects, have been proposed for genome-based prediction. In a plant breeding context Crossa et al. (2010) and Heslot et al. (2012) compared the performance of various models proposed for genome-based prediction using several datasets and crops. However, the use of Bayesian models requires specifying prior distributions with hyperparameters, and comparisons have been made based on a single set of hyperparameters. Limited efforts have been made towards evaluating the impact of the choice of model hyperparameters, although this is likely to influence the performance of the model.

In our study, we investigate the sensitivity of the performance of Bayesian genome-based prediction models with respect to hyperparameter choice. We focused on quantitative traits affected by a large number of loci and an advanced cycle breeding population of maize with strong linkage disequilibrium (LD) between markers. We employed the models known as Bayesian Ridge (Kneib et al., 2011), Bayesian Lasso (Park and Casella, 2008), BayesA and BayesB (Meuwissen et al., 2001), as these are frequently used for genome-based prediction. The models differ in the prior specification for the marker effects, with hyperparameters controlling the amount of shrinkage of the effects. With an increasing number of observations n, the influence of the prior on the posterior distribution should vanish, which is known as the concept of Bayesian learning (Sorensen and Gianola, 2002). However, when the number of parameters exceeds n (p >> n), which is typical in genome-based prediction, the prior setting will always matter. Thus, to obtain good predictive abilities an adequate choice of hyperparameters is necessary to prevent both over- and underfitting. This study was motivated by theoretical results of Gianola et al. (2009), who pointed out that BayesA and BayesB are strongly influenced by hyperparameter choice due to a limited extent of Bayesian learning.

Our objectives were to evaluate the sensitivity of these four models to the choice of hyperparameters. The goal was to identify a model with both good predictive ability and robustness over a wide range of hyperparameters in practice. In order to quantify the Bayesian learning ability of the models, we used the Hellinger distance as distance measure between prior and posterior densities of marker effects. The superiority of specific Bayesian models has been shown in many studies using simulated data with prior specifications according to simulation parameters. In this study we evaluated the four models based on two datasets simulated according to a typical maize breeding program, differing in the number of markers and in the number of simulated quantitative trait loci (QTL). Results were compared to those from an experimental maize dataset, comprising polymorphic high-quality single nucleotide polymorphism markers (SNPs) and 698 doubled haploid (DH) lines, phenotyped for two quantitative traits (grain dry matter yield and content). The predictive ability of the models was assessed by five-fold cross-validation.

Datasets

Simulated datasets

We simulated two maize datasets (maizeA and maizeB), starting with a base population similar to that described in Meuwissen et al. (2001). One thousand individuals were generated with diploid genomes having a length of 16 Morgan (M) assuming 10 equally sized chromosomes with two haplotypes each. The coding in the haplotype sequence was 0 or 1 with equal probability to simulate biallelic markers. A mutation-drift equilibrium was expected to be reached after 1000 generations of random mating. We simulated different marker densities by assigning 800 and 64,000 equidistant markers per chromosome to datasets maizeA and maizeB, respectively. A random subset of 1000 (500) markers was initially selected as QTL in dataset maizeA (maizeB).
Each QTL was assigned an additive effect of equal magnitude on the phenotype. From generation 1000 we randomly selected 10 individuals to be the founders of a breeding program. This mimics the typically small effective population size and large LD of maize breeding populations. Homozygous recombinant inbred lines (n = 150) were derived from crossings of the founders followed by six generations of selfing. For line i (i = 1, ..., 150) three environmental errors ($\varepsilon_{il}$, l = 1, 2, 3) were sampled from the same normal distribution $N(0, \sigma_\varepsilon^2)$ to represent three phenotypic observations per line i. The variance $\sigma_\varepsilon^2$ was chosen to be three times the variance of the true breeding values (TBV), with TBV being the sum of the QTL effects times the number of alleles per individual. The variance of the environmental errors produced a repeatability parameter of 0.5. The sum of the individual TBV and the mean error formed the phenotypic value $y_i$ for each line i:

$$y_i = \mathrm{TBV}_i + \frac{1}{3}\sum_{l=1}^{3} \varepsilon_{il}.$$

This procedure mimics a typical approach in plant breeding to observe mean phenotypic values from several observations per line. In each of six breeding cycles, the 5 lines with the largest phenotypic values were selected as parents and recombined to produce 150 inbred lines in the next cycle. The final datasets maizeA and maizeB consisted of the genotypes and phenotypes from the seventh cycle. Due to selection and drift a subset of markers and QTL was monomorphic in the final datasets maizeA and maizeB. A summary of the number of polymorphic markers, the number of polymorphic QTL and the number of inbred lines n in the datasets is given in Table 1. Additionally the trait heritabilities $h^2$, calculated from the squared Pearson correlation between phenotypic values and TBV, are listed for the datasets maizeA and maizeB. The dataset maizeA is publicly available within the R package synbreedData (Wimmer et al., 2012).

Experimental dataset

The experimental dataset comprised 75 fully homozygous DH lines of maize (Zea mays L.), derived from 1 different crosses with 7 inbred lines and nine single crosses as parents. The number of DH lines derived from each cross ranged from 1 to 63 with an average of six DH lines per cross.
The lines were phenotyped in testcrosses with a single-cross tester in four different European locations in 2010 for the traits grain dry matter yield (GDY; dt/ha) and grain dry matter content (GDC; %). For the sake of simplicity, hereinafter testcross values are denoted as breeding values. Each location comprised eight sets each replicated twice. Sets were arranged in a lattice design containing 94 entries and six checks. Outliers were identified and removed based on maximum deviant residuals of a full stage model according to Grubbs (1950). In each location, entries were adjusted for set, replication and block effects and, in a second stage, adjusted means were calculated over locations. The generalized heritability was $\hat{h}^2_{GDY} = 0.74$ for GDY and $\hat{h}^2_{GDC} = 0.94$ for GDC, estimated according to Cullis et al. (2006). For 698 of the 75 DH lines tested in the field marker data were generated. Lines were genotyped with the MaizeSNP50 BeadChip from Illumina (Ganal et al., 2011) containing SNPs. SNPs with a GTScore < 0.7, a call rate < 0.9, a minor allele frequency (MAF) < 0.01 and identical SNPs were excluded from the analysis. Missing values were imputed using the function codeGeno() from R package synbreed (Wimmer et al., 2012) with option beagleAfterFamily. Here, missing values of markers that were monomorphic within one family were imputed with the fixed allele, and missing values in families segregating for the marker were imputed based on information of flanking markers using the software BEAGLE (Browning and Browning, 2009). Finally 11,646 high quality SNPs were used for further analysis. We measured pair-wise LD between SNPs as squared correlation ($r^2$) between allelic states according to Hill and Robertson (1968). The decay of pair-wise LD between SNPs on the same chromosome with increasing physical distance is depicted in Figure 1. We observed a high long-range LD between SNPs, which is typical for advanced cycle breeding populations in maize (Ching et al., 2002).

Table 1 Summary of the simulated and experimental datasets for grain dry matter yield (GDY) and grain dry matter content (GDC). Represented are the number of polymorphic single nucleotide polymorphism markers (no. SNP), number of polymorphic quantitative trait loci in the simulated datasets (no. QTL), number of lines (n) and the trait heritability ($h^2$). U represents an unknown number of QTL.

Dataset                 No. SNP   No. QTL   n   h^2
Simulated datasets
  maizeA
  maizeB
Experimental dataset
  GDY                   11,646    U
  GDC                   11,646    U

Figure 1 Linkage disequilibrium in the experimental dataset, measured as squared correlation ($r^2$) of pair-wise marker combinations on the same chromosome against physical distance between them. (Axes: fraction of SNP pairs per LD class, by physical distance class in Mb.)

Genome-based prediction model and Bayesian regularization

We used the standard linear model for genome-based prediction of breeding values:

$$y = 1_n \beta_0 + X\beta + e, \tag{1}$$

where y is the n-dimensional vector of phenotypes, $1_n$ an n-dimensional vector of ones, $\beta_0$ is the intercept, and X is an $n \times p$ incidence matrix with elements $x_{ij}$ (i = 1, ..., n; j = 1, ..., p) representing the genotype score of line i at marker j, defined by the number of copies of the minor allele at marker j. For fully homozygous lines such as DH lines, the entries can only take values 0 or 2. The vector $\beta$ is a p-dimensional vector of marker effects and e is the n-dimensional vector of residuals. The residuals were assumed to be independent and normally distributed with mean 0 and equal variance $\sigma_e^2$, so $e \sim N(0, I\sigma_e^2)$ with I being the $n \times n$ identity matrix. This leads to the following data distribution:

$$f(y \mid \beta_0, \beta, \sigma_e^2) = \prod_{i=1}^{n} N\Big(y_i \,\Big|\, \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j,\ \sigma_e^2\Big), \tag{2}$$

where the notation $N(y_i \mid \mu, \sigma^2)$ denotes the conditional density of the normal distribution of $y_i$ given the mean $\mu$ and variance $\sigma^2$ (de los Campos et al., 2013).

In the Bayesian framework, prior distributions have to be specified for all unknown parameters. By using Bayes' theorem the posterior distribution of the parameters is obtained by combining the prior and the data distribution (Bernardo and Smith, 2002). A frequently used approach is to use conjugate prior distributions, in order to obtain conditional posterior distributions from the same family of distributions as the prior with updated parameters. For genome-based prediction, with p >> n, shrinkage priors are required, which shrink effects towards zero. Meuwissen et al. (2001) proposed the Bayesian models BayesA and BayesB for genome-based prediction, which are frequently used. We compared the predictive abilities and the sensitivity of these models to the choice of hyperparameters with well known shrinkage models, such as the Bayesian Ridge (Kneib et al., 2011) and the Bayesian Lasso (Park and Casella, 2008). The four models differ with respect to their shrinkage properties. Whereas Bayesian Lasso, BayesA and BayesB induce marker-specific shrinkage, Bayesian Ridge uses the same shrinkage parameter for all markers.

Bayesian Ridge

In the Bayesian Ridge model (Kneib et al., 2011) a Gaussian prior with mean 0 and common variance $\sigma_\beta^2$ is assigned to all marker effects $\beta_j$ (j = 1, ..., p):

$$f(\beta \mid \sigma_\beta^2) = \prod_{j=1}^{p} N(\beta_j \mid 0, \sigma_\beta^2). \tag{3}$$

With a smaller variance $\sigma_\beta^2$ the prior density of marker effects is more concentrated around 0 and so effects are shrunken to a greater extent than when the variance is larger. The same prior variance is assigned to all marker effects, so the shrinkage is marker-homogeneous. Each of the variance parameters $\sigma_e^2$ and $\sigma_\beta^2$ is assigned a scaled inverse-$\chi^2$ prior, $f(\sigma_e^2) = \chi^{-2}(df_e, S_e)$ and $f(\sigma_\beta^2) = \chi^{-2}(df_\beta, S_\beta)$, respectively, with $df_e$ and $df_\beta$ being the degrees of freedom, and $S_e$ and $S_\beta$ the scale parameters of the corresponding scaled inverse-$\chi^2$ prior distribution. A flat prior distribution is assigned to the intercept $\beta_0$, with $f(\beta_0) \propto$ const. A Gibbs sampler is employed for posterior inference.

A question remains: how should the hyperparameters of the prior distributions $f(\sigma_e^2)$ and $f(\sigma_\beta^2)$ be chosen? According to Pérez et al. (2010), parameters can be chosen on prior beliefs that suggest a certain partition of the phenotypic variance into residual and genotypic variance, respectively. Here, we focus on the hyperparameter setting of $f(\sigma_\beta^2)$. In principle, the approach for $\sigma_\beta^2$ can also be adapted for $\sigma_e^2$. The mean and mode of the scaled inverse-$\chi^2$ distribution are $\mathrm{mean}(\sigma_\beta^2) = df_\beta S_\beta / (df_\beta - 2)$ and $\mathrm{mode}(\sigma_\beta^2) = df_\beta S_\beta / (df_\beta + 2)$, respectively. The larger the scale $S_\beta$, for fixed $df_\beta$, the larger the mean and mode of the distribution.
To get an a priori guess for $\sigma_\beta^2$, a prior expectation about the genotypic variance $\sigma_G^2$ is used. According to classical quantitative genetics, the phenotypic value of an individual $y_i$ can be expressed as the sum of its genotypic value and an environmental deviation (Falconer and Mackay, 1996). In the simulated datasets the genotypic value of line i is the true breeding value $\mathrm{TBV}_i$. In a population the total phenotypic variance is given as the sum of the genotypic and the environmental variance components. Assuming model (1) holds it follows that $\mathrm{TBV}_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j$. Thus, with marker scores $x_{ij}$ assumed to be fixed, the genotypic variance of an individual i in the Bayesian Ridge model is derived as

$$\sigma_{G_i}^2 = \mathrm{Var}\Big(\sum_{j=1}^{p} x_{ij}\beta_j\Big) = \sum_{j=1}^{p} x_{ij}^2\,\sigma_\beta^2, \quad i = 1, \ldots, n. \tag{4}$$

Following Pérez et al. (2010), substituting the individual marker score $x_{ij}$ by the average marker score over all individuals, $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ (j = 1, ..., p), gives the approximation $\sigma_G^2 = \sigma_\beta^2 \sum_{j=1}^{p} \bar{x}_j^2$. Thus, the variance of effects is obtained as:

$$\sigma_\beta^2 = \frac{\sigma_G^2}{\sum_{j=1}^{p} \bar{x}_j^2}. \tag{5}$$

Setting $df_\beta$ to a small value, e.g., $df_\beta = 4$, to get a relatively flat prior distribution, and applying the a priori expected genotypic variance for $\sigma_\beta^2$ from (5), the scale $S_\beta$ can be calculated by using the prior mode as:

$$S_\beta = \frac{\sigma_\beta^2\,(df_\beta + 2)}{df_\beta} = \frac{\sigma_G^2\,(df_\beta + 2)}{df_\beta \sum_{j=1}^{p} \bar{x}_j^2}. \tag{6}$$
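The scale calculation described above can be illustrated with a minimal sketch. It assumes the mode parameterization stated in the text (mode = df · S / (df + 2)); the function names and the toy marker scores are hypothetical and are not taken from the authors' implementation.

```python
def ridge_prior_scale(mean_scores, var_g, df=4.0):
    """Return (var_beta, scale) for the Bayesian Ridge prior.

    mean_scores: average marker score per marker (x-bar_j), hypothetical input
    var_g:       a priori expected genotypic variance
    df:          degrees of freedom of the scaled inverse-chi-square prior
    """
    sum_sq = sum(x * x for x in mean_scores)
    if sum_sq <= 0.0:
        raise ValueError("sum of squared mean marker scores must be positive")
    var_beta = var_g / sum_sq            # a priori variance of marker effects
    scale = var_beta * (df + 2.0) / df   # scale chosen so the prior mode is var_beta
    return var_beta, scale


def scaled_inv_chi2_mode(df, scale):
    """Mode of the scaled inverse-chi-square prior as given in the text."""
    return df * scale / (df + 2.0)


# Toy example with two markers (illustrative numbers only).
var_beta, scale = ridge_prior_scale([1.0, 1.0], var_g=2.0)
mode = scaled_inv_chi2_mode(4.0, scale)
```

In this toy case the prior mode recovers the a priori effect variance, which is the design choice behind setting the scale through the mode rather than the mean.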

Bayesian Lasso

The second model we explored for genome-based prediction is the Bayesian Lasso (Park and Casella, 2008; de los Campos et al., 2009). Here, conditional Gaussian priors with mean 0 are assigned to the marker effects. In the Bayesian Lasso the variance $\sigma_e^2\tau_j^2$ of a marker effect is peculiar to marker locus j (j = 1, ..., p) so that the joint prior distribution, given $\sigma_e^2$ and $\tau^2$, is

$$f(\beta \mid \tau^2, \sigma_e^2) = \prod_{j=1}^{p} N(\beta_j \mid 0, \sigma_e^2\tau_j^2).$$

Thus, contrary to marker-homogeneous shrinkage as in Bayesian Ridge, Bayesian Lasso has the potential of providing marker-heterogeneous shrinkage. The extent of shrinkage depends on $\tau_j^2$, with smaller values of $\tau_j^2$ producing more shrinkage of effect $\beta_j$. For each of the variance parameters $\tau^2 = \{\tau_j^2\}$ the same exponential prior distribution is used with $f(\tau^2 \mid \lambda) = \prod_{j=1}^{p} \mathrm{Exp}(\tau_j^2 \mid \lambda)$, under independence assumptions. Here, $\mathrm{Exp}(\tau_j^2 \mid \lambda)$ denotes the conditional density of the exponential distribution of $\tau_j^2$ given the rate parameter λ. The shrinkage parameter λ can then be either set to a fixed value, or a prior distribution can be assigned to λ. Park and Casella (2008) suggested using a Gamma prior for $\lambda^2$, that is, $f(\lambda^2) = \mathrm{Gamma}(r, \delta)$. For the residual variance $\sigma_e^2$ and for the intercept the same prior distributions as in Bayesian Ridge are used. As in Bayesian Ridge, a Gibbs sampler is employed in Bayesian Lasso. The fully conditional posterior distributions of the unknown parameters are given in de los Campos et al. (2009).

To find an appropriate λ, the prior variance of the effects

$$\sigma_\beta^2 = \frac{2\sigma_e^2}{\lambda^2} \tag{7}$$

is used, as derived by Pérez et al. (2010). By using the genotypic variance $\sigma_G^2$, an optimal a priori value of λ can be arrived at through the relationship

$$\hat{\lambda} = \sqrt{2\,\frac{\sigma_e^2}{\sigma_G^2}\sum_{j=1}^{p}\bar{x}_j^2} = \sqrt{2\,\frac{1-h^2}{h^2}\sum_{j=1}^{p}\bar{x}_j^2}, \tag{8}$$

with $h^2 = \sigma_G^2 / (\sigma_G^2 + \sigma_e^2)$ being the trait heritability (Falconer and Mackay, 1996).

BayesA

Meuwissen et al. (2001) suggested the model BayesA for inferring marker effects in genome-based prediction.
Here each of the effects $\beta_j$ (j = 1, ..., p) is assigned a Gaussian prior with mean 0 and variance $\sigma_{\beta_j}^2$, leading to the joint prior distribution

$$f(\beta \mid \sigma_{\beta_1}^2, \ldots, \sigma_{\beta_p}^2) = \prod_{j=1}^{p} N(\beta_j \mid 0, \sigma_{\beta_j}^2). \tag{9}$$

As opposed to Bayesian Ridge each marker effect $\beta_j$ is assigned a marker-specific variance $\sigma_{\beta_j}^2$. This leads to marker-heterogeneous shrinkage as in Bayesian Lasso. However, differently from the Bayesian Lasso where the prior distribution of the variances of marker effects is exponential, in BayesA the same scaled inverse-$\chi^2$ prior $f(\sigma_{\beta_j}^2) = \chi^{-2}(df_\beta, S_\beta)$ is used for every marker. For the intercept $\beta_0$ a flat prior distribution as in Bayesian Ridge is used, and for the residual variance $\sigma_e^2$ the same scaled inverse-$\chi^2$ prior as in Meuwissen et al. (2001). A Gibbs sampler is employed for sampling from the joint posterior distribution and the required fully conditional posterior distributions are given in Meuwissen et al. (2001). For finding appropriate values for the hyperparameters $df_\beta$ and $S_\beta$ the same guidelines as in Bayesian Ridge can be used, as the prior distributions are also scaled inverse-$\chi^2$.

BayesB

The model BayesB was suggested by Meuwissen et al. (2001) as an extension to BayesA. Due to the assumption that many SNPs are expected to have no effect on the trait, in contrast to BayesA the variances of marker effects are assigned the following mixture distribution as prior:

$$\sigma_{\beta_j}^2 \begin{cases} = 0 & \text{with probability } \pi, \\ \sim \chi^{-2}(df_\beta, S_\beta) & \text{with probability } 1 - \pi. \end{cases}$$

Generally $\sigma_{\beta_j}^2 = 0$ implies that the effect $\beta_j$ is set to a constant value. Here the conditional prior distributions of the effects $\beta_j$ are normal as in BayesA, with mean 0 and variance $\sigma_{\beta_j}^2$. Thus, if the variance $\sigma_{\beta_j}^2$ is set to zero, the effect $\beta_j$ is set to zero in turn. A Gibbs sampler with a Metropolis-Hastings (MH) step is used, as described in Meuwissen et al. (2001), for posterior inference. The MH step is needed here, because no closed form of the fully conditional posterior distribution of $\sigma_{\beta_j}^2$ can be derived due to the mixture distribution. In principle, the same guidelines as in Bayesian Ridge can be used to choose the hyperparameters $df_\beta$ and $S_\beta$, but the probability π of setting $\sigma_{\beta_j}^2$ to zero must be taken into account. Considering this in equation (5), $\sigma_{\beta_j}^2$ can be denoted as

$$\sigma_{\beta_j}^2 = \frac{\sigma_G^2}{(1-\pi)\sum_{j=1}^{p}\bar{x}_j^2}. \tag{10}$$

By using the mode of the scaled inverse-$\chi^2$ distribution a scale parameter $S_\beta$ can be arrived at by writing

$$S_\beta = \frac{\sigma_G^2\,(df_\beta + 2)}{df_\beta\,(1-\pi)\sum_{j=1}^{p}\bar{x}_j^2}, \tag{11}$$

according to equation (6).

Model comparison

Hellinger distance

The Hellinger distance H(f, g) (Le Cam, 1986), which is also used in Roos and Held (2011) to evaluate the sensitivity of models with respect to the choice of prior distributions, measures the distance between two densities f and g:

$$H(f, g) = \sqrt{\frac{1}{2}\int \Big(\sqrt{f(u)} - \sqrt{g(u)}\Big)^2 \, du}. \tag{12}$$

H(f, g) is a symmetric measure, which takes its maximum value of 1 if the density f assigns probability 0 to every data point to which g assigns a positive value, and vice versa. The minimum value of H is 0, attained if f = g. Gianola et al. (2009) pointed out that the results of BayesA and BayesB are highly influenced by the choice of hyperparameters that are assigned to the prior distributions. This lack of Bayesian learning prevents the posterior distribution from moving far away from the prior distribution, at least for some parameters such as $\sigma_{\beta_j}^2$ in BayesA and BayesB.
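As a rough illustration of how the Hellinger distance defined above can be evaluated on a grid, the following sketch applies the trapezoidal rule to two normal densities and compares the result with the known closed form for normals. The two densities, the grid limits and the function names are assumptions made for illustration; they stand in for a prior and an estimated posterior density and are not the authors' code.

```python
import math


def normal_pdf(u, mean, sd):
    """Density of N(mean, sd^2) at u."""
    z = (u - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hellinger(f, g, lower, upper, n_grid=20001):
    """Trapezoidal approximation of sqrt(0.5 * integral (sqrt f - sqrt g)^2 du)."""
    step = (upper - lower) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lower + k * step
        value = (math.sqrt(f(u)) - math.sqrt(g(u))) ** 2
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # end points count half
        total += weight * value
    return math.sqrt(0.5 * total * step)


def hellinger_normal_exact(m1, s1, m2, s2):
    """Closed-form Hellinger distance between two normal densities."""
    var_sum = s1 * s1 + s2 * s2
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical stand-ins for a marginal prior and a marginal posterior density.
prior = lambda u: normal_pdf(u, 0.0, 1.0)
posterior = lambda u: normal_pdf(u, 0.3, 0.5)

h_numeric = hellinger(prior, posterior, -10.0, 10.0)
h_exact = hellinger_normal_exact(0.0, 1.0, 0.3, 0.5)
h_same = hellinger(prior, prior, -10.0, 10.0)
```

A distance near 0 means the posterior stayed close to the prior; in practice the second density would come from a density estimate of MCMC samples rather than from a known normal.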
To evaluate the extent of Bayesian learning for marker effects, we calculated the Hellinger distances between marginal prior and posterior densities of the marker effects numerically. The marginal posterior density was estimated from the posterior MCMC samples using kernel density estimation with a Gaussian kernel and bandwidth chosen based on Silverman's rule of thumb (Silverman, 1986). To approximate the integral in H the trapezoidal rule was used (Atkinson, 1989).

Effective number of parameters

To obtain an estimate of the effective number of parameters in the different models, we calculated the $p_D$ values, as suggested by Spiegelhalter et al. (2002). The $p_D$ statistic is calculated as

$$p_D = \frac{1}{L}\sum_{l=1}^{L} D(y, \theta^{l}) - D\big(y, \hat{\theta}(y)\big),$$

with $\theta = (\beta_0, \beta, \sigma_e^2)$, y being the data vector, $\theta^{l}$ being the MCMC sample from the l-th iteration (l = 1, ..., L) of θ, and $\hat{\theta}(y)$ being the posterior mean of θ. $D(y, \theta) = -2\log f(y \mid \theta)$ is the residual deviance. With an informative prior, the effective number of parameters $p_D$ is generally smaller than the total number of parameters in θ (Gelman et al., 2004).

Cross-validation

We used five-fold cross-validation (CV) to assess the predictive ability of the different models. For five-fold CV, each dataset was randomly divided into five subsets, with four subsets forming the training set and the fifth subset forming the test set. The training set was used to infer the marker effects with the different models, and the test set was used to predict the breeding values based on the genotypic data. Each of the five subsets formed the test set once. The correlation between estimated breeding values $\hat{y}^{(k)}$ (for k = 1, ..., 5) and observed phenotypic values $y^{(k)}$, $\mathrm{cor}(\hat{y}^{(k)}, y^{(k)})$, yields an estimate of the predictive ability of a model in fold k, where $\hat{y}^{(k)}$ is obtained from $\hat{y}^{(k)} = X^{(k)}\hat{\beta}$, and the matrix $X^{(k)}$ includes the marker genotypes of the test set. The effects $\hat{\beta}$ were estimated as posterior means based on the training set. For the simulated datasets true breeding values (TBVs) are available and the accuracy can be calculated directly. No TBVs are available for experimental data, so here the accuracy is approximated according to Dekkers (2007) by dividing the predictive correlation by the square root of the heritability $h^2$:

$$\mathrm{cor}\big(\hat{y}^{(k)}, \mathrm{TBV}^{(k)}\big) \approx \frac{\mathrm{cor}\big(\hat{y}^{(k)}, y^{(k)}\big)}{\sqrt{h^2}}.$$

Model overview and computation

For every combination of model and dataset, a scenario with the hyperparameter setting denoted earlier as optimal was computed. Therefore, for Bayesian Ridge, BayesA and BayesB, we set the degrees of freedom of the scaled inverse-$\chi^2$ distribution of $\sigma_\beta^2$ to $df_\beta = 4$; for BayesB, π was set to 0.8. The optimal scale parameter $S_\beta$ was then calculated using equations (6) and (11), respectively.
The required genotypic variance $\sigma_G^2$ was calculated from the product of the trait heritability $h^2$ and the phenotypic variance $\sigma_P^2$. For Bayesian Lasso, an optimal λ was calculated by using equation (8). To judge the influence of the hyperparameters on predictive ability, scenarios with altered hyperparameters were computed for every model. Table 2 gives an overview of the hyperparameter settings for the different datasets. The models Bayesian Ridge, Bayesian Lasso, BayesA and BayesB are abbreviated as BR, BL, BA and BB, respectively, and different numbers indicate different model scenarios. In the case of BR1, BA1 and BB1 the scale parameter $S_\beta$ was set to the value from the formulas given previously. For the scenarios BR2, BA2 and BB2, the chosen scale parameter was ten times the optimal value, and for BR3, BA3 and BB3 the optimal value was divided by ten. In the BL1 scenarios the parameter λ was set to the optimal value, in BL2 the optimal value was divided by 10 and in BL3 it was multiplied by 10. This range of hyperparameters was chosen to make prior distributions comparable across scenarios for the four different models. In BL4-6 a Gamma prior was assigned to λ, with shape parameter δ = 0.5 and rate parameter r chosen according to a density of λ with a mode corresponding to the λ values in BL1-3.

Computation was done using the software package R (R Development Core Team, 2012). For Bayesian Ridge and Bayesian Lasso the function BLR() of the R package BLR (de los Campos and Pérez, 2012) was used. For computation of BayesA and BayesB we implemented our own algorithm within R. We used 13,000 iterations for the MCMC algorithms of the different models, with the first 3000 iterations discarded as burn-in. Only every 10th sample was retained for storage reasons. Hence, 1000 MCMC samples were used for the calculation of posterior means and densities. The convergence of the Markov chains was checked by visualizing the sampling paths and by Geweke's diagnostic of the R package coda (Plummer et al., 2006).
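The scenario construction and the MCMC bookkeeping described above can be summarized in a short sketch. It assumes the prior-mode relationship for the scale, a Lasso value of the form λ = sqrt(2 (1 - h²)/h² · Σ x̄_j²), and toy inputs; all function names and numbers are hypothetical and only illustrate the arithmetic, not the authors' R implementation.

```python
import math


def optimal_hyperparameters(mean_scores, var_p, h2, df=4.0, pi=0.8):
    """A priori 'optimal' hyperparameters for the four models (illustrative)."""
    sum_sq = sum(x * x for x in mean_scores)
    var_g = h2 * var_p                                   # genotypic variance
    s_common = var_g / sum_sq * (df + 2.0) / df          # Bayesian Ridge and BayesA scale
    s_bayesb = var_g / ((1.0 - pi) * sum_sq) * (df + 2.0) / df  # BayesB scale
    lam = math.sqrt(2.0 * (1.0 - h2) / h2 * sum_sq)      # assumed form of the Lasso value
    return {"BR": s_common, "BA": s_common, "BB": s_bayesb, "BL": lam}


def scenarios(optimal):
    """Scenarios 1-3: optimal value, then x10 and /10 (order reversed for BL)."""
    grid = {}
    for model, value in optimal.items():
        factors = (1.0, 0.1, 10.0) if model == "BL" else (1.0, 10.0, 0.1)
        for number, factor in enumerate(factors, start=1):
            grid[f"{model}{number}"] = value * factor
    return grid


def retained_samples(n_iter, burn_in, thin):
    """Number of MCMC samples kept after burn-in and thinning."""
    return (n_iter - burn_in) // thin


# Toy inputs: two markers with mean score 1, phenotypic variance 4, h2 = 0.5.
optimal = optimal_hyperparameters([1.0, 1.0], var_p=4.0, h2=0.5)
grid = scenarios(optimal)
n_kept = retained_samples(13000, 3000, 10)
```

With the chain settings quoted in the text, the last call reproduces the 1000 retained samples.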

Table 2 Parameter settings for all models in the different datasets. For Bayesian Ridge, BayesA and BayesB the value of $S_\beta$ varied, with $df_\beta = 4$ in all scenarios. In BayesB the parameter π was set to 0.8. In scenarios BL1-3 different values for λ were chosen, and in BL4-6 the parameter r of the Gamma prior for λ varied, with δ = 0.5 in all scenarios.

Model                                  maizeA   maizeB   GDY   GDC
Bayesian Ridge (S_β)
  BR1
  BR2
  BR3
Bayesian Lasso with fixed λ (λ)
  BL1
  BL2
  BL3
Bayesian Lasso with random λ (r)
  BL4
  BL5
  BL6
BayesA (S_β)
  BA1
  BA2
  BA3
BayesB (S_β)
  BB1
  BB2
  BB3

Simulation results

Full datasets and Hellinger distance

The results from the full (not cross-validated) simulated datasets are shown in Table 3. For all models the correlation between phenotypic values y and estimated breeding values ŷ and the correlation between TBVs and ŷ are given. Taking the correlation between the true and estimated breeding values, cor(TBV, ŷ), as a reference, we interpreted an increase of cor(y, ŷ) accompanied by a decrease of cor(TBV, ŷ) as a sign of model overfitting. If both correlations decreased, we interpreted this to be the result of model underfitting. Furthermore, we report the effective number of parameters in the models ($p_D$). For Bayesian Lasso models, we also report the posterior mean of λ. The fixed λ value in BL1-3 had a great impact on the estimation. Bayesian Ridge and Bayesian Lasso scenarios with random λ (BL4-6) showed similar correlations and $p_D$ values in both simulated maize datasets. The correlation between TBV and ŷ was highest in these models. In BL3 with a large λ value, effects were shrunken strongly, which is reflected in the small $p_D$ value and the low correlation between y and ŷ, which indicates underfitting. Here the correlation with TBVs is also low.
In BL2 with small λ, the $p_D$ statistic and cor(y, ŷ) were high but cor(TBV, ŷ) decreased, indicating overfitting on the training data, especially in the dataset maizeB, where the number of SNPs strongly exceeded the number of observations (p >> n). It should be noted that the posterior mean of λ in the Bayesian Lasso models with random λ exceeded the optimal λ calculated a priori in all datasets. Hence, fixing λ to an optimal a priori value might distort inference relative to the situations where λ is inferred from the data. We conclude that assigning hyperpriors to λ should be preferred to assigning a fixed value, as it seemingly allows for Bayesian learning, provided that the posterior distribution of λ is sharp.

Similar to the Bayesian Lasso models with different fixed λ, the scale parameter $S_\beta$ in BayesA and BayesB had a strong impact on the estimation procedure. In scenarios BA2 and BB2 a higher scale $S_\beta$ was assigned to the inverse-$\chi^2$ prior of $\sigma_{\beta_j}^2$ than in the other BayesA and BayesB scenarios, leading to less shrinkage of marker effects. This is reflected in the high effective number of parameters in these models ($p_D$), and in an overfitting of the data, indicated by the high value of cor(y, ŷ) but lower value of cor(TBV, ŷ) compared to the other scenarios. However, the overfitting is less pronounced in BayesB than in BayesA. Conversely, in BA3 and BB3 the small scale value $S_\beta$ led to high shrinkage of marker effects and did not result in overfitting.

Table 3 Results from the full (not cross-validated) simulated datasets. The correlations between phenotypic values y and estimated breeding values ŷ as well as the correlations between true breeding values (TBV) and ŷ, the effective number of parameters ($p_D$) and the posterior means of λ from Bayesian Lasso models are given.

Model             maizeA: cor(y, ŷ)  cor(TBV, ŷ)  p_D  λ̂     maizeB: cor(y, ŷ)  cor(TBV, ŷ)  p_D  λ̂
Bayesian Ridge    BR1, BR2, BR3
Bayesian Lasso    BL1, BL2, BL3, BL4, BL5, BL6
BayesA            BA1, BA2, BA3
BayesB            BB1, BB2, BB3

For maizeA, we calculated the Hellinger distance H(f, g) between the marginal prior (f) and the approximated marginal posterior density (g) of the marker effects. In Figure 2, the distribution of the Hellinger distance for the different model scenarios is visualized by boxplots for all 1117 SNPs. For Bayesian Ridge and Bayesian Lasso models H was considerably higher than for BayesA and BayesB, indicating that there was less Bayesian learning in BayesA and BayesB, as there is more similarity between the marginal prior and posterior densities of the effects $\beta_j$. The Bayesian Ridge scenarios showed less variability with respect to Hellinger distances of the marker effects than the other models. We conjecture that the marker-heterogeneous shrinkage in Bayesian Lasso, BayesA and BayesB caused the higher variability in these models.

Figure 2 Distribution of Hellinger distance (H) between the marginal prior and posterior densities of marker effects from different model scenarios (BR1-3, BL1-6, BA1-3, BB1-3), calculated with simulated dataset maizeA. Each boxplot displays the distribution of Hellinger distances of the 1117 marker effects out of each model.

Cross-validation

We used cross-validation to assess the predictive ability on simulated data. Figure 3 shows the predictive ability and the accuracy from five-fold CV, for all model scenarios in the simulated datasets maizeA and maizeB. The mean $p_D$ value from each CV-fold is depicted as a measure of model complexity. In the dataset maizeA, BL1 yielded the highest predictive ability and accuracy with mean 0.53 and 0.79, respectively. The mean predictive ability of all Bayesian Ridge models and Bayesian Lasso models with random λ was 0.5, and the mean accuracy was approximately [...]. The Bayesian Lasso models with fixed non-optimal λ (BL2-3) showed lower predictive abilities and accuracies. The mean predictive ability of BayesA and BayesB models did not exceed 0.5, the highest accuracy with these models was [...]. Thus, no BayesA or BayesB scenario exceeded the performance of Bayesian Ridge and Bayesian Lasso models with random λ, as well as BL1, in terms of predictive ability.

Figure 3 Predictive abilities and accuracies from five-fold cross-validation, calculated with simulated datasets maizeA (upper panel) and maizeB (lower panel). The black dots show the mean predictive ability from five-fold cross-validation for the different scenarios, the gray stars show the mean accuracy. The lines define the interval [mean - sd; mean + sd]. The open circles denote the mean effective number of parameters $p_D$ in the models.

However, the predictive ability and accuracy of BayesA models with a

12 386 Christina Lehermeier et al.: Sensitivity of Bayesian genome-based rediction models non- otimal scale (BA-3) was significantly reduced when comared to BA1 (tested with Student s aired t-test, α = 0.05). In BayesB models the scenarios with non- otimal scale also showed a reduced redictive ability and accuracy, but not as strong as in BayesA models. The highest mean redictive ability in maizeb was 0.64 and the highest mean accuracy was All Bayesian Ridge models, Bayesian Lasso models with random λ and BL1 yielded the same redictive ability and accuracy. BayesA and BayesB with otimal scale had the same redictive ability, but the redictive ability of BayesA and BayesB models with non- otimal S was significantly reduced. It is worth mentioning that the marker-heterogeneous shrinkage in Bayesian Lasso, BayesA and BayesB did not outerform the marker-homogeneous shrinkage of Bayesian Ridge. This did not haen in dataset maizea nor in dataset maizeb, where the number of markers strongly exceeded the number of observations. The higher values of redictive ability and accuracy in the dataset maizeb than in maizea are resumably due to higher heritability, fewer simulated QTL and more markers. As seen in the D values, both over- and underfitting led to reduced redictive abilities and accuracies. Results: exerimental dataset Full dataset The results from the full, not cross-validated, exerimental dataset, for both henotyic traits GDY and GDC, are shown in Table 4. Here the same arameters as for the simulated data are given, excet for the correlation between estimated and true breeding values. In BL, BA and BB cor(y, ŷ) = 1.00 indicates a strong overfitting, because of n and little shrinkage of the effects. For both traits the osterior mean of λ of BL models with random λ (BL4-6) was higher than the calculated otimal value, which was already seen with the simulated datasets. Table 4 Results from the exerimental data. 
The correlations between henotyic values y and estimated breeding values ŷ, the effective number of arameters ( D ) and the osterior means of λ of Bayesian Lasso models are given. Grain dry matter yield (GDY) Grain dry matter content (GDC) Model cor(y,ŷ) D ˆλ Model cor(y,ŷ) D ˆλ Bayesian Ridge Bayesian Ridge BR BR BR BR BR BR Bayesian Lasso Bayesian Lasso BL BL BL BL BL BL BL BL BL BL BL BL BayesA BayesA BA BA BA BA BA BA BayesB BayesB BB BB BB BB BB BB
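
The two quantities reported from cross-validation, predictive ability cor(y, ŷ) and accuracy cor(TBV, ŷ) in held-out folds, can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the Bayesian samplers with a plain ridge regression using a fixed shrinkage parameter, and the function names, the toy data, the 0/2 marker coding and all numeric settings are assumptions made only for illustration. Accuracy can be computed here only because the toy data come with known true breeding values, as in the simulated datasets.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimates of marker effects with an unpenalised intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta


def k_fold_cv(X, y, tbv, lam, k=5, seed=0):
    """Mean cor(y, y_hat) and mean cor(TBV, y_hat) over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    ability, accuracy = [], []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        x_mean, y_mean, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        y_hat = y_mean + (X[test_idx] - x_mean) @ beta
        ability.append(np.corrcoef(y[test_idx], y_hat)[0, 1])
        accuracy.append(np.corrcoef(tbv[test_idx], y_hat)[0, 1])
    return float(np.mean(ability)), float(np.mean(accuracy))


# Toy data: every value below is an illustrative assumption, not the paper's.
rng = np.random.default_rng(1)
n, p, sd_beta = 300, 600, 0.1
X = 2.0 * rng.integers(0, 2, size=(n, p))     # homozygous lines coded 0/2
tbv = X @ rng.normal(0.0, sd_beta, size=p)    # simulated true breeding values
y = tbv + rng.normal(0.0, tbv.std(), size=n)  # heritability of about 0.5
lam = tbv.var() / sd_beta**2                  # residual variance / effect variance

ability, accuracy = k_fold_cv(X, y, tbv, lam)
print(f"predictive ability {ability:.2f}, accuracy {accuracy:.2f}")
```

Because y contains non-genetic noise while the true breeding values do not, the accuracy is usually the larger of the two numbers, which matches the pattern in the simulated results.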

Cross-validation

CV results for the two experimental traits GDY and GDC are shown in Figure 4. For GDY the highest predictive ability was achieved with BL5 (mean 0.65). All Bayesian Ridge models, all Bayesian Lasso models with random λ and BL1 had an average predictive ability of approximately [...]. Predictive abilities of BayesA and BayesB models were lower, but in BA3 and BB3 only slightly. The scenarios with S set to its calculated optimal value (BA1 and BB1) gave lower predictive abilities than models where a 10 times smaller scale was used (BA3 and BB3). The effective number of parameters p_D in BA3 and BB3 was in a similar range as in Bayesian Ridge and Bayesian Lasso models with random λ. For the Bayesian Ridge and Bayesian Lasso models the estimated p_D values met our expectations. However, for BA2 and BB2 they were surprisingly low. Due to the reduced shrinkage, the p_D values should be higher than in BA1 and BB1. The p_D statistic is based on a number of assumptions (Spiegelhalter et al., 2002; Celeux et al., 2006), which are partially violated in BayesA and BayesB, and thus the p_D statistic might not be a meaningful estimate of the effective number of parameters here. The predictive ability for GDC was higher than for GDY, most likely due to its higher heritability. Here, BL5 also yielded the highest predictive ability, with a mean of [...]. The other Bayesian Lasso models with random λ and the Bayesian Ridge models yielded a similar predictive ability. As in the case of GDY, in GDC BayesA and BayesB scenarios also showed more variability in their predictive abilities. The scenarios BA3 and BB3 gave the highest predictive abilities among BayesA and BayesB models, respectively, which was not expected because optimal values for S had been chosen for BA1 and BB1.
It seems that the proposed equation for finding an optimal scale S overestimated the true value. Neither for GDY nor for GDC did marker-heterogeneous shrinkage outperform marker-homogeneous shrinkage, which corroborated the results from the simulated datasets.

Figure 4 Predictive abilities and accuracies from five-fold cross-validation, calculated with experimental data, for traits grain dry matter yield (upper panel) and grain dry matter content (lower panel). The black dots show the mean predictive ability from five-fold cross-validation for the different scenarios, the gray stars show the mean accuracy. The lines define the interval [mean − sd; mean + sd]. The open circles denote the mean effective number of parameters p_D in the models.

Discussion

We examined the performance of four Bayesian models for genome-based prediction of breeding values in maize, as well as their sensitivity with respect to the choice of hyperparameters. The models were evaluated with simulated data resembling a commercial maize breeding program and with experimental maize data. A strong influence of prior parameters on the predictive ability was seen in BayesA and BayesB models, as well as in the Bayesian Lasso models with fixed λ. The variation of the scale parameter S in BayesA and BayesB had a strong impact on prediction. Choosing too large a scale S for the prior distribution of the variance σ²_j led to overfitting of the data, whereas too small a scale parameter led to underfitting, due to too much shrinkage of the effects. In both cases the predictive ability is considerably reduced. The strong influence of the choice of hyperparameters indicates a lack of Bayesian learning ability in BayesA and BayesB, which was already pointed out by Gianola et al. (2009).

We quantified the ability of Bayesian learning of the respective models by calculating the Hellinger distance between marginal prior and posterior densities of marker effects for the simulated dataset maizeA. A larger distance indicates that the posterior density moved away from the prior density, and that Bayesian learning has taken place. For BayesA and BayesB models we observed quite small distances, whereas in Bayesian Ridge and Bayesian Lasso the distance between prior and posterior density was much larger.
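
How a Hellinger distance separates "little learning" from "much learning" can be sketched numerically. The block below is a hedged illustration, not the procedure used in the study: it assumes the common convention H(f, g) = sqrt(1 − ∫ sqrt(f·g) dx), which lies between 0 and 1, and it uses normal curves as stand-ins for the prior and for posterior densities that would in practice be approximated from MCMC samples. The function names and all means and standard deviations are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def hellinger(f, g, x):
    """H(f, g) = sqrt(1 - integral of sqrt(f * g)), trapezoid rule on grid x."""
    bc = np.sqrt(f * g)  # integrand of the Bhattacharyya coefficient
    integral = np.sum((bc[1:] + bc[:-1]) * np.diff(x)) / 2.0
    return float(np.sqrt(max(0.0, 1.0 - integral)))


x = np.linspace(-1.0, 1.0, 20001)
prior = normal_pdf(x, 0.0, 0.10)            # assumed prior of one marker effect
post_similar = normal_pdf(x, 0.0, 0.09)     # posterior barely moved: little learning
post_different = normal_pdf(x, 0.05, 0.01)  # posterior moved and sharpened

h_small = hellinger(prior, post_similar, x)
h_large = hellinger(prior, post_different, x)
print(f"H(prior, similar posterior)   = {h_small:.3f}")
print(f"H(prior, different posterior) = {h_large:.3f}")
```

In this toy setting the posterior that stays close to the prior gives a distance near zero, while the shifted and sharpened posterior gives a distance well above one half, mirroring the contrast described between BayesA/BayesB and Bayesian Ridge/Bayesian Lasso.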
We are aware that a small distance between prior and posterior density can also emerge if a perfect prior density is assigned; however, this happens with probability close to zero if prior knowledge is scant. In combination with a lower predictive ability and the fact that all BayesA and BayesB scenarios yielded a small Hellinger distance between prior and posterior density, this is very unlikely to be the reason for the small distances found here. From the Hellinger distances it can be seen that Bayesian learning is smaller in BayesA and BayesB than in Bayesian Ridge and Bayesian Lasso and, hence, the influence of the choice of hyperparameters on prediction is large. In BayesA and BayesB the degrees of freedom of the fully conditional posterior distribution of σ²_j are df + 1, and thus only one degree of freedom higher than the prior degrees of freedom, independently of the number of observations (n) or markers (p) in the model (Gianola et al., 2009). In contrast, in Bayesian Ridge the degrees of freedom increase with the number of markers in the model (p). In genomic datasets, Bayesian learning is limited due to the p ≫ n situation. In real life, as with next generation sequencing data, p will get even larger, and is expected to increase much more than n. Thus, models with a strong Bayesian learning ability such as the Bayesian Ridge and Bayesian Lasso seem useful. If the prior parameter setting was appropriate, predictive abilities and accuracies of BayesA and BayesB were high and equal to those of Bayesian Ridge and Bayesian Lasso with random λ. However, finding optimal parameters is not straightforward. The proposed guidelines for finding an optimal scale S and λ, according to Pérez et al. (2010), did not always yield the best parameter setting in terms of predictive ability. Alternative formulas have been proposed for finding an optimal scale parameter S, e.g., in Habier et al. (2010, 2011). Both formulas are based on strong assumptions. The formula used in our study suggested by Pérez et al.
(2010) as well as the formula according to Habier et al. (2010, 2011) assume independence of marker effects, which may be inadequate if strong LD among markers translates into a joint dependence of their effects. In experimental data there is additional uncertainty in the proposed formulas for finding optimal parameters, because the variance components σ²_G and σ²_e need to be estimated. In practical application of genome-based prediction, variance components can only be estimated based on the training dataset and not on phenotypic values of the test dataset, as these are unknown. Thus, there may be additional uncertainty as the data distribution may change from training to test set.

We have chosen hyperparameters based on the mode of the prior distributions. An option would be to choose hyperparameters based on the mean of the prior densities, which would change the numerator of formulas (6) and (11) to σ²_G (df − 2). In the case of df = 4 this would lead to an optimal scale S which would be 1/3 of the optimal S derived from the mode of the inverse-χ² distribution, because matching the prior mean rather than the mode replaces the factor (df + 2) by (df − 2), and (4 − 2)/(4 + 2) = 1/3. Thus, our variation of hyperparameter settings is in the order of magnitude of uncertainty due to the choice of formula for hyperparameter calculation and estimation of variance components. An alternative to using an ad hoc formula would be to find hyperparameters iteratively via cross-validation, but this would have high computational costs. Hence, Bayesian models that are less sensitive with respect to the choice of hyperparameters are highly desirable.

The Bayesian Ridge model with marker-homogeneous shrinkage was in all datasets among the models with the highest predictive ability. Irrespective of the number of markers and observations, marker-specific shrinkage did not outperform marker-homogeneous shrinkage. The performance in the experimental dataset was similar to that in the simulated datasets for both traits. One reason for the good performance of Bayesian Ridge may be the large number of QTL affecting the target traits. The simulated datasets comprised more than 300 segregating QTL and, also, the two quantitative traits in the experimental datasets have been found to be affected by many QTL (Schön et al., 2004). Furthermore, substantial long-range LD exists in maize breeding populations, which was also shown in our data. If there is strong LD, many SNPs are expected to be in LD with at least one QTL, and therefore to have non-zero effects.
We conjecture that the large number of QTL and the strong correlation between markers are reasons for the superiority of the Bayesian Ridge model in terms of predictive abilities. The Bayesian Ridge model is similar to genome-enabled best linear unbiased prediction (Piepho, 2009). These mixed models have also been shown in data with similar genetic architecture to perform as well as Bayesian models with a more complex prior setting, e.g., with marker-specific shrinkage (Heslot et al., 2012). The superiority of Bayesian models with marker-heterogeneous shrinkage has been shown mainly in simulation studies with a few simulated QTL (Meuwissen et al., 2001; Habier et al., 2007). However, such a genetic architecture seems to be unrealistic for truly quantitative traits. If interest lies mainly in the prediction of phenotypic traits, there may be little difference if a small effect is assigned to a group of highly correlated markers, or a larger effect is assigned to only one of them. On the other hand, marker-specific shrinkage models may be advantageous if one is interested in specific marker effects. It is conjectured that the performance of Ridge regression type models may change compared to that of marker-specific shrinkage models when marker coverage is more dense and LD is less pronounced (de los Campos et al., 2013). However, with denser marker coverage the ratio p/n will further increase and, thus, the influence of the prior density. Hence, the Bayesian Lasso may be advantageous over BayesA and BayesB due to its stronger Bayesian learning ability.

In our study, Bayesian Ridge and the Bayesian Lasso with a hyperprior assigned to λ were quite robust, whereas BayesA and BayesB showed a strong sensitivity with respect to the choice of hyperparameters. The ability of Bayesian learning is reduced in these models, as indicated by the Hellinger distance between prior and posterior densities of marker effects.
No superiority of models with marker-specific shrinkage (Bayesian Lasso, BayesA, BayesB) was seen in our maize datasets, with a large number of QTL affecting the quantitative traits and a high long-range LD. Considering also the higher computing efforts of models with marker-specific shrinkage, we recommend Bayesian Ridge as a robust model for genome-based prediction, if one is mainly interested in the prediction of breeding values in datasets with a similar genetic architecture as those analyzed in this study.

Author notes: This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr Synbreed Synergistic plant and animal breeding (FKZ: A). This research was carried out with the support of the Technische Universität München Institute for Advanced Study, funded by the German Excellence Initiative. We gratefully acknowledge the KWS SAAT AG for providing the experimental data.


GENOME-ENABLED PREDICTION RIDGE REGRESSION GENOMIC BLUP ROBUST METHODS: TMAP AND LMAP GENOME-ENABLED PREDICTION RIDGE REGRESSION GENOMIC BLUP ROBUST METHODS: TMAP AND LMAP RIDGE REGRESSION THE ESTIMATOR (HOERL AND KENNARD, 1979) WAS DERIVED USING A CONSTRAINED MINIMIZATION ARGUMENT Recirocal

More information

Genomic selection models for directional dominance: an example for litter size in pigs

Genomic selection models for directional dominance: an example for litter size in pigs Varona et al. Genet Sel Evol 208 50: htts://doi.org/0.86/s27-08-0374- Genetics Selection Evolution RESEARCH ARTICLE Oen Access Genomic selection models for directional dominance: an examle for litter size

More information

On parameter estimation in deformable models

On parameter estimation in deformable models Downloaded from orbitdtudk on: Dec 7, 07 On arameter estimation in deformable models Fisker, Rune; Carstensen, Jens Michael Published in: Proceedings of the 4th International Conference on Pattern Recognition

More information

Ecological Resemblance. Ecological Resemblance. Modes of Analysis. - Outline - Welcome to Paradise

Ecological Resemblance. Ecological Resemblance. Modes of Analysis. - Outline - Welcome to Paradise Ecological Resemblance - Outline - Ecological Resemblance Mode of analysis Analytical saces Association Coefficients Q-mode similarity coefficients Symmetrical binary coefficients Asymmetrical binary coefficients

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test) Chater 225 Tests for Two Proortions in a Stratified Design (Cochran/Mantel-Haenszel Test) Introduction In a stratified design, the subects are selected from two or more strata which are formed from imortant

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Plotting the Wilson distribution

Plotting the Wilson distribution , Survey of English Usage, University College London Setember 018 1 1. Introduction We have discussed the Wilson score interval at length elsewhere (Wallis 013a, b). Given an observed Binomial roortion

More information

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Journal of Sound and Vibration (998) 22(5), 78 85 VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Acoustics and Dynamics Laboratory, Deartment of Mechanical Engineering, The

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

Estimating function analysis for a class of Tweedie regression models

Estimating function analysis for a class of Tweedie regression models Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation 6.3 Modeling and Estimation of Full-Chi Leaage Current Considering Within-Die Correlation Khaled R. eloue, Navid Azizi, Farid N. Najm Deartment of ECE, University of Toronto,Toronto, Ontario, Canada {haled,nazizi,najm}@eecg.utoronto.ca

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

ME scope Application Note 16

ME scope Application Note 16 ME scoe Alication Note 16 Integration & Differentiation of FFs and Mode Shaes NOTE: The stes used in this Alication Note can be dulicated using any Package that includes the VES-36 Advanced Signal Processing

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

FE FORMULATIONS FOR PLASTICITY

FE FORMULATIONS FOR PLASTICITY G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND

More information

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem An Ant Colony Otimization Aroach to the Probabilistic Traveling Salesman Problem Leonora Bianchi 1, Luca Maria Gambardella 1, and Marco Dorigo 2 1 IDSIA, Strada Cantonale Galleria 2, CH-6928 Manno, Switzerland

More information

MCVEAN (2002) showed that predictions for r 2,a

MCVEAN (2002) showed that predictions for r 2,a oyright Ó 2008 by the Genetics Society of America DOI: 10.1534/genetics.108.092270 Note The Influence of Gene onversion on Linkage Diseuilibrium Around a Selective Swee Danielle A. Jones 1 and John Wakeley

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Toics in Assurance Related Technologies Table of Contents Introduction Statistical Models for Simle Systems (U/Down) and Interretation Markov Models for Simle Systems (U/Down) and Interretation

More information

INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL

INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL Symosium on Ice (26) INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL Mohamed O. ElSeify and Thomas G. Brown University of Calgary, Calgary, Canada ABSTRACT Current ice rubble load

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

Estimating Internal Consistency Using Bayesian Methods

Estimating Internal Consistency Using Bayesian Methods Journal of Modern Alied Statistical Methods Volume 0 Issue Article 25 5--20 Estimating Internal Consistency Using Bayesian Methods Miguel A. Padilla Old Dominion University, maadill@odu.edu Guili Zhang

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

MULTIVARIATE STATISTICAL PROCESS OF HOTELLING S T CONTROL CHARTS PROCEDURES WITH INDUSTRIAL APPLICATION

MULTIVARIATE STATISTICAL PROCESS OF HOTELLING S T CONTROL CHARTS PROCEDURES WITH INDUSTRIAL APPLICATION Journal of Statistics: Advances in heory and Alications Volume 8, Number, 07, Pages -44 Available at htt://scientificadvances.co.in DOI: htt://dx.doi.org/0.864/jsata_700868 MULIVARIAE SAISICAL PROCESS

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

Numerical Methods for Particle Tracing in Vector Fields

Numerical Methods for Particle Tracing in Vector Fields On-Line Visualization Notes Numerical Methods for Particle Tracing in Vector Fields Kenneth I. Joy Visualization and Grahics Research Laboratory Deartment of Comuter Science University of California, Davis

More information

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling Scaling Multile Point Statistics or Non-Stationary Geostatistical Modeling Julián M. Ortiz, Steven Lyster and Clayton V. Deutsch Centre or Comutational Geostatistics Deartment o Civil & Environmental Engineering

More information

Efficient Approximations for Call Admission Control Performance Evaluations in Multi-Service Networks

Efficient Approximations for Call Admission Control Performance Evaluations in Multi-Service Networks Efficient Aroximations for Call Admission Control Performance Evaluations in Multi-Service Networks Emre A. Yavuz, and Victor C. M. Leung Deartment of Electrical and Comuter Engineering The University

More information

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout

More information

Using a Computational Intelligence Hybrid Approach to Recognize the Faults of Variance Shifts for a Manufacturing Process

Using a Computational Intelligence Hybrid Approach to Recognize the Faults of Variance Shifts for a Manufacturing Process Journal of Industrial and Intelligent Information Vol. 4, No. 2, March 26 Using a Comutational Intelligence Hybrid Aroach to Recognize the Faults of Variance hifts for a Manufacturing Process Yuehjen E.

More information

Parallel Quantum-inspired Genetic Algorithm for Combinatorial Optimization Problem

Parallel Quantum-inspired Genetic Algorithm for Combinatorial Optimization Problem Parallel Quantum-insired Genetic Algorithm for Combinatorial Otimization Problem Kuk-Hyun Han Kui-Hong Park Chi-Ho Lee Jong-Hwan Kim Det. of Electrical Engineering and Comuter Science, Korea Advanced Institute

More information

DIFFERENTIAL evolution (DE) [3] has become a popular

DIFFERENTIAL evolution (DE) [3] has become a popular Self-adative Differential Evolution with Neighborhood Search Zhenyu Yang, Ke Tang and Xin Yao Abstract In this aer we investigate several self-adative mechanisms to imrove our revious work on [], which

More information

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment

More information

arxiv: v4 [math.st] 3 Jun 2016

arxiv: v4 [math.st] 3 Jun 2016 Electronic Journal of Statistics ISSN: 1935-7524 arxiv: math.pr/0000000 Bayesian Estimation Under Informative Samling Terrance D. Savitsky and Daniell Toth 2 Massachusetts Ave. N.E, Washington, D.C. 20212

More information

Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales

Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales Otimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales Diana Negoescu Peter Frazier Warren Powell Abstract In this aer, we consider a version of the newsvendor

More information

A generalization of Amdahl's law and relative conditions of parallelism

A generalization of Amdahl's law and relative conditions of parallelism A generalization of Amdahl's law and relative conditions of arallelism Author: Gianluca Argentini, New Technologies and Models, Riello Grou, Legnago (VR), Italy. E-mail: gianluca.argentini@riellogrou.com

More information

Determining Momentum and Energy Corrections for g1c Using Kinematic Fitting

Determining Momentum and Energy Corrections for g1c Using Kinematic Fitting CLAS-NOTE 4-17 Determining Momentum and Energy Corrections for g1c Using Kinematic Fitting Mike Williams, Doug Alegate and Curtis A. Meyer Carnegie Mellon University June 7, 24 Abstract We have used the

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

738 SCIENCE IN CHINA (Series A) Vol. 46 Let y = (x 1 x ) and the random variable ß m be the number of sibs' alleles shared identity by descent (IBD) a

738 SCIENCE IN CHINA (Series A) Vol. 46 Let y = (x 1 x ) and the random variable ß m be the number of sibs' alleles shared identity by descent (IBD) a Vol. 46 No. 6 SCIENCE IN CHINA (Series A) November 003 The otimal design for hyothesis test and its alication in genetic linkage analysis IE Minyu (Λ Ξ) 1; & LI Zhaohai ( Π ) 1. Deartment of Statistics,

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Percolation Thresholds of Updated Posteriors for Tracking Causal Markov Processes in Complex Networks

Percolation Thresholds of Updated Posteriors for Tracking Causal Markov Processes in Complex Networks Percolation Thresholds of Udated Posteriors for Tracing Causal Marov Processes in Comlex Networs We resent the adative measurement selection robarxiv:95.2236v1 stat.ml 14 May 29 Patric L. Harrington Jr.

More information

Published: 14 October 2013

Published: 14 October 2013 Electronic Journal of Alied Statistical Analysis EJASA, Electron. J. A. Stat. Anal. htt://siba-ese.unisalento.it/index.h/ejasa/index e-issn: 27-5948 DOI: 1.1285/i275948v6n213 Estimation of Parameters of

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

Chapter 13 Variable Selection and Model Building

Chapter 13 Variable Selection and Model Building Chater 3 Variable Selection and Model Building The comlete regsion analysis deends on the exlanatory variables ent in the model. It is understood in the regsion analysis that only correct and imortant

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING. Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers

PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING. Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers PER-PATCH METRIC LEARNING FOR ROBUST IMAGE MATCHING Sezer Karaoglu, Ivo Everts, Jan C. van Gemert, and Theo Gevers Intelligent Systems Lab, Amsterdam, University of Amsterdam, 1098 XH Amsterdam, The Netherlands

More information

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection

More information