GaGa: a Parsimonious and Flexible Hierarchical Model for Microarray Data Analysis

David Rossell
Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX 77030, USA
rosselldavid@gmail.com (to whom correspondence should be addressed)

Abstract

Bayesian hierarchical models are attractive for microarray data since they allow sharing information across genes and across different analyses in a coherent manner. Kendziorski et al. (2003) and Newton et al. (2004) introduced the gamma-gamma hierarchical model. The model parsimoniously describes the expression of thousands of genes with a small number of hyper-parameters. This makes the model easy to interpret and analytically tractable. However, we find important limitations of the model when fitting real datasets. The limitations are due to some of the assumptions being too restrictive. We propose a simple extension of the model that improves the fit substantially with almost no increase in complexity. The model allows comparing not only mean expression between groups but also the distributional shape, which we argue to be of biological relevance. We propose a second extension that uses a mixture of gamma distributions to further improve the fit, at the expense of increased computational burden. We propose several approximations that significantly reduce the computational cost. We use our approach for inference about differential gene expression and for class prediction, both in simulated and real datasets. We find that both extensions are preferable to the original formulation of the model, and that they provide advantages over several other popular methods, especially for small samples.

1 Introduction

Two common inference problems with microarray data are differential expression analysis, i.e. the comparison of some measure of gene expression between groups, and class prediction, i.e. the prediction of an unknown group label for a new sample. For both procedures, one of the challenges is that the number of genes greatly exceeds the number of replicated measurements that are obtained for each gene. That is, data is abundant at an overall level but it is scarce at the gene level, and therefore there is much potential for methods that allow for the sharing of information across genes.

Hierarchical models naturally allow for the sharing of information between genes. Typical examples are Lönnstedt and Speed (2002) and Smyth (2004), who modeled gene-specific parameter estimates via hierarchical empirical Bayes methods to obtain improved testing procedures. Kendziorski et al. (2003), Newton et al. (2001) and Newton and Kendziorski (2003) proposed hierarchical models that depend on few parameters, i.e. they greatly reduce the dimensionality of the problem. This feature is particularly important for small sample sizes.

We build on the gamma-gamma hierarchical model of Kendziorski et al. (2003). The model is used extensively in recent literature (Parmigiani et al., 2003; Müller et al., 2004; Zhao et al., 2005; Chiogna et al., 2007). The gamma-gamma model assumes that the observations for each gene arise from a gamma distribution with common shape parameter across all genes and a scale parameter that arises from a hierarchical gamma prior. Since the model uses a single gamma prior, we refer to it as the Ga model. We find the gamma choice appealing, for it is a flexible family that can capture a variety of distributional shapes.

In this paper we show that, although this model is elegant and parsimonious, it fails to provide an adequate fit in real datasets, and that this is partly due to the assumption that the shape parameter is common across genes. We propose a simple extension of the model that specifies a gamma prior on both the shape and the inverse mean parameters (GaGa model). The extension is still parsimonious, requiring only one additional hyper-parameter, and it can be fit both via empirical Bayes and a fully Bayesian approach. We develop an algorithm that implements posterior inference with a computational effort that is comparable to the Ga model. We then develop a second extension that specifies a gamma prior on the shape parameter and a mixture of gamma priors on the inverse mean (MiGaGa model). This provides additional flexibility, albeit at the expense of reduced model parsimony and increased computational cost.
An advantage of the GaGa and MiGaGa hierarchical models is that they allow comparing both the shape and mean parameters between groups. That is, one not only compares mean expression levels but also the distributional shape. This is important because, as argued by Lapointe et al. (2004) and Coombes et al. (2007), the latter comparison may be biologically more meaningful than comparing mean expressions. As an example, consider the case of cancer biomarkers. Many mutations, deletions and translocations affect only a proportion of the diseased individuals. When comparing normal and cancer cells, only that proportion exhibits a modified expression pattern, i.e. the tail behavior or the variability will be different across groups even though the mean expression may be almost unchanged.

The paper is structured as follows. In Section 2 we review the Ga model and we extend it to the GaGa model. We derive expressions for posterior probabilities of interest and an MCMC sampling scheme to fit the model. In Section 3 we propose the MiGaGa as a further generalization. For both extensions, the posterior distributions of the gamma shape parameters are known only up to a constant. We refer to this distribution, which to our knowledge has not been described before, as the gamma shape distribution. In Section 4 we derive useful approximations for this distribution. In Section 5 we explain how to find differentially expressed (DE) genes and perform class prediction. In Section 6 we apply our approach to simulated data and real datasets. Finally, in Section 7 we present some concluding remarks. The methods described in this paper are implemented in the R library gaga.

2 The GaGa model

We assume that the data has been background corrected, normalized and quantified in a sensible manner (Dudoit et al., 2002). Let $x_{ij}$ be the measure of expression for gene $i$, $i = 1, \ldots, n$, in microarray $j$, $j = 1, \ldots, J$. Let $z_j \in \{1, \ldots, K\}$ indicate a group membership, e.g. $z_j = 1$ for normal cells and $z_j = 2$ for cancer cells. We denote the vector of observations for gene $i$ as $\mathbf{x}_i$ and the whole dataset as $\mathbf{x}$. We use Ga(·) to denote a gamma distribution, IGa(·) for the inverse gamma, Mult(·) for the multinomial, Dirichlet(·) for the Dirichlet and GaS(·) for the gamma shape distribution. The GaS distribution is defined in Section 4.

In the differential expression problem the investigator is interested in determining the expression pattern that each gene follows. This inference problem can be viewed as a hypothesis testing problem. Throughout we use the terms hypothesis and expression pattern interchangeably. The simplest setup has $K = 2$ groups and 2 hypotheses: pattern 0, under which both groups are equally expressed (null hypothesis), and pattern 1, under which they are differentially expressed (alternative hypothesis). For $K > 2$ we may want to consider more than 2 patterns. For example, if group 1 corresponds to normal cells, group 2 to cells with type A cancer and group 3 to type B cancer, one may be interested in assigning each gene to one of the following patterns:

Pattern 0: Normal = Cancer A = Cancer B
Pattern 1: Normal ≠ Cancer A = Cancer B
Pattern 2: Normal ≠ Cancer A ≠ Cancer B.    (1)

Denote by $H$ the number of hypotheses, and let the latent variable $\delta_i \in \{0, 1, \ldots, H-1\}$ indicate the true expression pattern for gene $i$. We refer to genes with $\delta_i = 0$ as equally expressed (EE) and genes with $\delta_i \neq 0$ as differentially expressed (DE), and we denote $\delta = (\delta_1, \ldots, \delta_n)$.

2.1 The model

The Ga model (Kendziorski et al., 2003; Newton et al., 2001; Newton and Kendziorski, 2003) assumes that the $x_{ij}$ are independent realizations from $\mathrm{Ga}(\alpha_{i,z_j}, \lambda_{i,z_j})$ (i.e. the mean is $\alpha_{i,z_j}/\lambda_{i,z_j}$).
The model assumes $\delta_i \sim \mathrm{Mult}(1, \pi)$, fixes $\alpha_{i,z_j} = \alpha$ for all $i, j$ and specifies the hierarchical prior $\lambda_{i,z_j} \sim \mathrm{Ga}(\alpha_0, \nu)$ for all distinct scale parameters under pattern $\delta_i$. Here $(\alpha_0, \nu, \alpha, \pi)$ are hyper-parameters common to all genes. For $\delta_i = 0$ (EE genes) we have $\lambda_{i,1} = \ldots = \lambda_{i,K}$, and for $\delta_i \neq 0$ some of the $\lambda_{i,z_j}$ are different from each other, according to the specification of the hypotheses.

The Ga model imposes the restriction that $1/\sqrt{\alpha_{i,z_j}}$, the within-groups coefficient of variation (CV), must be constant across all genes and groups. The assumption is analytically convenient, but we have found it not to be reasonable in typical datasets. Figure 1 shows empirical CVs for two datasets described in Section 6. The CVs exhibit a skewed distribution with a substantial amount of variability, which contradicts the assumption that CVs are constant. Figure 1 highlights the genes declared DE by the Ga model in both datasets. These are often genes with above average CV. This is due, we believe, to the constant CV assumption, which makes the Ga model view atypical CVs as evidence for differential expression.
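The constant-CV restriction follows from the gamma moments: with shape $\alpha$ and rate $\lambda$ the mean is $\alpha/\lambda$ and the standard deviation is $\sqrt{\alpha}/\lambda$, so the CV is $1/\sqrt{\alpha}$ whatever the rate. The Python sketch below checks this numerically. The function names and the parameter values are illustrative assumptions and are not taken from the paper or from the gaga library.

```python
import math
import random
import statistics


def gamma_cv(shape):
    """Theoretical coefficient of variation of a gamma distribution.

    For shape `shape` and any rate, mean = shape / rate and
    sd = sqrt(shape) / rate, so the CV depends on the shape only.
    """
    return 1.0 / math.sqrt(shape)


def empirical_cv(shape, rate, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the CV from simulated gamma draws."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws)]
    return statistics.pstdev(draws) / statistics.fmean(draws)


if __name__ == "__main__":
    # Hypothetical rates: the empirical CV should stay close to gamma_cv(4.0).
    for rate in (0.5, 2.0, 10.0):
        print(rate, round(empirical_cv(4.0, rate), 3), gamma_cv(4.0))
```

Under a common shape, genes whose empirical CV is far from $1/\sqrt{\alpha}$ can only be accommodated by letting the scale differ between groups, which is one way to read the behavior described above.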

Figure 1: Sample mean and CV for each gene (highlighted points denote genes declared DE by the Ga model). (a): Gould dataset; (b): Armstrong dataset. Axes: mean expression (log scale) vs. coefficient of variation.

When imposing constant $\alpha_{i,z_j}$ one can only compare the scale parameters $\lambda_{i,z_j}$ between groups, while in practice it can be more relevant biologically to compare the full distribution (Lapointe et al., 2004; Coombes et al., 2007). We propose a generalization that addresses this limitation. We introduce gene and pattern-specific shape parameters $\alpha_{i,z_j}$ and assume $x_{ij} \sim \mathrm{Ga}(\alpha_{i,z_j}, \alpha_{i,z_j}/\lambda_{i,z_j})$ (i.e. $\lambda_{i,z_j}$ is the mean), with the following hierarchical prior

$$\lambda_{i,k} \mid \delta_i, \alpha_0, \nu \sim \mathrm{IGa}(\alpha_0, \alpha_0/\nu), \quad \text{indep. for } i = 1, \ldots, n$$
$$\alpha_{i,k} \mid \delta_i, \beta, \mu \sim \mathrm{Ga}(\beta, \beta/\mu), \quad \text{indep. for } i = 1, \ldots, n, \qquad (2)$$

and a prior for $\delta_i$ as before. We refer to (2) as the GaGa model. As in the Ga model, the values of $(\alpha_{i1}, \lambda_{i1}), \ldots, (\alpha_{iK}, \lambda_{iK})$ are tied when $\delta_i = 0$, whereas under $\delta_i \neq 0$ some of them are different from each other (although they still arise from the same marginal distribution). The GaGa model replaces the hyper-parameter $\alpha$ of the Ga model by the pair $(\beta, \mu)$. That is, the additional flexibility is achieved with only one more hyper-parameter. We complete the Bayesian model with hyper-priors:

$$\alpha_0 \sim \mathrm{Ga}(a_{\alpha_0}, b_{\alpha_0}); \quad \nu \sim \mathrm{IGa}(a_\nu, b_\nu); \quad \beta \sim \mathrm{Ga}(a_\beta, b_\beta); \quad \mu \sim \mathrm{IGa}(a_\mu, b_\mu); \quad \pi \sim \mathrm{Dirichlet}(p). \qquad (3)$$

We believe that eliciting prior distributions can be advantageous in microarray studies, since there usually is some degree of prior knowledge. For example, the investigator may have an idea about what proportion of DE genes to expect. As another example, many normalization procedures result in values that are below 15 on the log-scale.

The use of hyper-priors is not critical. Alternatively, $(\alpha_0, \nu, \beta, \mu, \pi)$ can be fixed by an empirical Bayes argument, using an expectation-maximization algorithm completely analogous to that for the Ga model (Kendziorski et al. (2003), Appendix). We implement both methods in our gaga library. Microarray datasets are strongly informative about the parameters in (3), since they are common to all genes. In the datasets in Section 6 we found fairly similar results for two reasonable prior specifications and the empirical Bayes method.

2.2 Posterior distributions

We derive the posterior distribution of the first-stage parameters, assuming fixed hyper-parameters $\omega = (\alpha_0, \nu, \beta, \mu, \pi)$. From (2) we see that, conditional on $\omega$, the gene-specific parameters $(\delta_i, \alpha_{i1}, \ldots, \alpha_{iK}, \lambda_{i1}, \ldots, \lambda_{iK})$ are independent a posteriori across genes $i = 1, \ldots, n$. Therefore, it suffices to derive the posterior for each gene separately. We denote the vectors of parameters for a single gene as $\lambda_i = (\lambda_{i,1}, \ldots, \lambda_{i,K})$ and $\alpha_i = (\alpha_{i,1}, \ldots, \alpha_{i,K})$, and we let $\lambda = (\lambda_1, \ldots, \lambda_n)$, $\alpha = (\alpha_1, \ldots, \alpha_n)$ be the collection of these parameters.

Let $N_{\delta_i}$ be the number of groups that are distinct under pattern $\delta_i$. In our example in (1) we have $H = 3$ patterns: under pattern 0 we have $N_0 = 1$ distinct group, and similarly $N_1 = 2$, $N_2 = 3$. Let $S_{i,\delta_i,k}$ for $i = 1, \ldots, n$, $\delta_i = 0, \ldots, H-1$ and $k = 1, \ldots, N_{\delta_i}$ be the sum of observations from gene $i$ that under pattern $\delta_i$ correspond to the $k$-th distinct group, $P_{i,\delta_i,k}$ be the product of the same observations and $J_{i,\delta_i,k}$ be the number of terms in the sum. In our example $S_{10,0,1}$ denotes the sum of all observations from gene 10 (since under pattern 0 there is only one distinct group), $S_{10,1,1}$ denotes the sum of observations from normal samples (since it is the first distinct group under pattern 1) and $S_{10,1,2}$ the sum from cancers of type A and B.

The posterior probability that gene $i$ follows expression pattern $l$, which we denote as $v_{il}$, is given by $v_{il} = P(\delta_i = l \mid \mathbf{x}, \omega) \propto f(\mathbf{x}_i \mid \delta_i = l, \omega)\,\pi_l$ for $l = 0, \ldots, H-1$, where

$$f(\mathbf{x}_i \mid \delta_i, \omega) = \left[\frac{(\alpha_0/\nu)^{\alpha_0}\,(\beta/\mu)^{\beta}}{\Gamma(\alpha_0)\,\Gamma(\beta)}\right]^{N_{\delta_i}} \prod_{k=1}^{N_{\delta_i}} \frac{1}{C\!\left(J_{i,\delta_i,k},\ \beta,\ \beta/\mu - \log(P_{i,\delta_i,k}),\ \alpha_0,\ \alpha_0/\nu,\ S_{i,\delta_i,k}\right)}, \qquad (4)$$

and $C(\cdot)$ is the gamma shape normalization constant, defined in Section 4.
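The bookkeeping behind (4) can be sketched in a few lines of Python: first collect $J$, $S$ and $\log P$ for each distinct group under a pattern, then normalize $v_{il} \propto f(\mathbf{x}_i \mid \delta_i = l, \omega)\,\pi_l$ on the log scale. This is a minimal sketch under stated assumptions: the function names, the 0-based group labels and the tuple encoding of a pattern are hypothetical, and the log marginal likelihoods are taken as given inputs rather than computed from $C(\cdot)$.

```python
import math


def group_statistics(values, groups, pattern):
    """Per-distinct-group count J, sum S and log-product log P for one gene.

    values  : expression values x_ij of one gene
    groups  : group label z_j of each array (0-based here, an assumption)
    pattern : maps each group label to the index of its distinct group,
              e.g. (0, 1, 1) encodes "Normal != Cancer A = Cancer B"
    """
    n_distinct = max(pattern) + 1
    stats = [{"J": 0, "S": 0.0, "logP": 0.0} for _ in range(n_distinct)]
    for x, z in zip(values, groups):
        s = stats[pattern[z]]
        s["J"] += 1
        s["S"] += x
        s["logP"] += math.log(x)  # log of the product, to avoid overflow
    return stats


def pattern_posterior(log_marginals, prior_probs):
    """Normalize v_il proportional to f(x_i | delta_i = l, omega) * pi_l.

    Works on the log scale and subtracts the maximum before exponentiating
    (log-sum-exp), so that very small likelihoods do not underflow.
    The prior probabilities must be strictly positive.
    """
    log_terms = [lm + math.log(p) for lm, p in zip(log_marginals, prior_probs)]
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

Storing $\log P$ instead of $P$ is a design choice made here for numerical stability; it is also the quantity that enters (4).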
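The posterior distribution of $(\alpha_{ik}, \lambda_{ik})$ conditional on $\delta_i$ is

$$\alpha_{i,k} \mid \delta_i, \omega, \mathbf{x} \sim \mathrm{GaS}\!\left(J_{i,\delta_i,k},\ \beta,\ \beta/\mu - \log(P_{i,\delta_i,k}),\ \alpha_0,\ \alpha_0/\nu,\ S_{i,\delta_i,k}\right)$$
$$\lambda_{i,k} \mid \alpha_{i,k}, \delta_i, \omega, \mathbf{x} \sim \mathrm{IGa}\!\left(\alpha_{i,k} J_{i,\delta_i,k} + \alpha_0,\ \alpha_0/\nu + \alpha_{i,k} S_{i,\delta_i,k}\right). \qquad (5)$$

For any given $\omega$, (5) can be used to obtain posterior credibility intervals in the usual fashion. Note that $\alpha_{i,k}$ and $\lambda_{i,k}$ are not conditionally independent a posteriori given $(\delta_i, \omega)$, as they are a priori.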
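The second line of (5) is a standard conjugate update, so a draw of $\lambda_{i,k}$ given $\alpha_{i,k}$ needs only a gamma random number generator. The following Python sketch illustrates this step. The function names and argument order are assumptions made for illustration, and the harder step of drawing $\alpha_{i,k}$ from the GaS distribution is not covered here.

```python
import random


def lambda_posterior_params(alpha, J, S, alpha0, nu):
    """Shape and rate of the inverse gamma conditional in (5)."""
    shape = alpha * J + alpha0
    rate = alpha0 / nu + alpha * S
    return shape, rate


def draw_lambda(alpha, J, S, alpha0, nu, rng=random):
    """One draw of lambda_{i,k} given alpha_{i,k}.

    If G ~ Gamma(shape, rate) then 1 / G ~ IGa(shape, rate).
    random.gammavariate is parameterized by (shape, scale), hence 1 / rate.
    """
    shape, rate = lambda_posterior_params(alpha, J, S, alpha0, nu)
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

Repeating the draw many times and taking empirical quantiles is one simple way to obtain the credibility intervals mentioned above, conditional on a value of $\alpha_{i,k}$.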

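2.3 Model fitting

The posterior distribution of $\omega = (\alpha_0, \nu, \beta, \mu, \pi)$ given $(\delta, \alpha, \lambda)$ and $\mathbf{x}$ is characterized as follows. We find

$$\alpha_0 \mid \delta, \alpha, \lambda, \mathbf{x} \sim \mathrm{GaS}\!\left(\sum_{i=1}^{n} N_{\delta_i},\ a_{\alpha_0},\ b_{\alpha_0} + S_\lambda,\ a_\nu,\ b_\nu,\ \tilde S_\lambda\right)$$
$$\nu \mid \alpha_0, \delta, \alpha, \lambda, \mathbf{x} \sim \mathrm{IGa}\!\left(a_\nu + \alpha_0 \sum_{i=1}^{n} N_{\delta_i},\ b_\nu + \alpha_0 \tilde S_\lambda\right). \qquad (6)$$

The hyper-parameters $(\beta, \mu, \pi)$ are conditionally independent of $(\alpha_0, \nu)$ given $(\delta, \alpha, \lambda, \mathbf{x})$, with

$$\beta \mid \delta, \alpha, \lambda, \mathbf{x} \sim \mathrm{GaS}\!\left(\sum_{i=1}^{n} N_{\delta_i},\ a_\beta,\ b_\beta - S_\alpha,\ a_\mu,\ b_\mu,\ \tilde S_\alpha\right)$$
$$\mu \mid \beta, \delta, \alpha, \lambda, \mathbf{x} \sim \mathrm{IGa}\!\left(a_\mu + \beta \sum_{i=1}^{n} N_{\delta_i},\ b_\mu + \beta \tilde S_\alpha\right) \qquad (7)$$

and

$$\pi \mid \delta, \alpha, \lambda, \mathbf{x} \sim \mathrm{Dirichlet}\!\left(p_1 + \sum_{i=1}^{n} I(\delta_i = 0),\ \ldots,\ p_H + \sum_{i=1}^{n} I(\delta_i = H-1)\right) \qquad (8)$$

conditionally independent of $(\beta, \mu)$. Here $S_\lambda = \sum_{i=1}^{n}\sum_{k=1}^{N_{\delta_i}} \log(\lambda_{i,k})$, $\tilde S_\lambda = \sum_{i=1}^{n}\sum_{k=1}^{N_{\delta_i}} 1/\lambda_{i,k}$, $S_\alpha = \sum_{i=1}^{n}\sum_{k=1}^{N_{\delta_i}} \log(\alpha_{i,k})$ and $\tilde S_\alpha = \sum_{i=1}^{n}\sum_{k=1}^{N_{\delta_i}} \alpha_{i,k}$ are sums over all distinct $\lambda_{i,k}$ and $\alpha_{i,k}$.

Together with the posteriors given in Section 2.2, this allows us to implement a Gibbs sampling scheme to fit the model (Gelfand and Smith, 1990). The Gibbs sampler is defined by iterative sampling of $(\delta, \alpha, \lambda)$ given $\omega$, and sampling $\omega$ given $(\delta, \alpha, \lambda)$.

3 The MiGaGa model

The GaGa model addresses the problem illustrated in Figure 1 by allowing varying CVs across genes. However, another limitation remains. A well-known feature of the GCRMA normalization procedure (Wu et al., 2004) is that it creates a distinctly bimodal distribution for the gene expressions. Figure 2(a) shows the empirical distribution of $x_{ij}$ for the Armstrong dataset (see Section 6.2), and compares it with the prior-predictive under the GaGa model. The model does not capture the bimodality. To address this limitation we introduce a further generalization, by letting $\lambda_{i,k}$ arise from a mixture

$$\lambda_{i,k} \mid \delta_i, \rho, \alpha_0, \nu \sim \sum_{m=1}^{M} \rho_m\, \mathrm{IGa}(\alpha_{0m}, \alpha_{0m}/\nu_m)$$
$$\rho \sim \mathrm{Dirichlet}(r) \qquad (9)$$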

and specifying the following priors

$$\alpha_{0m} \sim \mathrm{Ga}(a_{\alpha_0}, b_{\alpha_0}), \quad \text{for } m = 1, \ldots, M \text{ indep.}$$
$$\nu_m \sim \mathrm{IGa}(a_\nu, b_\nu), \quad \text{for } m = 1, \ldots, M \text{ indep.} \qquad (10)$$

The rest of the model is as in (2) and (3). The statement of posterior distributions and the Gibbs sampler are largely analogous to those for the GaGa prior. The main difference is that for the MiGaGa one introduces latent variables indicating the cluster to which each gene belongs. Compared to the GaGa prior, the additional flexibility in MiGaGa potentially allows us to obtain a better fit to the data, although this comes at the cost of increased model complexity and computational burden. Figure 2(a) shows how the MiGaGa prior predictive improves the GaGa fit substantially.

Figure 2: Armstrong dataset. (a): marginal distribution of data vs. prior predictive of GaGa and MiGaGa with $M = 2$ and $\omega = \hat\omega$; (b): gene (probe 1110_at) with evidence for change in shape and no change in location. Posterior predictive under GaGa model with 10 arrays per group. (Distinct plotting symbols indicate the 24 ALL observations and the 18 MLL observations.) Axes: expression levels (log scale) vs. density.

4 The Gamma shape distribution

We define the distribution that arises as the posterior of the shape parameter of the gamma distribution under independent sampling and a gamma prior, as in (2). We assume that the gamma distribution is indexed by the shape and mean parameters. This distribution, which we refer to as the gamma shape distribution, has not been described before. It is similar to the distribution that arises when the parameterization is in terms of the shape and scale parameters (Damsleth, 1975; Miller, 1980). To simplify notation we denote by $y$ a positive continuous random variable that follows this distribution. Its probability density function, indexed by the parameters $a \geq 0$, $b, d, r, s > 0$, $c > -a\log(s/a)$, can be written as:

$$f(y \mid a, b, c, d, r, s) = C(a, b, c, d, r, s)\, \frac{\Gamma(ay + d)}{\Gamma(y)^{a}} \left(\frac{y}{r + sy}\right)^{ay + d} y^{\,b - d - 1}\, e^{-yc}\, I(y > 0), \qquad (11)$$

where $C(a, b, c, d, r, s)$ is the normalization constant and $\Gamma(\cdot)$ is the gamma function. For $a = d = 0$, (11) simplifies to a gamma distribution. In general, to obtain random draws from (11) or to evaluate $C(a, b, c, d, r, s)$ one has to resort to numerical methods. This is impractical in our setup, since the posterior simulation requires repeated and computationally efficient simulation from a GaS distribution. Approximations are required to decrease the computational burden.

We start by deriving an approximation to (11) that is appropriate for large values of $y$. By approximating $\Gamma(\cdot)$ with Stirling's formula and evaluating the limit of the resulting expression as $y \to \infty$ we find that (11) is approximately proportional to

$$y^{\,a/2 + b - 1/2 - 1} \exp\{-y\,(c + a\log(s/a))\}. \qquad (12)$$

One can obtain approximate random draws from (11) by drawing from a $\mathrm{Ga}(a/2 + b - 1/2,\ c + a\log(s/a))$. To approximate $C(a, b, c, d, r, s)$, denote as $g(y)$ the probability density function of the gamma approximation, and let $m$ be its mode. Evaluating $g$ and (11) at $m$ gives

$$C(a, b, c, d, r, s) \approx g(m)\, \frac{\Gamma(m)^{a}}{\Gamma(am + d)} \left(\frac{m}{r + sm}\right)^{-(am + d)} m^{-b + d + 1}\, e^{mc}. \qquad (13)$$

Figure 3 shows the gamma shape distribution and its gamma approximation for two randomly selected cases that were encountered in posterior simulations for the MiGaGa model. The density in panel (a) arises as the posterior of the shape parameter for a single gene with a sample size of 5 observations per group, whereas panel (b) results as the posterior of a parameter shared by a large number of genes, i.e. it represents a situation with a large sample size. In both cases the approximation is very close. In the microarray datasets that we have analyzed so far the approximation worked well.
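Equations (11) to (13) translate directly into a few lines of code. The Python sketch below evaluates the unnormalized log density in (11), the parameters of the gamma approximation in (12) and the approximate log normalization constant in (13). It is an illustration under stated assumptions, not the implementation in the gaga library: the function names are hypothetical, it requires $a > 0$, and it assumes that the approximating gamma has shape above 1 so that its mode is positive.

```python
import math


def gas_log_kernel(y, a, b, c, d, r, s):
    """Log of the unnormalized gamma shape density (11) at y > 0."""
    return (math.lgamma(a * y + d) - a * math.lgamma(y)
            + (a * y + d) * (math.log(y) - math.log(r + s * y))
            + (b - d - 1.0) * math.log(y) - c * y)


def gamma_approx_params(a, b, c, s):
    """Shape and rate of the gamma approximation suggested by (12)."""
    shape = a / 2.0 + b - 0.5
    rate = c + a * math.log(s / a)
    return shape, rate


def gamma_log_pdf(y, shape, rate):
    """Log density of Ga(shape, rate) at y."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(y) - rate * y)


def approx_log_normalizer(a, b, c, d, r, s):
    """Approximate log C from (13): match both densities at the mode m of g.

    Requires shape > 1 so that the gamma mode (shape - 1) / rate is positive.
    """
    shape, rate = gamma_approx_params(a, b, c, s)
    m = (shape - 1.0) / rate
    return gamma_log_pdf(m, shape, rate) - gas_log_kernel(m, a, b, c, d, r, s)
```

Working with logarithms throughout avoids overflow in $\Gamma(ay + d)$ when $a$ is large, as in the case of a parameter shared by many genes.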
In some rare cases we detected that the mode of the approximation did not match that of (11) (indicated by the first derivative of $\log[f(y \mid a, b, c, d, r, s)]$ not being close to zero). In these cases we used a few Newton-Raphson steps to locate the mode and used the gamma approximation that matches the location of the mode as well as the value of the second derivative of $\log[f(y \mid a, b, c, d, r, s)]$ evaluated at the mode.

Figure 3: Gamma approximation to the gamma shape distribution (curves: gamma shape density and gamma approximation, plotted against $y$). Parameter values are (a): a=10; b=0.90; c= ; d= ; r= ; s=65.02 (b): a=1532, b=.16, c=3469, d=.16, r=.016, s=159.5

5 Inference

5.1 Differentially expressed genes

We formalize inference for differential expression by minimizing the Bayesian false negative rate (BFNR) subject to an upper bound $\alpha$ on the Bayesian false discovery rate (BFDR) (Müller et al., 2007). Briefly, BFNR is the posterior expected proportion of genes declared EE (i.e. assigned to pattern 0) that are actually DE (i.e. do not follow pattern 0), and BFDR is the expected proportion of genes declared DE that are actually EE. This definition remains valid for more than two hypotheses. The Bayes rule is to declare a gene as DE whenever its posterior probability of DE is above a certain threshold. The result extends trivially to our multiple hypotheses setup with a slight adjustment: given that a gene is not classified into pattern 0, we propose assigning it to the pattern with the highest posterior probability. That is, for given BFDR and BFNR we maximize the number of genes correctly classified into their expression pattern.

Since the posterior probabilities in Section 2.2 are derived under an assumed probability model, deviations from the assumptions may result in poor performance of the procedure. We propose to assess its frequentist operating characteristics following the bootstrap scheme introduced by Storey (2007), which allows estimating the frequentist FDR for any given $\alpha$. In Section 6.3 we apply this procedure to a real dataset. For more details, see Storey (2007) and the supplementary material.

To ease the computational burden of the bootstrap-based procedure, we use an approximation. Instead of using $P(\delta_i = l \mid \mathbf{x})$ we use $v_{il} = P(\delta_i = l \mid \mathbf{x}, \hat\omega)$ as given in (4), where $\hat\omega$ is the posterior mean of $\omega$. We have found this strategy to deliver very similar results to those from the exact version of the algorithm, but at a much lower computational cost.
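The decision rule of Section 5.1 can be sketched as follows, assuming the posterior probabilities $v_{il}$ are already available. The estimated BFDR of a list of genes is the average of their probabilities of being EE, so the largest admissible list is obtained by adding genes in order of increasing $v_{i0}$. The function names are hypothetical and the code illustrates the thresholding idea only; it is not the implementation used in the paper.

```python
def declare_de(prob_ee, max_bfdr=0.05):
    """Indices of genes declared DE with estimated BFDR <= max_bfdr.

    prob_ee[i] is the posterior probability v_i0 that gene i is EE.
    Genes are added in order of increasing v_i0; the BFDR of a list is the
    average v_i0 over its members. Because that running average never
    decreases along the sorted order, a single pass finds the largest list.
    """
    order = sorted(range(len(prob_ee)), key=lambda i: prob_ee[i])
    selected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += prob_ee[i]
        if running_sum / rank > max_bfdr:
            break
        selected.append(i)
    return selected


def assign_patterns(post_probs, max_bfdr=0.05):
    """Pattern per gene: 0 if not declared DE, else the most probable l >= 1.

    post_probs[i][l] is v_il, with pattern 0 meaning equally expressed.
    """
    de = set(declare_de([v[0] for v in post_probs], max_bfdr))
    patterns = []
    for i, v in enumerate(post_probs):
        if i in de:
            patterns.append(max(range(1, len(v)), key=lambda l: v[l]))
        else:
            patterns.append(0)
    return patterns
```

For example, with probabilities of EE equal to 0.01, 0.5, 0.02, 0.9 and 0.08 and a bound of 0.05, the first, third and fifth genes are declared DE, since their average probability of EE is about 0.037, while adding the next gene would raise it above the bound.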

5.2 Class prediction

We set the goal of maximizing the number of future samples $\mathbf{x}^* = (x_1^*, \ldots, x_n^*)$ that are correctly classified as type $z^*$. The Bayes rule is to assign the new sample to the type $k$ that has the highest posterior probability $P(z^* = k \mid \mathbf{x}^*, \mathbf{x})$. As in Section 5.1, we use the approximation $u_k = P(z^* = k \mid \mathbf{x}^*, \mathbf{x}, \hat\omega) \propto f(\mathbf{x}^* \mid z^* = k, \hat\omega, \mathbf{x})\,P(z^* = k)$, where $\hat\omega$ is the posterior mean of $\omega$, $f(\mathbf{x}^* \mid z^* = k, \hat\omega, \mathbf{x})$ is the predictive distribution for the measurements of a sample of type $k$ and $P(z^* = k)$ is the prior probability. The prior probabilities can be based, for example, on the prevalence of the disease in the population under study, the presence of risk factors or the outcome of previous tests. For the GaGa model we find

$$f(\mathbf{x}^* \mid z^* = k, \hat\omega, \mathbf{x}) = \prod_{i=1}^{n} \sum_{l=0}^{H-1} f(x_i^* \mid z^* = k, \delta_i = l, \hat\omega, \mathbf{x}_i)\, v_{il}, \qquad (14)$$

where

$$f(x_i^* \mid z^* = k, \delta_i = l, \omega, \mathbf{x}_i) = \frac{C\!\left(J_{i,\delta_i,k},\ \beta,\ \beta/\mu - \log(P_{i,\delta_i,k}),\ \alpha_0,\ \alpha_0/\nu,\ S_{i,\delta_i,k}\right)}{C\!\left(J_{i,\delta_i,k} + 1,\ \beta,\ \beta/\mu - \log(P_{i,\delta_i,k}) - \log(x_i^*),\ \alpha_0,\ \alpha_0/\nu,\ x_i^* + S_{i,\delta_i,k}\right)}. \qquad (15)$$

An interesting feature of (14) is that the classifier weights the contribution of each gene according to the posterior probabilities $v_{il}$, and that in particular genes with zero posterior probability of being DE do not contribute to the classifier. This is important from a practical standpoint, since frequently one wants to use only a subset of the genes. The classifier is robust with respect to how many genes are chosen. Similar expressions can be obtained for the MiGaGa model by averaging according to the cluster weights $\rho$.

6 Results

We assess the performance of the GaGa and MiGaGa models in simulated and real data. In Section 6.1 we revisit the Gould dataset that Kendziorski et al. (2003) originally used to illustrate the Ga model. In Section 6.2 we analyze the leukemia dataset of Armstrong et al. (2002), and in Section 6.3 we conduct two simulation studies based on this dataset.

In all analyses we tried two different prior specifications. Under the first prior we use $a_{\alpha_0} = b_{\alpha_0} = a_\nu = b_\nu = a_\beta = b_\beta = a_\mu = b_\mu =$ and all the elements of $p$ and $r$ equal to 0.1.
Second, we defined a slightly more informative prior taking into account that log-expression levels are rarely above 15 and that coefficients of variation (CV) are usually centered around with large variance. We then found prior parameter values that were consistent with this information and that at the same time allowed for substantial prior uncertainty: $a_{\alpha_0}$ = , $b_{\alpha_0}$ = 10 4, $a_\nu$ = 0.016, $b_\nu$ = , $a_\beta$ = 0.004, $b_\beta$ = 10 3, $a_\mu$ = and $b_\mu$ = 20. The GaGa model yielded very similar parameter estimates and lists of differentially expressed genes under both priors. The MiGaGa model was slightly more sensitive to the prior specification, with the non-informative prior resulting in a higher posterior expectation for $\pi_0$ and a shorter list of DE genes. We present only the results arising from the more informative prior.

To fit the model we obtain 5,000 posterior samples, assess the convergence with trajectory plots and save the Monte Carlo output once convergence is judged to have been reached (typically well before 1,000 samples for the GaGa model and 2,500 samples for the MiGaGa). We ran the Markov chain a second time with different starting values and verified that it converged to the same target distribution.

For differential expression analysis, we compare our methodology with the empirical Bayes procedure of Smyth (2005), adjusting the p-values via the beta-uniform mixture approach of Pounds and Morris (2003) (EBayes-BUM), and with two-sample Wilcoxon tests adjusting p-values both via BUM (Wilcoxon-BUM) and the Benjamini and Hochberg (1995) method (Wilcoxon-BH). We also perform class prediction, comparing our methodology with Fisher's linear discriminant analysis (LDA). We use EBayes as implemented in the R/Bioconductor functions lmFit and ebayes, BUM as implemented in Bum and BH as in p.adjust (libraries limma (Smyth, 2005), ClassComparison (Coombes, 2005) and stats, respectively). All methods were set up to control the FDR below

6.1 Gould data

We used the already pre-processed version of the Gould dataset provided with the R library EBarrays. We fit the model to log-transformed data, since this reduced the effect of outliers and resulted in a better performance of the model. Data is available for 5,000 genes and 4 inbred lines: 2 parental and 2 offspring. The parental lines are a Copenhagen (COP) rat strain resistant to mammary carcinogenesis and a Wistar-Furth (WF) rat strain that is highly susceptible. The two offspring lines are obtained by crossing the parental lines, in such a way that they are homozygous COP/COP throughout the genome except for a small region in which they are homozygous WF/WF. In one of the offspring lines the COP/COP region is approximately 30cM (line CI) and in the other it is 1.5cM (line CII). Therefore, it seems reasonable to expect gene expression for the CI and CII lines to be somewhere between COP and WF. Further, CI should be closer to the COP line than CII, while CII should be closer to WF. The dataset contains 1 microarray for the COP group, 2 for CI, 5 for CII and 2 for WF.
To illustrate our approach we analyze a randomly chosen subset containing 2 microarrays from each of CI, CII and WF (we ignore the COP group for lack of replicates). This will allow us to assess how well the model fits the 3 CII microarrays that are not used to fit the model. For each gene, the experimenters considered four expression patterns:

Pattern 0: CI = CII = WF
Pattern 1: CI ≠ CII = WF
Pattern 2: CI = CII ≠ WF
Pattern 3: CI ≠ CII ≠ WF.    (16)

6.1.1 Model fit

The Ga model estimates $\hat\alpha_0$ = , $\hat\nu$ = 0.463, $\hat\pi$ = (0.980, 0.001, 0.001, 0.018) and fixes $\hat\alpha_{i,j} = \hat\alpha$ = . The GaGa extension yields posterior means $\hat\alpha_0$ = , $\hat\nu$ = 0.152, $\hat\beta$ = 1.089, $\hat\mu$ = and $\hat\pi$ = (0.999, 0, 0.001, 0). That is, the Ga model views 98.0% of the genes as being EE, while for the GaGa model the figure is 99.9%. Also, Ga assumes a constant within-groups CV of $1/\sqrt{\hat\alpha} = 0.047$, while GaGa estimates it to vary across genes as $1/\sqrt{\alpha}$ with $\alpha \sim$ Ga(1.089, ), indicating that the CVs are not constant. In Figure 1(a) we detected lack of fit of the Ga model at an overall level, which the GaGa model overcomes.

Next we consider the fit for individual genes. In particular, one should make sure that the fit is reasonable for the genes that are declared to be DE. Otherwise the inference would be suspect. We select the two genes that are deemed the most interesting by the Ga model, i.e. those with lowest probability of being EE. We compare their observed expression levels with draws from the posterior predictive distribution for these two genes. Figure 4(a) reveals that the Ga model seriously underestimates the variability, inaccurately predicting that the observations fall into 3 clearly separated groups. Figure 4(b) presents the same plot for the GaGa model. Here the model-based predictions appropriately reflect the variability. We can no longer see any separation between the 3 groups. We conclude that these genes are found by Ga due to model lack-of-fit.

Figure 4: Gould data. Observed expression values vs. predictive distribution for the two genes with highest posterior probability of DE according to the Ga model. Large black symbols are actual observations, small gray symbols are draws from the posterior predictive. (a): Ga model; (b): GaGa model. Each panel shows the CI, CII and WF groups for the two probes.

6.1.2 Differential expression analysis

The Ga model allocates 2 genes to pattern 1, 1 to pattern 2 and 78 to pattern 3 while GaGa allocates all genes to pattern 0. Under the GaGa model the largest probability of DE for any gene is . As we saw in Figure 1, the genes found by the Ga model tend to be those with CV above average. For comparison, performing F-tests via EBayes-BUM did not identify any DE genes either.

6.1.3 Class prediction

Since there is little evidence that any gene is differentially expressed, we do not expect this dataset to allow us to build a good classifier. We conducted a small simulation study that confirmed the lack of predictive power, regardless of how many genes were used to build the classifier.

6.2 Armstrong data

The data consisted of 24 Affymetrix U95A arrays from acute lymphoblastic leukemia (ALL) samples, 18 U95A arrays from lymphoblastic leukemia with MLL translocations (MLL), and 2 U95Av2 arrays also from the MLL group. The U95Av2 arrays were obtained at a later date than the rest, possibly under different experimental conditions, so we excluded them from the analysis. The dataset also contained samples with acute myelogenous leukemia, but for illustration we restrict attention to the ALL and MLL groups. The data was background corrected, normalized and summarized using the function just.gcrma from the R library gcrma (Wu and Irizarry, 2007). Note that different pre-processing algorithms result in different distributional shapes for the observed gene expression quantifications, and hence the quality of the model fit can be affected by the choice of pre-processing method. To explore the effect of the distributional shape, in addition to analyzing the GCRMA normalized data we apply a monotonic transformation to enforce unimodality and we analyze the transformed data with a GaGa model. Within each microarray, the transformation maps sample quantiles to the corresponding quantiles of a gamma distribution with matching mean and variance.

6.2.1 Model fit

Figure 1(b) reveals a violation of the constant CV assumption of the Ga model, and shows that the model tends to flag genes with large CVs as DE. Figure 2(a) shows that a MiGaGa fit with $M = 2$ components describes the data better than a single-component GaGa. An analogous plot shows how the monotone transformation improves the fit of the GaGa model substantially.
This plot and further assessment of goodness-of-fit can be found in the supplementary material. To study the behavior of the methods under small sample sizes and evaluate the reliability of the results, we start by fitting the model to 5 randomly chosen arrays from each group. We then add 5 more arrays per group, then 10 and finally we analyze the full dataset.

For the GCRMA data with 5 arrays per group the GaGa posterior means are $\hat\alpha_0$ = 4.520, $\hat\nu$ = 0.314, $\hat\beta$ = 0.826, $\hat\mu$ = and $\hat\pi$ = (0.954, 0.046). That is, the gene-specific shape parameters are estimated to arise from a gamma with mean and standard deviation . With more than 5 arrays similar $(\hat\alpha_0, \hat\nu, \hat\beta, \hat\mu)$ were obtained. However, $\hat\pi_1$ increased to 0.121, and when analyzing 10 arrays, 15 arrays and the full dataset, respectively. The MiGaGa estimates behaved in a similar manner, and so did the GaGa estimates for the monotonically transformed data, obtaining similar $\hat\pi_1$ in all models. Since we did not observe this phenomenon on simulated data, we believe that it is due to some component of the model being misspecified, e.g. assuming conditional independence of $\delta_i$ given $\omega$. In our experience, in real datasets many methods provide estimated proportions of DE genes that change widely with sample size. For instance, the EBayes-BUM and Wilcoxon-BUM estimates increase from to and from to 0.333, respectively. Compared to these other two procedures, under the GaGa and MiGaGa models $\hat\pi_1$ is relatively stable across sample sizes.

Table 1: Gene discoveries in the Armstrong dataset. # DE: number of genes declared DE; % rep.: percentage of # DE also found when analyzing the full dataset. ODP reports the mean of two analyses, each using B=100 permutations. Columns: 5 arrays (# DE, % rep.), 10 arrays (# DE, % rep.), 15 arrays (# DE, % rep.), All data (# DE). Rows: GaGa, MiGaGa (M = 2), GaGa (transf.), EBayes-BUM, Wilcoxon-BUM, Wilcoxon-BH.

6.2.2 Differential expression analysis

Table 1 shows the number of genes declared DE when analyzing a subset of 5, 10 and 15 arrays per group, as well as the full dataset. The table also provides the percentage of reproducibility, i.e. how many amongst those genes were found again when analyzing the full dataset. For instance, with 5 arrays per group MiGaGa found 339 genes, 79.6% of which were confirmed in the full data. We see that the three variants of our model find more genes than EBayes-BUM, Wilcoxon-BUM and Wilcoxon-BH, with a reproducibility around 80%. The reproducibility of the competing methods is very high for small sample sizes but it diminishes as more data becomes available. In fact, they find fewer DE genes in the full dataset than in a subset with 15 arrays per group.

One should be careful in comparing the number of genes detected by each method, for our models compare not only mean expression but also the distributional shape between ALL and MLL samples. To highlight this feature we inspect the 1096 genes that were reported as DE under the GaGa model with 10 arrays per group. Figure 2(b) shows the predictive distribution for a gene that has a 95% credibility interval for $\lambda_{i1} - \lambda_{i2}$ containing 0 and an interval for $\alpha_{i1}/\alpha_{i2}$ not containing 1.
This is a gene with stronger evidence of a difference in shape between groups than in location. The observed data, which include the 22 samples not used to fit the model, indeed suggest a larger tail for ALL than for AML samples.

Class prediction

We select the 100 genes with the largest posterior probability of being differentially expressed according to each of the three models (GaGa, MiGaGa and GaGa applied to the transformed data) when fit with 5 observations per group, and predict the class for the rest of the dataset. The three models correctly classify all samples. The same result is found when only using the top 10 genes to build the classifier. For comparison, using Fisher's Linear Discriminant Analysis with the 5 genes having smallest p-values according to Wilcoxon (BH) correctly classifies 16 ALL and 13 MLL samples.

6.3 Simulation study

We conduct two simulation studies. First, we assess the frequentist operating characteristics of the Bayesian procedure. For this purpose, we apply the bootstrap procedure described in Storey (2007) to the Armstrong dataset, so that in the generated data all genes are equally expressed. We then compute posterior probabilities of differential expression for each gene (setting $\omega$ to its posterior mean) and apply the Bayes rule of Müller et al. (2004), as described in Section 5.1, controlling the Bayesian FDR at 5%. We repeat this process 500 times and obtain an estimate of the frequentist FDR, as described in Storey (2007). We find that the GaGa model appropriately controls the frequentist FDR below the desired 5%, both when applied to the original and the monotonically transformed data. MiGaGa controlled the FDR below 5% when analyzing 5, 10 and 15 arrays per group, but when analyzing the full dataset the estimated frequentist FDR was 6.4%. For more details, see the supplementary material.

For the second simulation study, we generate 12,626 gene expression values for 5 MLL and 5 ALL samples from a GaGa model. We set $\omega$ to its estimated value for the Armstrong dataset: $\alpha_0 = 4.520$, $\nu = 0.314$, $\beta = 0.826$, $\mu$ = […] and $\pi = (0.954, 0.046)$. Fitting the GaGa model to the simulated data provides the posterior means $\hat{\alpha}_0 = 4.630$, $\hat{\nu} = 0.312$, $\hat{\beta} = 0.906$, $\hat{\mu}$ = […] and $\hat{\pi} = (0.955, 0.045)$. All the 95% credibility intervals contained the true value, except the one for $\beta$, which was (0.882, 0.930). MiGaGa with $M = 2$ estimates $\hat{\rho}$ = (0.004, […]), i.e. it correctly assigns a negligible posterior probability to one of the clusters. The posterior means are $\hat{\alpha}_0 = 4.618$, $\hat{\nu}$ = […] (for the second cluster), $\hat{\beta} = 0.870$, $\hat{\mu} = 514.6$, $\hat{\pi} = (0.956, 0.044)$.
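
The step that turns posterior probabilities of differential expression into a gene list at a 5% Bayesian FDR can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the rule declares genes in decreasing order of posterior probability and keeps the largest list whose posterior expected FDR (the average of one minus the posterior probability over the declared genes) does not exceed the target. The function name `bayesian_fdr_select` and the toy probabilities are hypothetical.

```python
def bayesian_fdr_select(post_prob_de, fdr=0.05):
    """Flag genes as DE while keeping the posterior expected FDR <= fdr.

    post_prob_de: posterior probabilities of differential expression,
    one per gene (assumed to be already computed by a fitted model).
    Returns a list of booleans in the same order as the input.
    """
    # Rank genes from most to least likely to be differentially expressed.
    order = sorted(range(len(post_prob_de)),
                   key=lambda i: post_prob_de[i], reverse=True)
    running_sum = 0.0  # sum of (1 - probability) over the genes declared so far
    n_keep = 0
    for rank, i in enumerate(order, start=1):
        running_sum += 1.0 - post_prob_de[i]
        # Posterior expected FDR of the list made of the top `rank` genes.
        if running_sum / rank <= fdr:
            n_keep = rank
    selected = [False] * len(post_prob_de)
    for i in order[:n_keep]:
        selected[i] = True
    return selected


# Toy example with hypothetical posterior probabilities for five genes.
toy_probs = [0.99, 0.98, 0.90, 0.60, 0.10]
toy_selected = bayesian_fdr_select(toy_probs, fdr=0.05)
```

In the toy example the first three genes are declared DE: their expected FDR is (0.01 + 0.02 + 0.10) / 3, about 0.043, whereas adding the fourth gene would raise it to about 0.13.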
GaGa and MiGaGa tagged 469 and 468 genes as differentially expressed, respectively, 4.5% of which were false positives. EBayes-BUM found 449 genes (7.1% false positives), whereas Wilcoxon-BUM and Wilcoxon-BH did not find any significant genes.

7 Discussion

We have introduced two simple extensions of the Ga model. The GaGa model relaxes the constant coefficient of variation assumption. This results in a parsimonious model, which substantially improves the quality of the model fit and reliability of the resulting inference. The increased generality comes at a negligible computational cost. We derived an approximation for the posterior distribution of the gamma shape parameter that further simplifies computation. The second extension, the MiGaGa, increases the model flexibility by incorporating a mixture prior, at the expense of model parsimony. In practice, a mixture with as few as two components may suffice to provide a satisfactory fit, as we have illustrated with the Armstrong dataset. Further, we have shown how to improve the quality of the GaGa fit by using a simple monotonic transformation that guarantees unimodality of the marginal distribution of the data.
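
The constant coefficient of variation assumption mentioned above can be made concrete with a standard property of the gamma distribution: with shape a and rate b, the mean is a/b and the standard deviation is sqrt(a)/b, so the coefficient of variation is 1/sqrt(a) and depends on the shape alone. The sketch below uses a hypothetical helper, `gamma_cv`, to check this numerically; it is an illustration of the property, not code from the paper.

```python
import math


def gamma_cv(shape, rate):
    """Coefficient of variation (sd / mean) of a Gamma(shape, rate) variable."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    return sd / mean


# Same shape, very different rates: the coefficient of variation is unchanged.
cv_low_rate = gamma_cv(shape=4.0, rate=2.0)
cv_high_rate = gamma_cv(shape=4.0, rate=100.0)

# Different shapes: the coefficient of variation changes.
cv_other_shape = gamma_cv(shape=16.0, rate=2.0)
```

A shape parameter shared by all genes therefore implies a common coefficient of variation, while gene-specific shape parameters let it vary from gene to gene.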

The hierarchical nature of the models allows for the sharing of information across genes. In simulations and in real data we have shown how both GaGa and MiGaGa find more genes than three competing methods. For instance, when analyzing a subset with 5 arrays per group from the Armstrong dataset we detect around 200 differentially expressed genes, while the most that any competing method finds is 47. The fact that around 80% of these genes were found again when analyzing the full data gives us confidence that these are not spurious findings. Also, we conducted simulations under the null hypothesis which estimated the FDR to be around the desired 5%.

The differences in the number of genes found by each method are partly due to the fact that they are testing different hypotheses. Our models not only seek differences in mean expression as the competing methods do, but also test for differences in the distributional shape. We believe that this may frequently be of biological interest, since many mutations, deletions and translocations affect only a proportion of the diseased individuals, and hence one expects to see differences in the tail behavior between groups. Since our models are built to be sensitive to tail behavior, the presence of outlying values can have an effect on the inference. If the experimenter believes that differences in shape lack biological relevance and hence that this sensitivity is undesirable, one can easily use the model output to focus on genes with location shift. For instance, one can compute posterior credibility intervals for the difference between group means and disregard those genes for which it contains zero.

In addition, we have shown a fully Bayesian approach for class prediction. The approach allows one to specify prior probabilities that take into account the prevalence of the disease under study and the outcome of previous tests, for instance. In the Gould dataset the model revealed that the data lacked predictive power, while in the Armstrong dataset the proposed approach correctly classified all the 32 samples that had not been used to fit the model.
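
The role of the prior class probabilities in class prediction can be illustrated with Bayes' rule. The sketch below is generic and is not the paper's implementation: it assumes that the log predictive density of a new sample under each group has already been computed by some fitted model, and it combines those values with prior probabilities, for instance reflecting disease prevalence. The function and argument names (`posterior_class_probs`, `log_pred_density`, `prior`) are hypothetical.

```python
import math


def posterior_class_probs(log_pred_density, prior):
    """Posterior class probabilities via Bayes' rule.

    log_pred_density: log p(new sample | class k), one value per class.
    prior: prior probability of each class (all strictly positive).
    """
    log_post = [math.log(p) + l for p, l in zip(prior, log_pred_density)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: the data favor class 2 by a factor exp(2),
# first with equal priors and then with a prior favoring class 1.
probs_equal_prior = posterior_class_probs([-10.0, -8.0], [0.5, 0.5])
probs_skewed_prior = posterior_class_probs([-10.0, -8.0], [0.9, 0.1])
```

With equal priors the posterior probability of class 2 is exp(2) / (1 + exp(2)), about 0.88; with the prior (0.9, 0.1) the same evidence gives about 0.45, which shows how prevalence or earlier test results can change the predicted class.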
In both datasets, we showed that the classifier is robust with respect to the number of genes used to build it.

As a limitation, we have not explicitly modeled the dependence between genes. In datasets with strong correlations, we expect that this may have a stronger effect on class prediction than on inference about gene expression. Not learning about the dependence structure also limits the use of the model in finding gene networks or gene interactions. Interesting future work will be to include dependence. Other possibilities are extending the model to include covariate information and study-specific random effects, which would make it appealing for meta-analysis purposes, or using the model for sample size calculation as in Müller et al. (2004) or sequential sample size calculation. In the latter application, the computational efficiency of the GaGa model should prove a major asset.

Acknowledgments

We thank Peter Müller for his very useful comments.

References

S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. Boer, M.D. Minden, E.S. Sallan, E.S. Lander, T.R. Golub, and S.J. Korsmeyer. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30:41-47.

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57.

M. Chiogna, M.S. Massa, and C. Romualdi. Effect of normalizations on detecting differentially expressed genes with cDNA microarray experiments. Technical report, Università degli Studi di Padova, Dipartimento di Scienze Statistiche.

K.R. Coombes, J. Wang, and K.A. Baggerly. A statistical method for finding biomarkers from microarray data, with application to prostate cancer. Technical report, M.D. Anderson Cancer Center. URL utmdabtr00704.pdf.

Kevin R. Coombes. ClassComparison: Classes and methods for class comparison problems on microarrays. R package version 1.1.

E. Damsleth. Conjugate classes for gamma distributions. Scandinavian Journal of Statistics, 2:80-84.

S. Dudoit, H.Y. Yang, M.J. Callow, and T.P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12.

A.E. Gelfand and A.F.M. Smith. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85.

C.M. Kendziorski, M.A. Newton, H. Lan, and M.N. Gould. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, 22.

J. Lapointe, C. Li, J.P. Higgins, M. Rijn, E. Bair, K. Montgomery, M. Ferrari, L. Egevad, W. Rayford, U. Bergerheim, P. Ekman, A.M. DeMarzo, R. Tibshirani, D. Botstein, P.O. Brown, J.D. Brooks, and J.R. Pollack. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proceedings of the National Academy of Science, 101.

I. Lönnstedt and T. Speed. Replicated microarray data. Statistica Sinica, 12(1).

R.B. Miller. Bayesian analysis of the two-parameter gamma distribution. Technometrics, 22:65-69.

P. Müller, G. Parmigiani, C. Robert, and J. Rousseau. Optimal sample size for multiple testing: the case of gene expression microarrays. Journal of the American Statistical Association, 99.

P. Müller, G. Parmigiani, and K. Rice. FDR and Bayesian Multiple Comparisons Rules. Oxford University Press.

M.A. Newton and C.M. Kendziorski. Parametric Empirical Bayes Methods for Microarrays. Springer Verlag, New York.

M.A. Newton, C.M. Kendziorski, C.S. Richmond, F.R. Blattner, and K.W. Tsui. On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology, 8:37-52.

M.A. Newton, A. Noueiry, D. Sarkar, and P. Ahlquist. Detecting differential gene expression with a semiparametric hierarchical mixture model. Biostatistics, 5.

G. Parmigiani, E.S. Garrett, R.A. Irizarry, and S.L. Zeger, editors. The Analysis of Gene Expression Data. Springer.

S. Pounds and S.W. Morris. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics, 10.

G.K. Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3.

G.K. Smyth. Limma: linear models for microarray data. In R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, and W. Huber, editors, Bioinformatics and Computational Biology Solutions using R and Bioconductor. Springer, New York.

J.D. Storey. The optimal discovery procedure: A new approach to simultaneous significance testing. Journal of the Royal Statistical Society B, 69.

J. Wu and J.M.J. Irizarry, R. with contributions from Gentry. gcrma: Background Adjustment Using Sequence Information. R package.

Z. Wu, R.A. Irizarry, R. Gentleman, F.M. Murillo, and F. Spencer. A model based background adjustment for oligonucleotide expression arrays. Technical report, Johns Hopkins University, Dept. of Biostatistics.

Y. Zhao, M.C. Li, and R. Simon. An adaptive method for cDNA microarray normalization. Bioinformatics, 6:28.


More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

Testing Groups of Genes

Testing Groups of Genes Testing Grups f Genes Part II: Scring Gene Ontlgy Terms Manuela Hummel, LMU München Adrian Alexa, MPI Saarbrücken NGFN-Curses in Practical DNA Micrarray Analysis Heidelberg, March 6, 2008 Bilgical questins

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

Snow avalanche runout from two Canadian mountain ranges

Snow avalanche runout from two Canadian mountain ranges Annals f Glacilgy 18 1993 Internati n al Glaciigicai Sciety Snw avalanche runut frm tw Canadian muntain ranges D.J. NIXON AND D. M. MCCLUNG Department f Civil Engineering, University f British Clumbia,

More information

^YawataR&D Laboratory, Nippon Steel Corporation, Tobata, Kitakyushu, Japan

^YawataR&D Laboratory, Nippon Steel Corporation, Tobata, Kitakyushu, Japan Detectin f fatigue crack initiatin frm a ntch under a randm lad C. Makabe," S. Nishida^C. Urashima,' H. Kaneshir* "Department f Mechanical Systems Engineering, University f the Ryukyus, Nishihara, kinawa,

More information

APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL

APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL JP2.11 APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL Xingang Fan * and Jeffrey S. Tilley University f Alaska Fairbanks, Fairbanks,

More information

Lecture 24: Flory-Huggins Theory

Lecture 24: Flory-Huggins Theory Lecture 24: 12.07.05 Flry-Huggins Thery Tday: LAST TIME...2 Lattice Mdels f Slutins...2 ENTROPY OF MIXING IN THE FLORY-HUGGINS MODEL...3 CONFIGURATIONS OF A SINGLE CHAIN...3 COUNTING CONFIGURATIONS FOR

More information

Sequential Allocation with Minimal Switching

Sequential Allocation with Minimal Switching In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University

More information

Bayesian nonparametric modeling approaches for quantile regression

Bayesian nonparametric modeling approaches for quantile regression Bayesian nnparametric mdeling appraches fr quantile regressin Athanasis Kttas Department f Applied Mathematics and Statistics University f Califrnia, Santa Cruz Department f Statistics Athens University

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f

More information

Lecture 23: Lattice Models of Materials; Modeling Polymer Solutions

Lecture 23: Lattice Models of Materials; Modeling Polymer Solutions Lecture 23: 12.05.05 Lattice Mdels f Materials; Mdeling Plymer Slutins Tday: LAST TIME...2 The Bltzmann Factr and Partitin Functin: systems at cnstant temperature...2 A better mdel: The Debye slid...3

More information

arxiv:hep-ph/ v1 2 Jun 1995

arxiv:hep-ph/ v1 2 Jun 1995 WIS-95//May-PH The rati F n /F p frm the analysis f data using a new scaling variable S. A. Gurvitz arxiv:hep-ph/95063v1 Jun 1995 Department f Particle Physics, Weizmann Institute f Science, Rehvt 76100,

More information

A Regression Solution to the Problem of Criterion Score Comparability

A Regression Solution to the Problem of Criterion Score Comparability A Regressin Slutin t the Prblem f Criterin Scre Cmparability William M. Pugh Naval Health Research Center When the criterin measure in a study is the accumulatin f respnses r behavirs fr an individual

More information

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016 Maximum A Psteriri (MAP) CS 109 Lecture 22 May 16th, 2016 Previusly in CS109 Game f Estimatrs Maximum Likelihd Nn spiler: this didn t happen Side Plt argmax argmax f lg Mther f ptimizatins? Reviving an

More information

IN a recent article, Geary [1972] discussed the merit of taking first differences

IN a recent article, Geary [1972] discussed the merit of taking first differences The Efficiency f Taking First Differences in Regressin Analysis: A Nte J. A. TILLMAN IN a recent article, Geary [1972] discussed the merit f taking first differences t deal with the prblems that trends

More information

5 th grade Common Core Standards

5 th grade Common Core Standards 5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

Simulation Based Optimal Design

Simulation Based Optimal Design BAYESIAN STATISTICS 6, pp. 000 000 J. M. Bernard, J. O. Berger, A. P. Dawid and A. F. M. Smith (Eds.) Oxfrd University Press, 1998 Simulatin Based Optimal Design PETER MULLER Duke University SUMMARY We

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

A NOTE ON BAYESIAN ANALYSIS OF THE. University of Oxford. and. A. C. Davison. Swiss Federal Institute of Technology. March 10, 1998.

A NOTE ON BAYESIAN ANALYSIS OF THE. University of Oxford. and. A. C. Davison. Swiss Federal Institute of Technology. March 10, 1998. A NOTE ON BAYESIAN ANALYSIS OF THE POLY-WEIBULL MODEL F. Luzada-Net University f Oxfrd and A. C. Davisn Swiss Federal Institute f Technlgy March 10, 1998 Summary We cnsider apprximate Bayesian analysis

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

Contributions to the Theory of Robust Inference

Contributions to the Theory of Robust Inference Cntributins t the Thery f Rbust Inference by Matías Salibián-Barrera Licenciad en Matemáticas, Universidad de Buens Aires, Argentina, 1994 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

More information

CHM112 Lab Graphing with Excel Grading Rubric

CHM112 Lab Graphing with Excel Grading Rubric Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline

More information

ChE 471: LECTURE 4 Fall 2003

ChE 471: LECTURE 4 Fall 2003 ChE 47: LECTURE 4 Fall 003 IDEL RECTORS One f the key gals f chemical reactin engineering is t quantify the relatinship between prductin rate, reactr size, reactin kinetics and selected perating cnditins.

More information

Introduction: A Generalized approach for computing the trajectories associated with the Newtonian N Body Problem

Introduction: A Generalized approach for computing the trajectories associated with the Newtonian N Body Problem A Generalized apprach fr cmputing the trajectries assciated with the Newtnian N Bdy Prblem AbuBar Mehmd, Syed Umer Abbas Shah and Ghulam Shabbir Faculty f Engineering Sciences, GIK Institute f Engineering

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

Margin Distribution and Learning Algorithms

Margin Distribution and Learning Algorithms ICML 03 Margin Distributin and Learning Algrithms Ashutsh Garg IBM Almaden Research Center, San Jse, CA 9513 USA Dan Rth Department f Cmputer Science, University f Illinis, Urbana, IL 61801 USA ASHUTOSH@US.IBM.COM

More information

THE LIFE OF AN OBJECT IT SYSTEMS

THE LIFE OF AN OBJECT IT SYSTEMS THE LIFE OF AN OBJECT IT SYSTEMS Persns, bjects, r cncepts frm the real wrld, which we mdel as bjects in the IT system, have "lives". Actually, they have tw lives; the riginal in the real wrld has a life,

More information

7.0 Heat Transfer in an External Laminar Boundary Layer

7.0 Heat Transfer in an External Laminar Boundary Layer 7.0 Heat ransfer in an Eternal Laminar Bundary Layer 7. Intrductin In this chapter, we will assume: ) hat the fluid prperties are cnstant and unaffected by temperature variatins. ) he thermal & mmentum

More information

Randomized Quantile Residuals

Randomized Quantile Residuals Randmized Quantile Residuals Peter K. Dunn and Grdn K. Smyth Department f Mathematics, University f Queensland, Brisbane, Q 47, Australia. 4 April 996 Abstract In this paper we give a general definitin

More information

Physical Layer: Outline

Physical Layer: Outline 18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin

More information

Administrativia. Assignment 1 due thursday 9/23/2004 BEFORE midnight. Midterm exam 10/07/2003 in class. CS 460, Sessions 8-9 1

Administrativia. Assignment 1 due thursday 9/23/2004 BEFORE midnight. Midterm exam 10/07/2003 in class. CS 460, Sessions 8-9 1 Administrativia Assignment 1 due thursday 9/23/2004 BEFORE midnight Midterm eam 10/07/2003 in class CS 460, Sessins 8-9 1 Last time: search strategies Uninfrmed: Use nly infrmatin available in the prblem

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

Equilibrium of Stress

Equilibrium of Stress Equilibrium f Stress Cnsider tw perpendicular planes passing thrugh a pint p. The stress cmpnents acting n these planes are as shwn in ig. 3.4.1a. These stresses are usuall shwn tgether acting n a small

More information

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss)

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss) Discussin n Regularized Regressin fr Categrical Data (Tutz and Gertheiss) Peter Bühlmann, Ruben Dezeure Seminar fr Statistics, Department f Mathematics, ETH Zürich, Switzerland Address fr crrespndence:

More information