Estimation of Large Families of Bayes Factors from Markov Chain Output
Hani Doss
University of Florida

Abstract

We consider situations in Bayesian analysis where the prior is indexed by a hyperparameter taking on a continuum of values. We distinguish some arbitrary value of the hyperparameter, and consider the problem of estimating the Bayes factor for the model indexed by the hyperparameter vs. the model specified by the distinguished point, as the hyperparameter varies. We assume that we have Markov chain output from the posterior for a finite number of the priors, and develop a method for efficiently computing estimates of the entire family of Bayes factors. As an application of the ideas, we consider some commonly used hierarchical Bayesian models and show that the parametric assumptions in these models can be recast as assumptions regarding the prior. Therefore, our method can be used as a model selection criterion in a Bayesian framework. We illustrate our methodology through a detailed example involving Bayesian model selection.

Key words and phrases: Bayes factors, control variates, ergodicity, importance sampling, Markov chain Monte Carlo
1 Introduction

Suppose we have a data vector $Y$ whose distribution has density $p_\theta$, for some unknown $\theta \in \Theta$. Let $\{\nu_h, h \in \mathcal{H}\}$ be a family of prior densities on $\theta$ that we are contemplating. The selection of a particular prior from the family is important in Bayesian data analysis, and when making this choice one will often want to consider the marginal likelihood of the data under the prior $\nu_h$, given by

\[ m_h(y) = \int l_y(\theta)\,\nu_h(\theta)\,d\theta, \]

as $h$ varies over the hyperparameter space $\mathcal{H}$. Here, $l_y(\theta) = p_\theta(y)$ is the likelihood function. Values of $h$ for which $m_h(y)$ is relatively low may be considered poor choices, and consideration of the family $\{m_h(y), h \in \mathcal{H}\}$ may be helpful in narrowing the search of priors to use. It is therefore useful to have a method for computing the family $\{m_h(y), h \in \mathcal{H}\}$. For the purpose of model selection, if $c$ is a fixed constant, the information given by $\{m_h(y), h \in \mathcal{H}\}$ and $\{c\,m_h(y), h \in \mathcal{H}\}$ is the same. From a computational and statistical point of view, however, it is usually easier to fix a particular hyperparameter value $h_1$ and focus on $\{m_h(y)/m_{h_1}(y), h \in \mathcal{H}\}$. Given two hyperparameter values $h$ and $h_1$, the quantity $B(h, h_1) = m_h/m_{h_1}$ is called the Bayes factor of the model indexed by $h$ vs. the model indexed by $h_1$ (we write $m_h$ instead of $m_h(y)$ from now on). In this paper we present a method for estimating the family $\{B(h, h_1), h \in \mathcal{H}\}$. We have in mind situations where $B(h, h_1)$ cannot be obtained analytically and, moreover, we need to calculate $B(h, h_1)$ for a large set of $h$'s, so that computational efficiency is essential.

Our approach requires that there are $k$ hyperparameter values $h_1, \ldots, h_k$, and for $l = 1, \ldots, k$, we are able to get a sample $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, from $\nu_{h_l,y}$, the posterior density of $\theta$ given $Y = y$, assuming that the prior is $\nu_{h_l}$. To set the framework, consider the trivial case where $k = 1$, and we have a sample $\theta_1, \ldots, \theta_n$ from the posterior $\nu_{h_1,y}$ generated by an ergodic Markov chain. Our objective is to estimate $\{B(h, h_1), h \in \mathcal{H}\}$.
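For any $h$ such that $\nu_h(\theta) = 0$ whenever $\nu_{h_1}(\theta) = 0$, we have

\[
\frac{1}{n}\sum_{i=1}^{n} \frac{\nu_h(\theta_i)}{\nu_{h_1}(\theta_i)}
\xrightarrow{\text{a.s.}} \int \frac{\nu_h(\theta)}{\nu_{h_1}(\theta)}\,\nu_{h_1,y}(\theta)\,d\theta
= \frac{m_h}{m_{h_1}} \int \frac{l_y(\theta)\nu_h(\theta)/m_h}{l_y(\theta)\nu_{h_1}(\theta)/m_{h_1}}\,\nu_{h_1,y}(\theta)\,d\theta
= \frac{m_h}{m_{h_1}} \int \frac{\nu_{h,y}(\theta)}{\nu_{h_1,y}(\theta)}\,\nu_{h_1,y}(\theta)\,d\theta
= \frac{m_h}{m_{h_1}}. \tag{1.1}
\]

Therefore, the left side of (1.1) is a consistent estimate of the Bayes factor $B(h, h_1)$.

To fix ideas, consider as a simple example the following standard three-level hierarchical model:

\[ \text{conditional on } \psi_j, \quad Y_j \overset{\text{indep}}{\sim} \phi_{\psi_j,\sigma_j}, \quad j = 1, \ldots, m \tag{1.2a} \]
\[ \text{conditional on } \mu, \tau, \quad \psi_j \overset{\text{iid}}{\sim} \phi_{\mu,\tau}, \quad j = 1, \ldots, m \tag{1.2b} \]
\[ (\mu, \tau) \sim \lambda_{c_1,c_2,c_3,c_4} \tag{1.2c} \]

where $\phi_{m,s}$ denotes the density of the normal distribution with mean $m$ and standard deviation $s$. In (1.2a), the $\sigma_j$'s are assumed known. In (1.2c), $\lambda_{c_1,c_2,c_3,c_4}$ is the normal / inverse gamma distribution indexed by four hyperparameters (see Section 3). This is a very commonly used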
model but, as we discuss later, in some situations it is preferable to replace (1.2b) with $\psi_j \overset{\text{iid}}{\sim} t_{v,\mu,\tau}$, where $t_{v,\mu,\tau}$ is the density of the $t$ distribution with $v$ degrees of freedom, location $\mu$ and scale $\tau$. In this case, consider now the estimate in the left side of (1.1). The likelihood of $(\mu, \tau)$ is

\[ l_Y(\mu, \tau) = \int \cdots \int \prod_{j=1}^{m} \phi_{\psi_j,\sigma_j}(Y_j) \prod_{j=1}^{m} t_{v,\mu,\tau}(\psi_j)\,d\psi_1 \cdots d\psi_m. \]

This likelihood cannot be computed in closed form, and therefore its cancellation in (1.1) gives a non-trivial simplification: calculation of the estimate requires only the ratio of the densities of the priors and not the posteriors.

Consider (1.2) with $t_{v,\mu,\tau}$ instead of $\phi_{\mu,\tau}$ in the middle stage, and suppose now that we would like to select $v$, with the choice $v = \infty$ signifying the choice of the normal distribution $\phi_{\mu,\tau}$. The distribution of $Y$ is determined by $\psi = (\psi_1, \ldots, \psi_m)$. A completely equivalent way of describing the model is therefore through the two-level hierarchy in which we let $\theta = (\psi, \mu, \tau)$, and stipulate:

\[ \text{conditional on } \theta, \quad Y_j \overset{\text{indep}}{\sim} \phi_{\psi_j,\sigma_j}, \quad j = 1, \ldots, m, \]
\[ (\psi, \mu, \tau) \sim \nu_h, \]

where $\nu_h(\psi, \mu, \tau) = \bigl(\prod_{j=1}^{m} t_{v,\mu,\tau}(\psi_j)\bigr)\,\lambda_{c_1,c_2,c_3,c_4}(\mu, \tau)$. Here, the hyperparameter is $h = (v, c_1, c_2, c_3, c_4)$, which includes the number of degrees of freedom. Estimation of the family of Bayes factors $\{B(h, h_1), h \in \mathcal{H}\}$ therefore enables a model selection step.

We now discuss briefly the accuracy of the estimate on the left side of (1.1). When $\nu_h$ is nearly singular with respect to $\nu_{h_1}$ over the region where the $\theta_i$'s are likely to be, the estimate will be unstable. (Formally, the estimate will satisfy a central limit theorem if the chain mixes fast enough and the random variable $\nu_h(\theta)/\nu_{h_1}(\theta)$ (where $\theta \sim \nu_{h_1,y}$) has a high enough moment. This is discussed in more detail in Section 2.3.) From a practical point of view, this means that there is effectively a radius around $h_1$ within which one can safely move.
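The estimate on the left side of (1.1) can be tried out on a toy problem in which the Bayes factor is known in closed form. The sketch below is only an illustration under stated assumptions, not the model of this paper: a single observation $y \sim N(\theta, 1)$, priors $\nu_h = N(0, h^2)$ indexed by the prior standard deviation $h$, and independent draws from the exact posterior standing in for Markov chain output. All function and variable names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of the N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor_estimate(theta_draws, h, h1):
    """Left side of (1.1): average prior ratio nu_h / nu_h1 over posterior draws."""
    ratios = [normal_pdf(t, 0.0, h) / normal_pdf(t, 0.0, h1) for t in theta_draws]
    return sum(ratios) / len(ratios)


def bayes_factor_exact(y, h, h1):
    """m_h(y) / m_h1(y); in this toy model y is marginally N(0, 1 + h^2)."""
    m_h = normal_pdf(y, 0.0, math.sqrt(1.0 + h * h))
    m_h1 = normal_pdf(y, 0.0, math.sqrt(1.0 + h1 * h1))
    return m_h / m_h1


rng = random.Random(0)
y, h1 = 1.2, 1.0                      # assumed data value and baseline hyperparameter
post_var = h1 * h1 / (1.0 + h1 * h1)  # conjugate posterior variance under nu_h1
post_mean = y * post_var              # conjugate posterior mean under nu_h1
# Assumption: iid posterior draws replace the ergodic Markov chain of the text.
theta_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(100_000)]

for h in (0.8, 1.5):
    print(h, bayes_factor_estimate(theta_draws, h, h1), bayes_factor_exact(y, h, h1))
```

Note that the likelihood never enters `bayes_factor_estimate`; only the two prior densities are evaluated, which is the simplification pointed out above.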
In all but the very simplest models, the dimension of $\mathcal{H}$ is greater than 1, and therefore estimation of the Bayes factor as $h$ ranges over $\mathcal{H}$ raises serious computational difficulties, and it is essential that for each $h$, the estimate of $B(h, h_1)$ is both accurate and can be computed quickly. Our approach is to select $k$ hyperparameter points $h_1, \ldots, h_k$, and get Markov chain samples from $\nu_{h_l,y}$ for each $l = 1, \ldots, k$. The prior $\nu_{h_1}$ in the denominator of the left side of (1.1) is replaced by a mixture $w_1\nu_{h_1} + \cdots + w_k\nu_{h_k}$, with appropriately chosen weights. We show how judiciously chosen control variates can be used in conjunction with multiple Markov chain streams to produce accurate estimates even with small samples, so that the net result is a computationally feasible method for producing reliable estimates of the Bayes factors for a wide range of hyperparameter values.

Our approach is motivated by and uses ideas developed in Kong et al. (2003), which deals with the situation where we have independent samples from $k$ unnormalized densities, and we wish to estimate all possible ratios of the $k$ normalizing constants. Owen and Zhou (2000) and Tan (2004) also discuss the use of control variates to increase the accuracy of Monte Carlo estimates. In Section 4 we return to these three papers and discuss in detail how our approach fits in the context of this work.

The paper is organized as follows. Section 2 contains the main methodological development; there, we present our method for estimating the family of Bayes factors and state supporting theoretical results. Section 3 illustrates the methodology through a detailed example that involves a number of issues,
including selection of the parametric family in the model. Section 4 gives a discussion of other possible approaches and related work, and the Appendix gives the proof of the main theoretical result of the paper.

2 Estimation of the Family of Bayes Factors

Suppose that for $l = 1, \ldots, k$, we have Markov chain Monte Carlo (MCMC) samples $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, from the posterior density of $\theta$ given $Y = y$, assuming that the prior is $\nu_{h_l}$, having the form $\nu_{h_l,y}(\theta) = l_y(\theta)\nu_{h_l}(\theta)/m_{h_l}$. We assume that the $k$ sequences are independent of one another. We will not assume we know any of the $m_{h_l}$'s. However, we now explain how knowledge of the Bayes factors $m_{h_l}/m_{h_1}$, for $l = 2, \ldots, k$, would result in two important benefits. If we knew these Bayes factors we could then form the estimate

\[ \hat{B}(h, h_1) = \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} n_s\,\nu_{h_s}(\theta_i^{(l)})\,m_{h_1}/m_{h_s}}. \tag{2.1} \]

Let $n = \sum_{s=1}^{k} n_s$, and assume that $n_s/n \to a_s$, $s = 1, \ldots, k$. We then have

\[
\hat{B}(h, h_1)
= \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{l_y(\theta_i^{(l)})\,\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} n_s\,l_y(\theta_i^{(l)})\,\nu_{h_s}(\theta_i^{(l)})\,m_{h_1}/m_{h_s}}
= \frac{m_h}{m_{h_1}} \sum_{l=1}^{k} \frac{n_l}{n}\,\frac{1}{n_l} \sum_{i=1}^{n_l} \frac{\nu_{h,y}(\theta_i^{(l)})}{\sum_{s=1}^{k} (n_s/n)\,\nu_{h_s,y}(\theta_i^{(l)})}
\xrightarrow{\text{a.s.}} \frac{m_h}{m_{h_1}} \sum_{l=1}^{k} a_l \int \frac{\nu_{h,y}(\theta)}{\sum_{s=1}^{k} a_s\,\nu_{h_s,y}(\theta)}\,\nu_{h_l,y}(\theta)\,d\theta
= \frac{m_h}{m_{h_1}}. \tag{2.2}
\]

The almost sure convergence in (2.2) occurs under minimal conditions on the Markov chains $\theta_i^{(l)}$, $i = 1, \ldots, n_l$. Asymptotic normality requires more restrictive conditions, and is discussed in Section 2.3.

To compute $\hat{B}(h, h_1)$, the quantities $\sum_{s=1}^{k} n_s\,\nu_{h_s}(\theta_i^{(l)})\,m_{h_1}/m_{h_s}$ are calculated once, and stored. Then, for every new value of $h$, the computation of $\hat{B}(h, h_1)$ requires taking $n$ ratios and a sum. Since this is to be done for a large number of $h$'s, it is essential that for each $l$, the sequence $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, be as independent as possible, so that the value of $n$ can be made as small as possible.

We now briefly recall the use of control variates in Monte Carlo sampling. Suppose we wish to estimate the expected value of a random variable $Y$, and we can find a random variable $Z$ that is correlated with $Y$, and such that $E(Z)$ is known (without loss of generality, $E(Z) = 0$).
Then for any $\beta$, the estimate $Y - \beta Z$ is an unbiased estimate of $E(Y)$, and the value of $\beta$ minimizing the variance of $Y - \beta Z$ is $\beta = \mathrm{Cov}(Y, Z)/\mathrm{Var}(Z)$. The idea may be used when there are several variables $Z_1, \ldots, Z_r$ that are correlated with $Y$.
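As a small numerical illustration of the control variate idea just recalled, the sketch below estimates $E(e^U)$ for $U$ uniform on $(0, 1)$, using $Z = U - 1/2$ (which has known mean 0) as the control variate. The target, the choice of $Z$, and all names are assumptions made for this example only; $\beta$ is estimated from the same sample by the formula $\mathrm{Cov}(Y, Z)/\mathrm{Var}(Z)$, which introduces a small bias that vanishes as the sample grows.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


rng = random.Random(1)
u = [rng.random() for _ in range(10_000)]
y = [math.exp(v) for v in u]   # variable whose mean we want; E(Y) = e - 1
z = [v - 0.5 for v in u]       # control variate with known mean 0

y_bar, z_bar = mean(y), mean(z)
cov_yz = mean([(a - y_bar) * (b - z_bar) for a, b in zip(y, z)])
var_z = mean([(b - z_bar) ** 2 for b in z])
beta = cov_yz / var_z          # sample version of Cov(Y, Z) / Var(Z)

adjusted = [a - beta * b for a, b in zip(y, z)]
plain_estimate = y_bar
cv_estimate = mean(adjusted)

# Per-draw variances: the adjusted draws vary far less than the raw ones.
var_plain = mean([(a - y_bar) ** 2 for a in y])
var_cv = mean([(a - cv_estimate) ** 2 for a in adjusted])

print(plain_estimate, cv_estimate, math.e - 1.0)
print(var_cv / var_plain)
```

Because $e^U$ is almost linear in $U$ on $(0, 1)$, the variance ratio printed on the last line is small; the gain from a control variate depends entirely on how strongly it is correlated with $Y$.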
In the present context, we may consider the functions

\[ Z_j(\theta) = \frac{\nu_{h_j}(\theta)\,m_{h_1}/m_{h_j} - \nu_{h_1}(\theta)}{\sum_{s=1}^{k} (n_s/n)\,\nu_{h_s}(\theta)\,m_{h_1}/m_{h_s}}, \quad j = 2, \ldots, k, \]

whose expectations under $\sum_{s=1}^{k} (n_s/n)\,\nu_{h_s,y}$ are 0. The calculation of these functions requires knowledge of the Bayes factors $m_{h_s}/m_{h_1}$, $s = 2, \ldots, k$.

The method proposed in this paper can now be briefly summarized as follows.

1. For each $l = 1, \ldots, k$, get Markov chain samples $\theta_i^{(l)}$, $i = 1, \ldots, N_l$, from $\nu_{h_l,y}$. Based on these, the Bayes factors $m_{h_s}/m_{h_1}$, $s = 2, \ldots, k$, are estimated. The sample sizes $N_l$ should be very large, so that these estimates are very accurate.

2. For each $l = 1, \ldots, k$, we obtain new samples $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, from $\nu_{h_l,y}$. Using these, together with the Bayes factors computed in Step 1, we form the estimate $\hat{B}_{\mathrm{reg}}(h, h_1)$, which is similar to (2.1), except that we use the functions $Z_j$, $j = 2, \ldots, k$, as control variates.

The samples in the two steps are used for different purposes. Those in Step 1 are used solely to estimate $m_{h_s}/m_{h_1}$, $s = 2, \ldots, k$, and in fact, once these estimates are formed, the samples may be discarded. The samples in Step 2 are used to estimate the family $B(h, h_1)$. On occasion, special analytical structure enables the use of numerical methods to estimate $m_{h_s}/m_{h_1}$, $s = 2, \ldots, k$, as long as $k$ is not too large, so Step 1 is bypassed. A review of the literature for this approach is given in Kass and Raftery (1995).

Ideally, the samples in Step 2 should be independent or nearly so, which may be accomplished by subsampling a very long chain. If we have a Markov transition function that gives rise to a uniformly ergodic chain, it is possible to use this Markov transition function to obtain perfect samples (Hobert and Robert (2004)), although the time it takes to generate a perfect sample of length $n_l$ may be much greater than the time to generate the Markov chain of length $n_l$.

One may ask what is the point of having two steps of sampling, i.e. why not just use the samples from Step 1 for both estimation of $m_{h_s}/m_{h_1}$, $s = 1, \ldots, k$, and for subsequent estimation of the family $B(h, h_1)$.
The reason for having the two stages is that the estimate of $B(h, h_1)$ needs to be computed for a large number of $h$'s, and for every $h$ the amount of computation is linear in $n$, so this precludes a large value of $n$. Therefore, given that a relatively modest sample size must be used, we need to reduce the variance of the estimate as much as possible, and this is the reason for carrying out Step 1. The amount of computation to generate the Step 1 samples is typically one or two orders of magnitude less than the amount of computation needed to calculate the estimates of $B(h, h_1)$ from the Step 2 samples (see the discussion at the end of Section 3). To summarize, the benefit of the two-step approach is a better tradeoff between statistical efficiency and computational time.

To see this, it is helpful to consider a very simple example in which the variances of various estimators can actually be computed. Consider the unnormalized density $q_h(t) = t^h\,I(t \in (0, 1))$, and let $m_h$ be the normalizing constant. Now suppose we wish to estimate $m_h/m_1$ as $h$ ranges over a grid of 4000 points in the interval (.5, 2.5) and that we are able to generate iid observations from $q_1/m_1$ and $q_3/m_3$. We may use the estimator in Kong et al. (2003) (discussed later in this paper), which estimates both $m_h/m_1$ and $m_3/m_1$ from the same sample. Given one minute of computer time, using the machine whose specifications
are described in Section 3, the requirement that we calculate such a large number of ratios of normalizing constants limits the total sample size $n$. A formula for the asymptotic variance $\rho^2(h)$ of the Kong et al. (2003) estimate is given in Tan (2004, equation (8)), and in this situation all quantities that are needed in the formula are available explicitly. Now if we take the minute and divide it into two parts, 3 seconds and 57 seconds, then with the 3 seconds we can estimate $m_3/m_1$ with essentially perfect accuracy, and with the remaining 57 seconds, if we use the estimate $\hat{B}(h, 1)$, we can handle a sample size of $57n/60$. A formula for the asymptotic variance $\tau^2(h)$ of this estimator, which uses the value of $m_3/m_1$ calculated in the first stage, is given in Theorem 1 of the present paper, and can also be evaluated explicitly. The ratio $\tau^2(h)/\rho^2(h)$ is bounded above by .2 over the entire grid, and so with the same computer resources, the variance of the two-stage estimator is uniformly at most $.2 \times 60/57 \approx .2$ times that of the one-stage estimator. (The gains if we use $\hat{B}_{\mathrm{reg}}$ instead of $\hat{B}$ can be far greater; see Section 3 for an illustration.)

In Section 2.1 we show how the MCMC approach to Step 1 may be implemented. In Section 2.2 we show how estimation in Step 2 may be implemented, and also discuss the benefits of using the control variates. In Section 2.3 we give a result regarding asymptotic normality of the estimates of the Bayes factors.

2.1 Estimation of the Bayes Factors $m_{h_s}/m_{h_1}$

We now assume that for $l = 1, \ldots, k$, we have a sequence $\theta_i^{(l)}$, $i = 1, \ldots, N_l$, from a Markov chain corresponding to the posterior $\nu_{h_l,y}$. Also, these $k$ sequences are independent of one another. Let $N = \sum_{l=1}^{k} N_l$, and $a_l = N_l/N$. We wish to estimate $m_{h_l}/m_{h_1}$, $l = 2, \ldots, k$. Meng and Wong (1996) considered this problem and, to understand their method, it is helpful to consider first the case where $k = 2$ and we wish to estimate $d = m_{h_2}/m_{h_1}$.
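For any function $\alpha$ defined on the common support of $\nu_{h_1,y}$ and $\nu_{h_2,y}$ such that $\int \alpha(\theta)\,\nu_{h_1}(\theta)\,l_y(\theta)\,\nu_{h_2}(\theta)\,d\theta < \infty$, we have

\[
\frac{\int \alpha(\theta)\,\nu_{h_2}(\theta)\,\nu_{h_1,y}(\theta)\,d\theta}{\int \alpha(\theta)\,\nu_{h_1}(\theta)\,\nu_{h_2,y}(\theta)\,d\theta}
= \frac{m_{h_2}}{m_{h_1}} \cdot \frac{\int \alpha(\theta)\,\nu_{h_2}(\theta)\,l_y(\theta)\,\nu_{h_1}(\theta)\,d\theta}{\int \alpha(\theta)\,\nu_{h_1}(\theta)\,l_y(\theta)\,\nu_{h_2}(\theta)\,d\theta}
= \frac{m_{h_2}}{m_{h_1}}. \tag{2.3}
\]

Therefore,

\[ \hat{d} = \frac{\frac{1}{N_1}\sum_{i=1}^{N_1} \alpha(\theta_i^{(1)})\,\nu_{h_2}(\theta_i^{(1)})}{\frac{1}{N_2}\sum_{i=1}^{N_2} \alpha(\theta_i^{(2)})\,\nu_{h_1}(\theta_i^{(2)})} \]

is a consistent estimate of $d$, under the minimal assumption of ergodicity of the two chains. Meng and Wong (1996) show that when $\{\theta_i^{(j)}\}_{i=1}^{N_j}$ are independent draws from $\nu_{h_j,y}$, the optimal $\alpha$ to use is

\[ \alpha_{\mathrm{opt}}(\theta) = \frac{1}{a_1\,\nu_{h_1}(\theta) + a_2\,\nu_{h_2}(\theta)/d}, \tag{2.4} \]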
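which involves the quantity we wish to estimate. This suggests the iterative scheme

\[
\hat{d}^{(t+1)} =
\frac{\dfrac{1}{N_1}\displaystyle\sum_{i=1}^{N_1} \dfrac{\nu_{h_2}(\theta_i^{(1)})}{a_1\,\nu_{h_1}(\theta_i^{(1)}) + a_2\,\nu_{h_2}(\theta_i^{(1)})/\hat{d}^{(t)}}}
{\dfrac{1}{N_2}\displaystyle\sum_{i=1}^{N_2} \dfrac{\nu_{h_1}(\theta_i^{(2)})}{a_1\,\nu_{h_1}(\theta_i^{(2)}) + a_2\,\nu_{h_2}(\theta_i^{(2)})/\hat{d}^{(t)}}}, \tag{2.5}
\]

for $t = 1, 2, \ldots$.

For the general case where $k \geq 2$, let $d = (m_{h_2}/m_{h_1}, \ldots, m_{h_k}/m_{h_1})$, but it is more convenient to work with the vector of component-wise reciprocals of $d$, call it $r$. For $i = 2, \ldots, k$, and $j = 1, \ldots, k$, $j \neq i$, let $\alpha_{ij}$ be known functions defined on the common support of $\nu_{h_i}$ and $\nu_{h_j}$ satisfying $\int \alpha_{ij}(\theta)\,\nu_{h_i}(\theta)\,l_y(\theta)\,\nu_{h_j}(\theta)\,d\theta < \infty$. Let

\[ b_{ii} = \sum_{j \neq i} E_{\nu_{h_j,y}}\bigl(\alpha_{ij}(\theta)\,\nu_{h_i}(\theta)\bigr), \quad 2 \leq i \leq k, \qquad b_{ij} = -E_{\nu_{h_i,y}}\bigl(\alpha_{ij}(\theta)\,\nu_{h_j}(\theta)\bigr), \quad i \neq j, \tag{2.6} \]

and

\[
B = \begin{pmatrix} b_{22} & b_{23} & \cdots & b_{2k} \\ b_{32} & b_{33} & \cdots & b_{3k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{k2} & b_{k3} & \cdots & b_{kk} \end{pmatrix},
\qquad
b = \begin{pmatrix} -b_{21} \\ -b_{31} \\ \vdots \\ -b_{k1} \end{pmatrix}.
\]

Then, assuming that $B$ is nonsingular, we have $r = B^{-1}b$. If $\hat{B}_\alpha$ and $\hat{b}_\alpha$ are the natural estimates of $B$ and $b$ based on the functions $\alpha_{ij}$ and the samples $\{\theta_i^{(j)}\}_{i=1}^{N_j}$, $j = 1, \ldots, k$, then $r$ may be estimated via

\[ \hat{r} = \hat{B}_\alpha^{-1}\,\hat{b}_\alpha. \tag{2.7} \]

Meng and Wong (1996) consider the functions

\[ \alpha_{ij} = \frac{a_i\,a_j}{\sum_{s=1}^{k} a_s\,r_s\,\nu_{h_s}}, \tag{2.8} \]

which involve the unknown $r$. The natural extension of (2.5) is $\hat{r}^{(t+1)} = \hat{B}_{\alpha_t}^{-1}\,\hat{b}_{\alpha_t}$, with the vector of functions $\alpha_t$ given by (2.8), where we use $\hat{r}^{(t)}$ instead of $r$.

2.2 Using Control Variates

The use of control variates has had many successes in Monte Carlo sampling, and a particularly important paper is Owen and Zhou (2000). This paper considers the use of control variates in conjunction with importance sampling, when the importance sampling density is a mixture, and the paper motivates some of the ideas below. We now assume that we have samples $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, from $\nu_{h_l,y}$, $l = 1, \ldots, k$, with independence across samples, and that we know the constants $d_2, \ldots, d_k$. For unity of notation, we define $d_1 = 1$. As before, $n = \sum_{l=1}^{k} n_l$ and $n_l/n = a_l$.

The estimate $\hat{B}(h, h_1)$ in (2.1) is an average of $n$ draws from the mixture distribution $p_a = \sum_{s=1}^{k} a_s\,\nu_{h_s,y}$. However, these are not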
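independent and identically distributed since they form a stratified sample: we have exactly $n_s$ draws from $\nu_{h_s,y}$, $s = 1, \ldots, k$, a fact which causes no problems. We wish to estimate the integral $I_h = \int l_y(\theta)\,\nu_h(\theta)/m_{h_1}\,d\theta = B(h, h_1)$. Define the functions

\[ H_j(\theta) = l_y(\theta)\,\nu_{h_j}(\theta)/m_{h_j} - l_y(\theta)\,\nu_{h_1}(\theta)/m_{h_1}, \quad j = 2, \ldots, k. \]

We have $\int H_j(\theta)\,d\theta = 0$, or equivalently $E_{p_a}\bigl(H_j(\theta)/p_a(\theta)\bigr) = 0$, where the subscript indicates that the expectation is taken with respect to the mixture distribution $p_a$. Therefore, for every $\beta = (\beta_2, \ldots, \beta_k)$ the estimate

\[
\hat{I}_{h,\beta} = \frac{1}{n}\sum_{l=1}^{k}\sum_{i=1}^{n_l}
\frac{l_y(\theta_i^{(l)})\,\nu_h(\theta_i^{(l)})/m_{h_1} - \sum_{j=2}^{k} \beta_j \bigl[\,l_y(\theta_i^{(l)})\bigl(\nu_{h_j}(\theta_i^{(l)})/m_{h_j} - \nu_{h_1}(\theta_i^{(l)})/m_{h_1}\bigr)\bigr]}
{\sum_{s=1}^{k} a_s\,\nu_{h_s,y}(\theta_i^{(l)})}
\]

is unbiased. As written, this estimate is not computable, because it involves the normalizing constants $m_{h_j}$, which are unknown, and also the likelihood $l_y(\theta)$, which may not be available. We rewrite it in computable form as

\[
\hat{I}_{h,\beta} = \frac{1}{n}\sum_{l=1}^{k}\sum_{i=1}^{n_l}
\frac{\nu_h(\theta_i^{(l)}) - \sum_{j=2}^{k} \beta_j \bigl[\nu_{h_j}(\theta_i^{(l)})/d_j - \nu_{h_1}(\theta_i^{(l)})\bigr]}
{\sum_{s=1}^{k} a_s\,\nu_{h_s}(\theta_i^{(l)})/d_s}. \tag{2.9}
\]

We would like to use the value of $\beta$, call it $\beta_{\mathrm{opt}}$, that minimizes the variance of $\hat{I}_{h,\beta}$, but this $\beta_{\mathrm{opt}}$ is generally unknown. As in Owen and Zhou (2000), we can do ordinary linear regression of $Y_{i,l}^{(h)}$ on predictors $Z_{i,l}^{(j)}$, where

\[
Y_{i,l}^{(h)} = \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s\,\nu_{h_s}(\theta_i^{(l)})/d_s},
\qquad
Z_{i,l}^{(j)} = \frac{\nu_{h_j}(\theta_i^{(l)})/d_j - \nu_{h_1}(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s\,\nu_{h_s}(\theta_i^{(l)})/d_s}, \quad j = 2, \ldots, k, \tag{2.10}
\]

and all required quantities are available. We then use the least squares estimate $\hat{\beta}$, i.e. the estimate of $I_h$ is $\hat{I}_{h,\hat{\beta}}$. It is easy to see that $\hat{I}_{h,\hat{\beta}}$ is simply $\hat{\beta}_0$, the estimate of the intercept term in the bigger regression problem where we include the intercept term, i.e.

\[ \hat{I}_{h,\hat{\beta}} = \hat{\beta}_0. \tag{2.11} \]

One can show that if the $k$ sequences are all iid sequences, then $\hat{\beta}$ converges to $\beta_{\mathrm{opt}}$, and $\hat{I}_{h,\hat{\beta}}$ is guaranteed to be at least as efficient as the naive estimator. But when we have Markov chains this is not the case, especially if the chains mix at different rates. In Section 2.3 we consider the estimates $\hat{\beta}$ and $\hat{I}_{h,\hat{\beta}}$ directly.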
In particular, we give a precise definition of the nonrandom value $\beta$ that $\hat{\beta}$ is estimating (it is $\beta_{\lim}^{(h)}$ in equation (A.3)), and show that the effect of using $\hat{\beta}$ instead of $\beta$ is asymptotically negligible.
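The constants $d_s$ appearing in (2.9) and (2.10) come from Step 1. For $k = 2$, the iterative scheme (2.5) of Section 2.1 can be sketched in a few lines. The code below is a minimal illustration, not the implementation used in the paper, and rests on several assumptions: it works directly with the unnormalized densities $q_h(t) = t^h$ on $(0, 1)$ from the simple example of Section 2 (so $q_1$ and $q_3$ play the roles of $\nu_{h_1}$ and $\nu_{h_2}$, and there is no likelihood factor), it uses iid draws instead of Markov chain output, and the names `q` and `bridge_iteration` are hypothetical. In this toy setting $m_h = 1/(h + 1)$, so the target is $d = m_3/m_1 = 0.5$.

```python
import random


def q(t, h):
    """Unnormalized density t^h on (0, 1); its normalizing constant is 1 / (h + 1)."""
    return t ** h


def bridge_iteration(x1, x2, d_start=1.0, iterations=30):
    """Iterate (2.5) for k = 2, with q(., 1) and q(., 3) in place of nu_h1 and nu_h2.

    x1 holds draws from q(., 1) / m_1 and x2 holds draws from q(., 3) / m_3.
    """
    n1, n2 = len(x1), len(x2)
    a1, a2 = n1 / (n1 + n2), n2 / (n1 + n2)
    d = d_start
    for _ in range(iterations):
        numerator = sum(q(t, 3) / (a1 * q(t, 1) + a2 * q(t, 3) / d) for t in x1) / n1
        denominator = sum(q(t, 1) / (a1 * q(t, 1) + a2 * q(t, 3) / d) for t in x2) / n2
        d = numerator / denominator
    return d


rng = random.Random(2)
# Inverse-CDF sampling; 1 - random() lies in (0, 1], which avoids t = 0.
x1 = [(1.0 - rng.random()) ** 0.5 for _ in range(10_000)]    # density 2t
x2 = [(1.0 - rng.random()) ** 0.25 for _ in range(10_000)]   # density 4t^3

d_hat = bridge_iteration(x1, x2)
print(d_hat)  # should be close to m_3 / m_1 = 0.5
```

The starting value matters little here because the iteration contracts quickly; with Markov chain output, the weights `a1` and `a2` would ideally reflect effective rather than nominal sample sizes.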
It is natural to consider the problem of estimating $\beta_{\mathrm{opt}}$ in the Markov chain setting. Actually, before thinking about minimizing the variance of (2.9) with respect to $\beta$, one should first note the following. The constants $a_s = n_s/n$, $s = 1, \ldots, k$, used in forming the values $Y_{i,l}^{(h)}$ are sensible in the iid setting, but when dealing with Markov chains one would want to replace $n_s$ with an effective sample size, as discussed by Meng and Wong (1996). Therefore, the real problem is two-fold:

How do we find optimal (or good) values to use in place of the $a_s$'s in the $Y_{i,l}^{(h)}$'s?

Using the $Y_{i,l}^{(h)}$'s based on these values, how do we estimate the value of $\beta$ that minimizes the variance of (2.9)?

Both problems appear to be very difficult. Intuitively at least, the method described here should perform well if the mixing rates of the Markov chains are not very different. But in any case, the results in Section 2.3 show that, whether or not $\hat{I}_{h,\hat{\beta}}$ is optimal, it is a consistent and asymptotically normal estimator whose variance can be estimated consistently.

Note that if we do not use control variates, our estimate is just

\[ \frac{1}{n}\sum_{l=1}^{k}\sum_{i=1}^{n_l} \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s\,\nu_{h_s}(\theta_i^{(l)})/d_s}, \]

which is exactly (2.1).

Reduction in Variance from Using the Control Variates

Consider the linear combination of the responses $Y_{i,l}^{(h)}$ and predictors $Z_{i,l}^{(j)}$ given by

\[ L = \sum_{j=2}^{k} a_j Z^{(j)} + Y^{(h)}. \]

(We are dropping the subscripts $i, l$.) A calculation shows that if $h = h_1$ then $L = 1$, meaning that we have an estimate with zero variance. Similarly, for $t = 2, \ldots, k$, let $L_t$ be the linear combination given by

\[ L_t = \sum_{j=2}^{k} a_j Z^{(j)} + (1/d_t)\,Y^{(h)} - Z^{(t)}. \]

If $h = h_t$, then $L_t = 1$. Thus if $h \in \{h_1, \ldots, h_k\}$, our estimate of the Bayes factor $B(h, h_1)$ has zero variance. This is not surprising since, after all, we are assuming that we know $B(h_j, h_1)$, for $j = 1, \ldots, k$; however, this does indicate that if we use these control variates, our estimate will be very precise as long as $h$ is close to at least one of the $h_j$'s. This advantage does not exist if we use the plain estimate (2.1).
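The zero-variance property can be checked numerically. The sketch below forms the responses and predictors of (2.10) for $k = 2$ and compares the plain estimate (2.1) with the regression intercept of (2.11). It is an illustration under assumptions rather than the paper's code: it reuses the toy unnormalized densities $q_h(t) = t^h$ on $(0, 1)$ with design points $h_1 = 1$ and $h_2 = 3$, treats $d_2 = m_3/m_1 = 0.5$ as known, uses iid draws, and all function names are hypothetical. The true value of the target is $m_h/m_1 = 2/(h + 1)$.

```python
import random


def q(t, h):
    """Unnormalized density t^h on (0, 1); normalizing constant 1 / (h + 1)."""
    return t ** h


def plain_and_regression_estimates(samples, h, d2, a1, a2):
    """Return (B_hat, B_reg): the mean of Y and the intercept of Y regressed on Z."""
    ys, zs = [], []
    for t in samples:
        denom = a1 * q(t, 1) + a2 * q(t, 3) / d2     # sum_s a_s q_{h_s} / d_s, d_1 = 1
        ys.append(q(t, h) / denom)                   # response Y of (2.10)
        zs.append((q(t, 3) / d2 - q(t, 1)) / denom)  # predictor Z^{(2)} of (2.10)
    n = len(ys)
    y_bar, z_bar = sum(ys) / n, sum(zs) / n
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    slope = s_zy / s_zz
    return y_bar, y_bar - slope * z_bar              # plain estimate, intercept


rng = random.Random(3)
n1 = n2 = 500                                        # deliberately small samples
x1 = [(1.0 - rng.random()) ** 0.5 for _ in range(n1)]    # draws from 2t
x2 = [(1.0 - rng.random()) ** 0.25 for _ in range(n2)]   # draws from 4t^3
pooled = x1 + x2
a1, a2 = n1 / (n1 + n2), n2 / (n1 + n2)

for h in (1.0, 2.0, 3.0):
    b_hat, b_reg = plain_and_regression_estimates(pooled, h, 0.5, a1, a2)
    print(h, b_hat, b_reg, 2.0 / (h + 1.0))
```

At the two design points the response is an exact linear function of the predictor, so the intercept reproduces $1$ and $d_2$ up to rounding error whatever the sample, while the plain estimate still fluctuates.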
The intercept term in the regression of the $Y_{i,l}^{(h)}$'s on the $Z_{i,l}^{(j)}$'s is simply a linear combination of the form

\[ \hat{\beta}_0 = \sum_{l=1}^{k}\sum_{i=1}^{n_l} w_{i,l}\,Y_{i,l}^{(h)}. \tag{2.12} \]
The $w_{i,l}$'s need to be computed just once, so for every new value of $h$ the calculation of $\hat{B}_{\mathrm{reg}}(h, h_1)$ requires $n$ operations, which is the same as the number of operations needed to compute $\hat{B}(h, h_1)$ given by (2.1). To summarize, using control variates can greatly improve the accuracy of the estimates, at no (or trivial) increase in computational cost.

2.3 Asymptotic Normality and Estimation of the Variance

Here we state a result that says that under certain regularity conditions $\hat{B}_{\mathrm{reg}}(h, h_1)$ and $\hat{B}(h, h_1)$ are asymptotically normal, and we show how to estimate the variance. As discussed in Section 2.2, we typically prefer that $\theta_i^{(l)}$, $i = 1, \ldots, n_l$, be an iid sample for each $l$. Nevertheless, our results pertain to the more general case where these samples arise from Markov chains. (As before, we assume that $n_l/n \to a_l \in (0, 1)$ and, when dealing with the asymptotics, strictly speaking we need to make a distinction between $n_l/n$ and its limit; however we write $a_l$ for both as this makes the bookkeeping easier, and blurring the distinction never creates a problem.) Recall that $Y_{i,l}^{(h)}$ and $Z_{i,l}^{(j)}$, $j = 2, \ldots, k$, are defined in (2.10) and, for economy of notation, we define $Z_{i,l}^{(1)}$ to be 1 for all $i, l$. Let $R$ be the $k \times k$ matrix defined by

\[ R_{jj'} = E\Bigl(\sum_{l=1}^{k} a_l\,Z_{1,l}^{(j)}\,Z_{1,l}^{(j')}\Bigr), \quad j, j' = 1, \ldots, k. \]

We assume that for the Markov chains a strong law of large numbers holds (sufficient conditions are given, for example, in Theorem 2 of Athreya, Doss and Sethuraman (1996)), and we refer to the following conditions.

A1 For each $l = 1, \ldots, k$, the chain $\{\theta_i^{(l)}\}_{i=1}^{\infty}$ is geometrically ergodic.

A2 For each $l = 1, \ldots, k$, there exists $\epsilon > 0$ such that $E\bigl(|Y_{1,l}^{(h)}|^{2+\epsilon}\bigr) < \infty$.

A3 The matrix $R$ is nonsingular.

Theorem 1 Under conditions A1 and A2,

\[ n^{1/2}\bigl(\hat{B}(h, h_1) - B(h, h_1)\bigr) \xrightarrow{d} N\bigl(0, \tau^2(h)\bigr), \]

and under conditions A1-A3,

\[ n^{1/2}\bigl(\hat{B}_{\mathrm{reg}}(h, h_1) - B(h, h_1)\bigr) \xrightarrow{d} N\bigl(0, \sigma^2(h)\bigr), \]

with $\tau^2(h)$ and $\sigma^2(h)$ given by equations (A.9) and (A.7) below.

The proof is given in the Appendix, which also explains how one can estimate the variances.
Theorem 1 assumes that the vector $d$ is known, either because it can be computed analytically or because the sample sizes from Stage 1 sampling are so large that this is effectively true. Buta (2009) has obtained a version of Theorem 1 that takes into account the variability from the first stage. Very briefly, if $N$ is the total sample size from the first stage, and if $N \to \infty$ and $n \to \infty$ in such a way that $n/N \to q \in [0, \infty)$, then

\[ n^{1/2}\bigl(\hat{B}(h, h_1) - B(h, h_1)\bigr) \xrightarrow{d} N\bigl(0, q\,\tau_S^2(h) + \tau^2(h)\bigr), \]
where $\tau_S^2(h)$ is a correction term that inflates the variance when the sample sizes in Stage 1 are finite. Also, she has a similar result for the estimate that uses control variates.

The variances of $\hat{B}_{\mathrm{reg}}(h, h_1)$ and $\hat{B}(h, h_1)$ depend on the choice of the points $h_1, \ldots, h_k$, and finding good values of $k$ and $h_1, \ldots, h_k$ is in general a very difficult problem. In our experience, we have found that the following method works reasonably well. Having specified the range $\mathcal{H}$, we select trial values $h_1, \ldots, h_k$, and in pilot runs plot the variance function $\tau^2(h)$, or $\sigma^2(h)$; then if we find a region where this is unacceptably large, we cover this region by moving some $h_l$'s closer to the region, or by simply adding new $h_l$'s in that region, which increases $k$.

3 Illustration

There are many classes of models to which the methodology developed in Section 2 applies. These include the usual parametric models, and also Bayesian nonparametric models involving mixtures of Dirichlet processes (Antoniak (1974)), in which one of the hyperparameters is the so-called total mass parameter (very briefly, this hyperparameter controls the extent to which the nonparametric model differs from a purely parametric model). Another application involves some problems in Bayesian variable selection, and this is described in Doss (2007). In this section we give an example involving the hierarchical Bayesian model described in Section 1. While models of much greater complexity can be considered, this relatively simple example has the advantage that the data can be visualized quickly, and the hyperparameters have a straightforward interpretation so that our analysis can be easily understood.

Meta-Analysis of Data on Non-Steroidal Anti-Inflammatory Drugs and Cancer Risk

Over the last decade, a large number of epidemiological studies have reported a link between intake of non-steroidal anti-inflammatory drugs (NSAIDs) and cancer risk.
The studies, which involve different cancers and different NSAIDs, strongly suggest that long-term intake of NSAIDs results in a significant reduction in cancer risk for all the major types: colon, breast, lung, and prostate cancer. In Harris et al. (2005) we carry out a comprehensive review of the published scientific literature on NSAIDs and cancer. Our review spans 90 papers, which investigate several NSAIDs and ten cancers, including the four major types. We have extracted data from these papers to make tables such as Table 1 below, which pertains to aspirin and colon cancer. The table gives, for each of 15 studies, the dose, reported risk ratio (for NSAID use vs. non-NSAID use), and the log reported risk ratio together with a standard error. (Harris et al. (2005) does not give these standard errors; it gives 95% confidence intervals for the risk ratios, which can be used to form 95% confidence intervals for the log risk ratios, which in turn can be used to determine the standard errors.) See Harris et al. (2005) for more information on this table and references for the 15 studies.

As can be seen from the table, there is some inconsistency in the studies, with some indicating a large reduction in cancer risk, while others indicate a smaller reduction, in spite of a large dose. This is not surprising, since there is heterogeneity in the patient and control pools (characteristics such as age, ethnicity, and health status vary greatly across the studies). It is
therefore of interest to carry out a meta-analysis of these studies. Although there have been a few meta-analyses in the literature, these have been rather informal: all of them have used fixed effects models, and none have taken into account the dose information.

Table 1: Fifteen studies on aspirin and colon cancer. Here, PPW represents the dose (number of 325 mg pills per week), RR is the observed risk ratio for aspirin vs. no aspirin, LRR is its logarithm, and SE(LRR) is an estimate of the standard error of LRR.
Columns: Publication, PPW, RR, LRR, SE(LRR)
Publications: Coogan; Friedman; Garcia-Rod.; Giovannucci; Giovannucci; LaVecchia; Muscat; Paganini-Hill; Peleg; Reeves; Rosenberg; Rosenberg; Schr. & Ev.; Suh; Thun

Assume temporarily that all studies involved the same dose. In a random-effects meta-analysis, for each study $j$ there is a latent variable, say $\psi_j$, that gives the true log risk ratio that would be obtained if the sample sizes for that study were infinite. One is then led to a model such as (1.2), in which the distribution of the study-specific effect is the normal distribution in (1.2b).

Two modelling issues now arise. The first is that whereas the first normality assumption (line (1.2a)) is supported by a theoretical result (the approximate normality of functions of binomial estimates), the second normality assumption (line (1.2b)) is not, but is typically made for the sake of convenience. In fact, data for several of the other cancers include outliers (see Harris et al. (2005)), and therefore one may wish to use a $t$ distribution instead, this decision being made prior to looking at the colon cancer data. An important modelling issue is then to decide on the number of degrees of freedom.

The second issue is to determine the parameters of the normal / inverse gamma prior $\lambda_c$ in (1.2c). Here $c = (c_1, c_2, c_3, c_4)$, where $c_1, c_2, c_4 > 0$ and $c_3 \in \mathbb{R}$ and, under this prior, the distribution of $(\mu, \tau)$ is as follows: $\gamma = 1/\tau^2 \sim \mathrm{Gamma}(c_1, c_2)$ and, conditional on $\tau$, $\mu \sim N(c_3, c_4\tau^2)$.
This prior is commonly used because it is conjugate to the family $N(\mu, \tau^2)$. With appropriate hyperparameters, $\lambda$ can be made to be a flat ("noninformative") prior, and common recommendations are to take $c_1$ and $c_2$ to be very small (so that the gamma distribution on $\gamma$ is an approximation to $d\gamma/\gamma$, the improper Jeffreys prior), and to take $c_3 = 0$ and $c_4$ to be very large. Indeed, this is the recommendation made in the examples in the Bugs documentation and tutorials. Nevertheless, such a set of hyperparameter values is now sometimes criticized because for small values of $c_1$ and $c_2$ the gamma distribution gives high probability to large values of $\gamma$ (equivalently small values of $\tau$), which greatly encourages the $\psi_j$'s to all be equal to $\mu$. In other words, this causes excessive shrinkage. See for example Gelman (2006).

We wish to address both these issues and now also would like to take into account the dose. Let $L_j$ be the log of the observed risk ratio for study $j$. Let $x_j$ be the dose, defined as number of pills per day (PPW/7), for study $j$. Consider the linear model

\[ L_j = \alpha_j + \psi_j x_j + \varepsilon_j, \quad j = 1, \ldots, m, \tag{3.1} \]
where $\alpha_j$ and $\psi_j$ are parameters specific to study $j$, and $\varepsilon_j$ is normally distributed with mean 0 and standard deviation $\sigma_j$ (given in Column 5 of Table 1). Note that $\alpha_j = 0$, since $x_j = 0$ implies that the treatment and control groups are identical, so that $L_j$ has mean 0. Thus, (3.1) is rewritten as $L_j = \psi_j x_j + \varepsilon_j$, from which we see that $\psi_j$ has the interpretation as the true log risk ratio if the treatment group had taken one pill per day. Thus if we let $Y_j = L_j/x_j$, we have $Y_j = \psi_j + \tilde{\varepsilon}_j$, $j = 1, \ldots, m$, where $\tilde{\varepsilon}_j$ is normal with mean 0 and standard deviation $\tilde{\sigma}_j = \sigma_j/x_j$. We now consider the hierarchical model

\[ Y_j \overset{\text{indep}}{\sim} \phi_{\psi_j,\tilde{\sigma}_j}, \quad j = 1, \ldots, m, \tag{3.2} \]

with the distribution of $\psi$ determined by the following:

\[ \text{conditional on } \mu, \tau, \quad \psi_j \overset{\text{iid}}{\sim} t_{v,\mu,\tau}, \quad j = 1, \ldots, m, \tag{3.3a} \]
\[ (\mu, \tau) \sim \lambda_c. \tag{3.3b} \]

Letting $\theta = (\psi, \mu, \tau)$, the likelihood of $Y = (Y_1, \ldots, Y_m)$ is given by (3.2), and the prior on $\theta$ is given by (3.3), which is indexed by $h = (v, c)$. Loosely speaking, the value of $v$ determines the choice of the model, and $c$ determines the prior. We may therefore fix some value $h_1$ and consider the family of Bayes factors $B(h, h_1)$ as $h$ varies. We can estimate the family if for values $h_j$, $j = 1, \ldots, k$, of the hyperparameter $h$, we have samples from the posterior distributions $\nu_{h_j,y}$ of the entire vector $\theta$.

We considered four different values of $c$ in which $c_3 = 0$, $c_4 = 1000$ were fixed (since there does not seem to be any controversy about these two parameters) and we took $c_1 = c_2$ and let the common value, denoted $\epsilon$, start at .005 and increase by factors of 5 up to .625. We took the values of the degrees of freedom parameter to be $v = 1, 4, 12$, for a total of 12 values of the hyperparameter $h$. For each of these 12 values we ran a Markov chain of length about 1 million and used these to calculate the vector of ratios of normalizing constants, via the method of Meng and Wong (1996) reviewed in Section 2.1. We then ran new Markov chains to produce a sample of size 100 from each of the 12 posteriors.
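The data preparation and the design just described are simple enough to sketch. The code below is a hedged illustration, not the authors' code: `rescale_study` turns a study's dose, risk ratio and 95% confidence interval into $Y_j$ and $\tilde{\sigma}_j$, assuming a symmetric normal interval on the log scale so that the standard error is the log-interval width divided by $2 \times 1.96$; `hyperparameter_grid` lists the 12 design points, taking the degrees-of-freedom values to be 1, 4 and 12 as read from the text. The function names and the study values in the usage line are hypothetical, since the numerical entries of Table 1 are not reproduced here.

```python
import math


def rescale_study(ppw, risk_ratio, ci_low, ci_high):
    """Return (Y_j, sigma_tilde_j) for one study.

    ppw is the dose in pills per week; the 95% interval (ci_low, ci_high)
    is for the risk ratio. Assumes a symmetric normal interval for the log.
    """
    x = ppw / 7.0                                   # dose in pills per day
    log_rr = math.log(risk_ratio)                   # L_j
    se_log_rr = (math.log(ci_high) - math.log(ci_low)) / (2.0 * 1.96)
    return log_rr / x, se_log_rr / x                # Y_j = L_j / x_j, sigma_j / x_j


def hyperparameter_grid():
    """Design points h = (v, epsilon) with c_1 = c_2 = epsilon."""
    eps_values = [0.005 * 5 ** i for i in range(4)]  # .005, .025, .125, .625
    v_values = [1, 4, 12]                            # assumed reading of the text
    return [(v, eps) for v in v_values for eps in eps_values]


# Hypothetical study: 14 pills per week, RR = 0.5, 95% CI (0.25, 1.0).
y_j, sd_j = rescale_study(14, 0.5, 0.25, 1.0)
grid = hyperparameter_grid()
print(y_j, sd_j, len(grid))
```

Dividing both the log risk ratio and its standard error by the dose is what makes $\psi_j$ interpretable as a per-pill-per-day effect, at the price of larger standard errors for low-dose studies.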
These samples, which were actually subsamples from longer chains (burn-in of 1000, then taking every 50th value), can be considered iid for practical purposes, and were used to calculate the estimate $\hat{B}_{\mathrm{reg}}(h, h_1)$ of Section 2.2. We took $h_1$ to be the specification corresponding to $v = 4$ and $\epsilon = .125$, since preliminary experiments indicated that this value of $h$ gave a relatively high value of $m_h$.

Figure 1 shows $\hat{B}_{\mathrm{reg}}(h, h_1)$ as $v$ and $\epsilon$ vary. The maximum standard error over the range of the graph was less than .01. The two plots in Figure 1 show different views of the same graph. From the left plot we see that a $t$ distribution works better than does a normal, with the optimal number of degrees of freedom being about 3 or 4. The plot also shows clearly that a very small number of degrees of freedom is not appropriate. The right plot shows that as $\epsilon \to 0$, the Bayes factor converges to 0 rapidly (in particular, fixing $v = 4$, the recommendation in the Bugs literature to use $\epsilon = .001$ gives a Bayes factor of about .036, and for $\epsilon = .0001$ it is .0037), giving strong evidence that very small values of $\epsilon$ should not be used.

For some models the improper prior $d\gamma/\gamma$ gives rise to a proper posterior, and for others, including model (3.3b), it is possible to prove that the posterior is improper (Berger (1985,
14 Bayes factor 0.6 Bayes factor epsilo df df epsilo Figure : Model assessmet for the aspiri ad colo cacer data. The Bayes factor as a fuctio of v, the umber of degrees of freedom i (3.3a), ad ɛ, the commo value of c ad c 2 i the gamma prior i (3.3b), is show from two differet agles. Here the baselie value of the hyperparameter correspods to v = 4 ad ɛ =.25. p. 87)), so that the pathological behavior resultig from ɛ 0 should be expected. For some more complicated models, whether the posterior is proper or ot is ukow (posterior propriety may eve deped o the data values), ad i these cases, plots such as those i Figure may be useful because they may lead oe to ivestigate a possible posterior impropriety. The choice of hyperparameter h does have a ifluece o our iferece. Let ψ ew deote the latet variable for a future study, a quatity of iterest i meta-aalysis. We cosidered two specificatios of h: (v =, ɛ =.00) ad (v = 4, ɛ =.625). The first choice may be cosidered a default choice, ad the secod a choice guided by cosideratio of the plot of Bayes factors. For the choice (v =, ɛ =.00), we have E(ψ ew ) =.95 ad P (ψ ew > 0) =.04, whereas for (v = 4, ɛ =.625), we have E(ψ ew ) =.87 ad P (ψ ew > 0) =.08. I other words, the t model suggests a stroger aspiri effect, but the iferece is more tetative. Remarks o Computatio ad Accuracy We ow give a idea of how the computatioal effort is distributed. The Stage samples (2 chais, each of legth 0 6 ) took 83 secods to geerate o a 3.8 GHz dual core P4 ruig Liux. By cotrast, the plot i Figure, which ivolves a grid of 4000 poits, took oe hour to compute, i spite of the fact that it is based o a total sample size of oly 200, for what must be cosidered a rather simple model. Clearly usig a very large value of is ot feasible, ad this is why we eed to ru the prelimiary chais i order to get a very accurate estimate of d. We ow illustrate the extet to which ˆB reg (h, h ) is more efficiet tha ˆB(h, h ). 
Figure 2 gives a plot of the ratio of the variances of the two estimates as $h$ varies. Both $\hat B_{\mathrm{reg}}(h, h_1)$ and $\hat B(h, h_1)$ use the design discussed earlier, which involves a total sample size of 1200. This figure is obtained by generating 100 Monte Carlo replicates of $\hat B_{\mathrm{reg}}(h, h_1)$ and $\hat B(h, h_1)$ for each $h$ in a grid somewhat more coarse than the one used in Figure 1. As can be seen from the figure, the ratio is about .0 over most of the grid, and is less than . over the entire grid, with the exception of the values of $h$ for which df = .5 (for those values, the Bayes factor itself is very small, and the two estimates each have minuscule variances). We also note that the ratio is exactly 0 at the design points.

[Figure 2: vertical axis: ratio of variances; horizontal axes: df and epsilon.]
Figure 2: Improvement in accuracy that results when we use control variates. The plot gives $\mathrm{Var}\bigl(\hat B_{\mathrm{reg}}(h, h_1)\bigr) / \mathrm{Var}\bigl(\hat B(h, h_1)\bigr)$ as $h$ ranges over the same region as in Figure 1.

4 Discussion

When faced with uncertainty regarding the choice of hyperparameters, one approach is to put a prior on the hyperparameters, that is, add one layer to the hierarchical model. This approach, which goes under the general name of Bayesian model averaging, can be very useful. On the other hand, there are several good reasons why one may want to avoid it. First, the choice of prior on the hyperparameters can have a great influence on the analysis. One is tempted to use a flat prior but, as is well known, for certain parameters such a prior can in fact be very informative. In the illustration of Section 3, a flat prior on the degrees of freedom parameter in effect skews the results in favor of the normal distribution. Second, one may wish to do Bayesian model selection, as opposed to Bayesian model averaging, because the subsequent inference is then more parsimonious and interpretable. These points are discussed more fully in George and Foster (2000) and Robert (2001, Chapter 7).

There are a number of papers that deal with estimation of Bayes factors via MCMC. Chen, Shao and Ibrahim (2000, Chapter 5) and Han and Carlin (2001) give an overview of much of this work, and we mention also the more recent paper by Meng and Schilling (2002), which is directly relevant. Most of these papers deal with the case of a single Bayes factor, whereas the present paper is concerned with estimation of large families of Bayes factors.
Nevertheless, in principle any of the methods in this literature can be applied to estimate the vector $d$.
Especially important is Kong et al. (2003), whose work we describe in the notation of the present paper. The situation considered there has $k$ known unnormalized densities $q_{h_1}, \dots, q_{h_k}$, with unknown normalizing constants $m_{h_1}, \dots, m_{h_k}$, respectively, and for $l = 1, \dots, k$, there is an iid sample $\theta_1^{(l)}, \dots, \theta_{n_l}^{(l)}$ from $q_{h_l}/m_{h_l}$. The problem is the simultaneous estimation of all ratios $m_{h_l}/m_{h_s}$, $l, s = 1, \dots, k$, or equivalently, all ratios $d_l = m_{h_l}/m_{h_1}$, $l = 1, \dots, k$. In a certain framework, they show that the maximum likelihood estimate (MLE) of $d$ is obtained by solving the system of $k$ equations

$$\hat d_r = \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{q_{h_r}(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s q_{h_s}(\theta_i^{(l)}) / \hat d_s}, \quad r = 1, \dots, k. \tag{4.1}$$

To put this in our context, let $q_{h_l}(\theta) = l_y(\theta) \nu_{h_l}(\theta)$, $l = 1, \dots, k$, and suppose we have iid samples from the normalized $q_{h_l}$'s. We may imagine that we have $k + 1$ unnormalized densities $q_{h_1}, \dots, q_{h_k}, q_h$, with a sample of size 0 from the normalized $q_h$. The estimate of $m_h/m_{h_1}$ then becomes

$$\frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \frac{\nu_h(\theta_i^{(l)})}{\sum_{s=1}^{k} a_s \nu_{h_s}(\theta_i^{(l)}) / \hat d_s}.$$

We recognize this as precisely $\hat B(h, h_1)$ in (2.1), except that $\hat d_1, \dots, \hat d_k$ are formed by solving (4.1), i.e., are estimated from the sequences $\theta_1^{(l)}, \dots, \theta_{n_l}^{(l)}$, $l = 1, \dots, k$. Thus, $\hat B(h, h_1)$ is the same as the estimate of Kong et al. (2003), except that the vector $d$ is precomputed based on previously run very long chains. Therefore, it is perhaps natural to consider estimating $d$ on the basis of these very long Markov chains using the method of Kong et al. (2003) (as opposed to the method discussed in Section 2.1), and we now discuss this possibility.

In their approach, Kong et al. (2003) assume that the $q_{h_l}$'s are densities with respect to a dominating measure $\mu$, and they obtain the MLE $\hat\mu$ of $\mu$ ($\hat\mu$ is given up to a multiplicative constant). They can then estimate the ratios $m_{h_l}/m_{h_s}$ since the normalizing constants are known functions of $\mu$. Their approach works if for each $l$, $\theta_1^{(l)}, \dots, \theta_{n_l}^{(l)}$ is an iid sample.
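To make the system (4.1) concrete, here is a minimal sketch that solves it by plain fixed-point iteration with the normalization $\hat d_1 = 1$. This is an illustration under stated assumptions, not the procedure of Kong et al. (2003) or of the present paper: the conjugate normal toy model (prior $N(0, h)$, one observation $y \sim N(\theta, 1)$), the iid posterior draws, and the names `solve_ratio_system` and `toy_problem` are all hypothetical choices made so that the exact ratios are available for comparison.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def solve_ratio_system(q_funcs, samples, n_iter=100):
    """Fixed-point iteration for system (4.1), normalized so that d[0] == 1.

    q_funcs[r] is the r-th unnormalized density; samples[l] holds the draws
    from the l-th normalized density.
    """
    n = sum(len(chain) for chain in samples)
    a = [len(chain) / n for chain in samples]
    # Evaluate every unnormalized density once at every pooled draw.
    q_values = [[q(theta) for q in q_funcs] for chain in samples for theta in chain]
    d = [1.0] * len(q_funcs)
    for _ in range(n_iter):
        new_d = [0.0] * len(q_funcs)
        for row in q_values:
            denom = sum(a_s * q_s / d_s for a_s, q_s, d_s in zip(a, row, d))
            for r, q_r in enumerate(row):
                new_d[r] += q_r / denom
        # The system determines d only up to scale; dividing by the first
        # entry also makes the 1/n factor in (4.1) cancel.
        d = [value / new_d[0] for value in new_d]
    return d


def toy_problem(y=1.0, design=(1.0, 4.0), draws=2000, seed=0):
    """Hypothetical conjugate example: theta ~ N(0, h), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)
    q_funcs = [lambda t, h=h: normal_pdf(y, t, 1.0) * normal_pdf(t, 0.0, h)
               for h in design]
    # The posterior is N(y*h/(1+h), h/(1+h)); draw iid samples from it.
    samples = [[rng.gauss(y * h / (1.0 + h), math.sqrt(h / (1.0 + h)))
                for _ in range(draws)] for h in design]
    # The marginal likelihood is N(y; 0, 1+h), so the exact ratios are known.
    exact = [normal_pdf(y, 0.0, 1.0 + h) / normal_pdf(y, 0.0, 1.0 + design[0])
             for h in design]
    return q_funcs, samples, exact
```

Calling `solve_ratio_system(*toy_problem()[:2])` returns estimates that can be compared with the exact ratios in the third element of `toy_problem()`. With draws from a Markov chain rather than iid draws, the same arithmetic applies but the likelihood justification does not, which is the issue taken up in the surrounding discussion.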
Although they extend it to the case where these are a Markov chain, in the extension $q_{h_l}$ is replaced by the Markov transition functions $P_{h_l}(\cdot, \theta_i^{(l)})$, $i = 0, \dots, n_l$, assumed absolutely continuous with respect to a sigma-finite measure $\mu$ (precluding Metropolis-Hastings chains), and if each of these is known only up to a normalizing constant, as is typically the case, then the system (4.1) becomes a system of k equations. This is prohibitively difficult to solve.

Tan (2004) shows how control variates can be incorporated in the likelihood framework of Kong et al. (2003). When there are $r$ functions $H_j$, $j = 1, \dots, r$, for which we know that $\int H_j \, d\mu = 0$, the parameter space is restricted to the set of all sigma-finite measures satisfying these $r$ constraints. For the case where $\theta_i^{(l)}$, $i = 1, \dots, n_l$, are iid for each $l = 1, \dots, k$, he obtains the MLE of $\mu$ in this reduced parameter space, and therefore a corresponding estimate of $m_h/m_{h_1}$, and shows that this approach gives estimates that are asymptotically equivalent to estimates that use control variates via regression. His estimate can still be used when we have Markov chain draws, but is no longer optimal, for the same reason that the estimate in the present paper is not optimal (see the discussion in the middle of Section 2.2). The optimal estimator is obtained by using the likelihood that arises from the Markov chain structure, and in the case of general Markov chains its calculation is computationally very demanding. See Tan (2006, 2008) for advances in this direction. Tan (2004) also obtains results on asymptotic normality of his estimators that are valid when we have the iid structure, but it should be possible to obtain versions for Markov chain draws, under regularity conditions such as those of the present paper.

Owen and Zhou (2000) use control variates in conjunction with importance sampling. In the notation above, they assume that the $q_{h_l}$'s are normalized densities, and that for every $l$, they have an iid sample of size $n_l$ from $q_{h_l}$. As before, let $a_l = n_l / \sum_{s=1}^{k} n_s$. Because these are normalized densities, each of the $k$ variables $q_{h_l}(\theta) / \bigl( \sum_{s=1}^{k} a_s q_{h_s}(\theta) \bigr)$ has expectation 1 under the distribution $\sum_{s=1}^{k} a_s q_{h_s}$, and so can be used as control variates. Their method does not work directly in our situation because the $q_{h_l} = l_y(\theta) \nu_{h_l}(\theta)$ are unnormalized densities. It is therefore natural to consider estimating the normalizing constants of $q_{h_l}$, $l = 1, \dots, k$, from the Stage 1 runs. Indeed, there are methods for doing this from Markov chain output (Chib (1995), Chib and Jeliazkov (2001)). However, estimation of ratios of normalizing constants tends to be far more stable than estimation of the normalizing constants themselves. For example, if we wish to estimate $m_h/m_{h_1}$, then a procedure that involves estimating $m_h$ and $m_{h_1}$ separately and then taking the ratio is not guaranteed to provide accurate estimates even when $h = h_1$, whereas in this case the simple estimate (1.1) gives an unbiased estimate with zero variance. Moreover, if we run Markov chains for models indexed by $h_1, \dots, h_k$, the estimate of a single ratio $m_{h_s}/m_{h_1}$ using the method of Section 2.1 makes use of all the chains, providing greater stability. The control variates that we use are essentially equivalent to those used by Owen and Zhou (2000), but their computation requires only knowledge of the vector $d$.

R functions for producing the estimates $\hat B(h, h_1)$ and $\hat B_{\mathrm{reg}}(h, h_1)$, and plots such as those in Figure 1 for the hierarchical model (3.2)-(3.3) and relatives, are available from the author upon request.
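For readers without access to those R functions, the basic estimate $\hat B(h, h_1)$, written as the pooled average $\frac{1}{n} \sum_l \sum_i \nu_h(\theta_i^{(l)}) / \sum_s a_s \nu_{h_s}(\theta_i^{(l)}) / d_s$ with $d$ treated as known, can be sketched in a few lines of Python. This is an illustrative sketch only and is not the author's code: the toy conjugate model (prior $N(0, h)$, a single observation $y \sim N(\theta, 1)$), the use of iid posterior draws, and the names `bayes_factor_estimate` and `toy_example` are assumptions made so that the exact Bayes factor is available for comparison. The regression version $\hat B_{\mathrm{reg}}$ is not shown.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_estimate(prior_pdf, h, design, d, samples):
    """Pooled importance-sampling estimate of m_h / m_{h_1}.

    prior_pdf(theta, h) evaluates the prior density nu_h at theta;
    design lists h_1, ..., h_k; d[l] is m_{h_l} / m_{h_1} (taken as known);
    samples[l] holds posterior draws under prior nu_{h_l}.
    """
    n = sum(len(chain) for chain in samples)
    a = [len(chain) / n for chain in samples]
    total = 0.0
    for chain in samples:
        for theta in chain:
            mixture = sum(a_s * prior_pdf(theta, h_s) / d_s
                          for a_s, h_s, d_s in zip(a, design, d))
            total += prior_pdf(theta, h) / mixture
    return total / n


def toy_example(y=1.0, design=(1.0, 4.0), draws=2000, seed=1):
    """Hypothetical conjugate check: theta ~ N(0, h), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)

    def prior_pdf(theta, h):
        return normal_pdf(theta, 0.0, h)

    def marginal(h):
        # Marginal likelihood of y under prior variance h.
        return normal_pdf(y, 0.0, 1.0 + h)

    d = [marginal(h) / marginal(design[0]) for h in design]
    # The posterior is N(y*h/(1+h), h/(1+h)); draw iid samples from it.
    samples = [[rng.gauss(y * h / (1.0 + h), math.sqrt(h / (1.0 + h)))
                for _ in range(draws)] for h in design]
    grid = [1.0, 2.0, 3.0, 4.0]
    return [(h,
             bayes_factor_estimate(prior_pdf, h, design, d, samples),
             marginal(h) / marginal(design[0]))
            for h in grid]
```

Each tuple returned by `toy_example()` holds a grid value of $h$, the estimate, and the exact ratio. Note that only prior densities are evaluated, because the likelihood cancels between numerator and denominator; this is what makes evaluation over a large grid of $h$ values feasible.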
Acknowledgements

I thank two referees for their careful reading and Eugenia Buta for helpful comments. I am especially grateful to an associate editor for a very insightful and thorough report, and for suggestions that led to several improvements in the paper.

Appendix: Proof of Theorem 1

Under Conditions A1 and A2 we have a central limit theorem for the averages $\frac{1}{n_l} \sum_{i=1}^{n_l} Y^{(h)}_{i,l}$ and $\frac{1}{n_l} \sum_{i=1}^{n_l} Z^{(j)}_{i,l}$ for $l = 1, \dots, k$ and $j = 2, \dots, k$ (corollary to a theorem of Ibragimov and Linnik (1971)); however, there are other sets of conditions that could be used. For example, the $\epsilon > 0$ is not needed, i.e. a finite second moment suffices, if the chain is reversible (Roberts and Rosenthal (1997)), for instance if the chain is a Metropolis algorithm or if it is a two-cycle Gibbs sampler, or if it is uniformly ergodic (Cogburn (1972)). These are the most commonly used assumptions, but for a fuller discussion of central limit theorems for Markov chains see Chan and Geyer (1994).
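We first prove the assertion regarding $\hat B_{\mathrm{reg}}(h, h_1)$. Let $Z$ be the $n \times k$ matrix whose transpose is

$$Z' = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ Z^{(2)}_{1,1} & \cdots & Z^{(2)}_{n_1,1} & Z^{(2)}_{1,2} & \cdots & Z^{(2)}_{n_2,2} & \cdots & Z^{(2)}_{1,k} & \cdots & Z^{(2)}_{n_k,k} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ Z^{(k)}_{1,1} & \cdots & Z^{(k)}_{n_1,1} & Z^{(k)}_{1,2} & \cdots & Z^{(k)}_{n_2,2} & \cdots & Z^{(k)}_{1,k} & \cdots & Z^{(k)}_{n_k,k} \end{pmatrix},$$

and let

$$Y = Y^{(h)} = \bigl( Y^{(h)}_{1,1}, \dots, Y^{(h)}_{n_1,1}, Y^{(h)}_{1,2}, \dots, Y^{(h)}_{n_2,2}, \dots, Y^{(h)}_{1,k}, \dots, Y^{(h)}_{n_k,k} \bigr)'.$$

Note: we sometimes suppress the superscript $h$ in order to lighten the notation. The least squares estimate is $(\hat\beta^{(h)}_0, \hat\beta^{(h)})' = (Z'Z)^{-1} Z'Y$, assuming that $Z'Z$ is nonsingular. (Here, $\hat\beta^{(h)} = (\hat\beta^{(h)}_2, \dots, \hat\beta^{(h)}_k)$.) Note that

$$\frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} Z^{(j)}_{i,l} Z^{(j')}_{i,l} = \sum_{l=1}^{k} a_l \, \frac{1}{n_l} \sum_{i=1}^{n_l} Z^{(j)}_{i,l} Z^{(j')}_{i,l} \xrightarrow{\text{a.s.}} R_{j,j'}$$

by the strong law of large numbers (clearly the $Z^{(j)}_{i,l}$ are bounded random variables). Therefore $Z'Z/n \xrightarrow{\text{a.s.}} R$, so by A3 we have

$$n (Z'Z)^{-1} \xrightarrow{\text{a.s.}} R^{-1} \tag{A.1}$$

and, in particular, with probability one, $Z'Z$ is nonsingular for large $n$. Writing $Z^{(1)}_{i,l} = 1$ for the entries of the first row of $Z'$, we have

$$\frac{Z'Y}{n} = \begin{pmatrix} \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} Z^{(1)}_{i,l} Y_{i,l} \\ \vdots \\ \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} Z^{(k)}_{i,l} Y_{i,l} \end{pmatrix} \xrightarrow{\text{a.s.}} \begin{pmatrix} \sum_{l=1}^{k} a_l E\bigl( Z^{(1)}_{1,l} Y_{1,l} \bigr) \\ \vdots \\ \sum_{l=1}^{k} a_l E\bigl( Z^{(k)}_{1,l} Y_{1,l} \bigr) \end{pmatrix}. \tag{A.2}$$

Let $v = (v_1, \dots, v_k)'$ be the vector on the right side of (A.2). From (A.1) and (A.2) we have

$$\bigl( \hat\beta^{(h)}_0, \hat\beta^{(h)} \bigr)' \xrightarrow{\text{a.s.}} \bigl( \beta^{(h)}_{0,\lim}, \beta^{(h)}_{\lim} \bigr)' = R^{-1} v. \tag{A.3}$$

Consider (2.9), using $\beta^{(h)}_{\lim}$ for $\beta$. We have

$$\hat I_{h, \beta^{(h)}_{\lim}} = \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} \Bigl( Y_{i,l} - \sum_{j=2}^{k} \beta^{(h)}_{j,\lim} Z^{(j)}_{i,l} \Bigr) = \sum_{l=1}^{k} a_l \Bigl( \frac{1}{n_l} \sum_{i=1}^{n_l} U_{i,l} \Bigr), \tag{A.4}$$

where $U_{i,l} = Y_{i,l} - \sum_{j=2}^{k} \beta^{(h)}_{j,\lim} Z^{(j)}_{i,l}$. Let $\mu_l(h) = E(U_{1,l})$. By A2, $E(|U_{1,l}|^{2+\epsilon}) < \infty$ and therefore, by A1 we have

$$n_l^{1/2} \Bigl( \frac{1}{n_l} \sum_{i=1}^{n_l} U_{i,l} - \mu_l(h) \Bigr) \xrightarrow{d} N\bigl( 0, \sigma_l^2(h) \bigr),$$

where

$$\sigma_l^2(h) = \mathrm{Var}(U_{1,l}) + 2 \sum_{g=1}^{\infty} \mathrm{Cov}(U_{1,l}, U_{1+g,l}). \tag{A.5}$$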
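Since the Markov chains are independent, this implies that

$$n^{1/2} \Bigl( \hat I_{h, \beta^{(h)}_{\lim}} - \sum_{l=1}^{k} a_l \mu_l(h) \Bigr) \xrightarrow{d} N\bigl( 0, \sigma^2(h) \bigr), \tag{A.6}$$

where

$$\sigma^2(h) = \sum_{l=1}^{k} a_l \sigma_l^2(h). \tag{A.7}$$

Note that $(1/n) \sum_{l=1}^{k} \sum_{i=1}^{n_l} Y_{i,l} \xrightarrow{\text{a.s.}} B(h, h_1)$ and $(1/n) \sum_{l=1}^{k} \sum_{i=1}^{n_l} Z^{(j)}_{i,l} \xrightarrow{\text{a.s.}} 0$, $j = 2, \dots, k$. Therefore, from the first equation in (A.4), $\hat I_{h, \beta^{(h)}_{\lim}} \xrightarrow{\text{a.s.}} B(h, h_1)$ which, together with (A.6), proves that $\sum_{l=1}^{k} a_l \mu_l(h) = B(h, h_1)$.

To conclude the proof, we consider the difference between $\hat I_{h, \hat\beta^{(h)}}$ and $\hat I_{h, \beta^{(h)}_{\lim}}$. Let $e(j, l) = E\bigl( Z^{(j)}_{1,l} \bigr)$. We have

$$n^{1/2} \bigl( \hat I_{h, \hat\beta^{(h)}} - \hat I_{h, \beta^{(h)}_{\lim}} \bigr) = n^{1/2} \sum_{j=2}^{k} (\beta_{j,\lim} - \hat\beta_j) \, \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} Z^{(j)}_{i,l} = \sum_{j=2}^{k} (\beta_{j,\lim} - \hat\beta_j) \Bigl( \sum_{l=1}^{k} a_l^{1/2} \, \frac{\sum_{i=1}^{n_l} \bigl[ Z^{(j)}_{i,l} - e(j, l) \bigr]}{n_l^{1/2}} \Bigr), \tag{A.8}$$

where the second equality in (A.8) follows from the fact that $\sum_{l=1}^{k} a_l e(j, l) = 0$. Now, for each $l = 1, \dots, k$, and $j = 2, \dots, k$, by A1, $\sum_{i=1}^{n_l} \bigl[ Z^{(j)}_{i,l} - e(j, l) \bigr] / n_l^{1/2}$ is asymptotically normal, so in particular is bounded in probability. Together with (A.3), this implies that the right side of (A.8) converges in probability to 0. We conclude that

$$n^{1/2} \bigl( \hat B_{\mathrm{reg}}(h, h_1) - B(h, h_1) \bigr) \xrightarrow{d} N\bigl( 0, \sigma^2(h) \bigr).$$

The proof for $\hat B(h, h_1)$ is simpler. Let $f_l = E(Y_{1,l})$, and note that $\sum_{l=1}^{k} a_l f_l = B(h, h_1)$. We have

$$n^{1/2} \bigl( \hat B(h, h_1) - B(h, h_1) \bigr) = n^{1/2} \Bigl( \frac{1}{n} \sum_{l=1}^{k} \sum_{i=1}^{n_l} Y_{i,l} - \sum_{l=1}^{k} a_l f_l \Bigr) = \sum_{l=1}^{k} a_l^{1/2} \, \frac{\sum_{i=1}^{n_l} (Y_{i,l} - f_l)}{n_l^{1/2}} \xrightarrow{d} N\bigl( 0, \tau^2(h) \bigr),$$

in which

$$\tau^2(h) = \sum_{l=1}^{k} a_l \tau_l^2(h), \quad \text{where } \tau_l^2(h) = \mathrm{Var}(Y_{1,l}) + 2 \sum_{g=1}^{\infty} \mathrm{Cov}(Y_{1,l}, Y_{1+g,l}). \tag{A.9}$$

The variance term $\sigma_l^2(h)$ in (A.5) is the asymptotic variance of the standardized version of the average $\frac{1}{n_l} \sum_{i=1}^{n_l} U_{i,l}$. If we knew the $U_{i,l}$'s, we could estimate $\sigma_l^2(h)$ by estimating the initial segment of the series in (A.5) using standard methods from time series (see Geyer (1992)) or via batching. Now the $U_{i,l}$'s involve $\beta^{(h)}_{\lim}$, which is unknown, but our proof indicates that the effect of using $\hat\beta^{(h)}$ instead of $\beta^{(h)}_{\lim}$ in the expression for $U_{i,l}$ is asymptotically negligible.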
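The batching approach mentioned above can be sketched as follows. This is a minimal illustration of the generic non-overlapping batch means estimator, assuming the values $U_{i,l}$ (with the estimated regression coefficients plugged in) are already available as a list; the function name `batch_means_variance` and the choice of the number of batches are hypothetical.

```python
def batch_means_variance(values, n_batches):
    """Batch means estimate of the asymptotic variance of a sample average.

    The chain is cut into n_batches consecutive batches of equal length b
    (leftover values at the end are ignored). The estimate is b times the
    sample variance of the batch means, which targets
    lim n * Var(average) for a stationary sequence.
    """
    batch_len = len(values) // n_batches
    if n_batches < 2 or batch_len < 1:
        raise ValueError("need at least 2 batches, each with at least 1 value")
    means = [
        sum(values[b * batch_len:(b + 1) * batch_len]) / batch_len
        for b in range(n_batches)
    ]
    grand_mean = sum(means) / n_batches
    sample_var = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return batch_len * sample_var
```

Applying this separately to each chain gives estimates of the $\sigma_l^2(h)$, which would then be combined with the weights $a_l$ as in (A.7). The batch length must be long relative to the autocorrelation time of the chain for the estimate to be reasonable.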
References

Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics.

Athreya, K. B., Doss, H. and Sethuraman, J. (1996). On the convergence of the Markov chain simulation method. The Annals of Statistics.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (Second Edition). Springer-Verlag, New York.

Buta, E. (2009). Computational Methods in Bayesian Sensitivity Analysis. Ph.D. thesis, University of Florida.

Chan, K. S. and Geyer, C. J. (1994). Comment on "Markov chains for exploring posterior distributions". The Annals of Statistics.

Chen, M.-H., Shao, Q.-M. and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. Springer-Verlag, New York.

Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association.

Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association.

Cogburn, R. (1972). The central limit theorem for Markov processes. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2. University of California Press, Berkeley.

Doss, H. (2007). Bayesian model selection: Some thoughts on future directions. Statistica Sinica.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis.

George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika.

Geyer, C. J. (1992). Practical Markov chain Monte Carlo (with discussion). Statistical Science.

Han, C. and Carlin, B. P. (2001). Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. Journal of the American Statistical Association.

Harris, R., Beebe-Donk, J., Doss, H. and Burr, D. (2005). Aspirin, ibuprofen, and other non-steroidal anti-inflammatory drugs in cancer prevention: A critical review of non-selective COX-2 blockade. Oncology Reports.

Hobert, J. P. and Robert, C. P. (2004). A mixture representation of π with applications in Markov chain Monte Carlo and perfect sampling. The Annals of Applied Probability.

Ibragimov, I. A. and Linnik, Y. V. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association.

Kong, A., McCullagh, P., Meng, X.-L., Nicolae, D. and Tan, Z. (2003). A theory of statistical models for Monte Carlo integration (with discussion). Journal of the Royal Statistical Society, Series B.

Meng, X.-L. and Schilling, S. (2002). Warp bridge sampling. Journal of Computational and Graphical Statistics.

Meng, X.-L. and Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica.

Owen, A. and Zhou, Y. (2000). Safe and effective importance sampling. Journal of the American Statistical Association.

Robert, C. P. (2001). The Bayesian Choice: from Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, New York.

Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability.

Tan, Z. (2004). On a likelihood approach for Monte Carlo integration. Journal of the American Statistical Association.

Tan, Z. (2006). Monte Carlo integration with acceptance-rejection. Journal of Computational and Graphical Statistics.

Tan, Z. (2008). Monte Carlo integration with Markov chain. Journal of Statistical Planning and Inference.
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More informationOPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES
OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass
More informationTests of Hypotheses Based on a Single Sample (Devore Chapter Eight)
Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........
More informationPSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9
Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I
More informationII. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation
II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More information6. Sufficient, Complete, and Ancillary Statistics
Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationLecture 9: September 19
36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More informationR. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State
Bayesia Cotrol Charts for the Two-parameter Expoetial Distributio if the Locatio Parameter Ca Take o Ay Value Betwee Mius Iity ad Plus Iity R. va Zyl, A.J. va der Merwe 2 Quitiles Iteratioal, ruaavz@gmail.com
More informationU8L1: Sec Equations of Lines in R 2
MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie
More informationInvestigating the Significance of a Correlation Coefficient using Jackknife Estimates
Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationEstimation of a population proportion March 23,
1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationChapter 2 The Monte Carlo Method
Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationBinomial Distribution
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible
More informationn n i=1 Often we also need to estimate the variance. Below are three estimators each of which is optimal in some sense: n 1 i=1 k=1 i=1 k=1 i=1 k=1
MATH88T Maria Camero Cotets Basic cocepts of statistics Estimators, estimates ad samplig distributios 2 Ordiary least squares estimate 3 3 Maximum lielihood estimator 3 4 Bayesia estimatio Refereces 9
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More informationLecture 11 October 27
STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationThere is no straightforward approach for choosing the warmup period l.
B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationCSE 527, Additional notes on MLE & EM
CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be
More informationApproximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation
Metodološki zvezki, Vol. 13, No., 016, 117-130 Approximate Cofidece Iterval for the Reciprocal of a Normal Mea with a Kow Coefficiet of Variatio Wararit Paichkitkosolkul 1 Abstract A approximate cofidece
More informationCS284A: Representations and Algorithms in Molecular Biology
CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationChapter 8: Estimating with Confidence
Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig
More informationBasis for simulation techniques
Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More informationBecause it tests for differences between multiple pairs of means in one test, it is called an omnibus test.
Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal
More informationENGI 4421 Confidence Intervals (Two Samples) Page 12-01
ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly
More information