Estimation of Large Families of Bayes Factors from Markov Chain Output


Hani Doss
University of Florida

Abstract

We consider situations in Bayesian analysis where the prior is indexed by a hyperparameter taking on a continuum of values. We distinguish some arbitrary value of the hyperparameter, and consider the problem of estimating the Bayes factor for the model indexed by the hyperparameter vs. the model specified by the distinguished point, as the hyperparameter varies. We assume that we have Markov chain output from the posterior for a finite number of the priors, and develop a method for efficiently computing estimates of the entire family of Bayes factors. As an application of the ideas, we consider some commonly used hierarchical Bayesian models and show that the parametric assumptions in these models can be recast as assumptions regarding the prior. Therefore, our method can be used as a model selection criterion in a Bayesian framework. We illustrate our methodology through a detailed example involving Bayesian model selection.

Key words and phrases: Bayes factors, control variates, ergodicity, importance sampling, Markov chain Monte Carlo

1 Introduction

Suppose we have a data vector Y whose distribution has density p_θ, for some unknown θ ∈ Θ. Let {ν_h, h ∈ H} be a family of prior densities on θ that we are contemplating. The selection of a particular prior from the family is important in Bayesian data analysis, and when making this choice one will often want to consider the marginal likelihood of the data under the prior ν_h, given by

    m_h(y) = ∫ l_y(θ) ν_h(θ) dθ,

as h varies over the hyperparameter space H. Here, l_y(θ) = p_θ(y) is the likelihood function. Values of h for which m_h(y) is relatively low may be considered poor choices, and consideration of the family {m_h(y), h ∈ H} may be helpful in narrowing the search of priors to use. It is therefore useful to have a method for computing the family {m_h(y), h ∈ H}. For the purpose of model selection, if c is a fixed constant, then the information given by {m_h(y), h ∈ H} and {c m_h(y), h ∈ H} is the same. From a computational and statistical point of view however, it is usually easier to fix a particular hyperparameter value h_1 and focus on {m_h(y)/m_{h_1}(y), h ∈ H}. Given two hyperparameter values h and h_1, the quantity B(h, h_1) = m_h/m_{h_1} is called the Bayes factor of the model indexed by h vs. the model indexed by h_1 (we write m_h instead of m_h(y) from now on). In this paper we present a method for estimating the family {B(h, h_1), h ∈ H}. We have in mind situations where B(h, h_1) cannot be obtained analytically and, moreover, we need to calculate B(h, h_1) for a large set of h's, so that computational efficiency is essential. Our approach requires that there are k hyperparameter values h_1, ..., h_k, and for l = 1, ..., k, we are able to get a sample θ_i^(l), i = 1, ..., n_l, from ν_{h_l,y}, the posterior density of θ given Y = y, assuming that the prior is ν_{h_l}.

To set the framework, consider the trivial case where k = 1, and we have a sample θ_1, ..., θ_n from the posterior ν_{h_1,y} generated by an ergodic Markov chain. Our objective is to estimate {B(h, h_1), h ∈ H}. For any h such that ν_h(θ) = 0 whenever ν_{h_1}(θ) = 0, we have

    (1/n) Σ_{i=1}^n ν_h(θ_i)/ν_{h_1}(θ_i) → ∫ [ν_h(θ)/ν_{h_1}(θ)] ν_{h_1,y}(θ) dθ        (1.1)
        = (m_h/m_{h_1}) ∫ { [l_y(θ)ν_h(θ)/m_h] / [l_y(θ)ν_{h_1}(θ)/m_{h_1}] } ν_{h_1,y}(θ) dθ
        = (m_h/m_{h_1}) ∫ [ν_{h,y}(θ)/ν_{h_1,y}(θ)] ν_{h_1,y}(θ) dθ = m_h/m_{h_1}.

Therefore, the left side of (1.1) is a consistent estimate of the Bayes factor B(h, h_1). To fix ideas, consider as a simple example the following standard three-level hierarchical model:

    conditional on ψ_j,  Y_j ~indep φ_{ψ_j,σ_j},  j = 1, ..., m,        (1.2a)
    conditional on µ, τ,  ψ_j ~iid φ_{µ,τ},  j = 1, ..., m,        (1.2b)
    (µ, τ) ~ λ_{c_1,c_2,c_3,c_4},        (1.2c)

where φ_{m,s} denotes the density of the normal distribution with mean m and standard deviation s. In (1.2a), the σ_j's are assumed known. In (1.2c), λ_{c_1,c_2,c_3,c_4} is the normal / inverse gamma distribution indexed by four hyperparameters (see Section 3). This is a very commonly used

model but, as we discuss later, in some situations it is preferable to replace (1.2b) with ψ_j ~iid t_{v,µ,τ}, where t_{v,µ,τ} is the density of the t distribution with v degrees of freedom, location µ and scale τ. In this case, consider now the estimate in the left side of (1.1). The likelihood of (µ, τ) is

    l_Y(µ, τ) = ∫ ⋯ ∫ ∏_{j=1}^m φ_{ψ_j,σ_j}(Y_j) t_{v,µ,τ}(ψ_j) dψ_1 ⋯ dψ_m.

This likelihood cannot be computed in closed form, and therefore its cancellation in (1.1) gives a non-trivial simplification: calculation of the estimate requires only the ratio of the densities of the priors and not the posteriors.

Consider (1.2) with t_{v,µ,τ} instead of φ_{µ,τ} in the middle stage, and suppose now that we would like to select v, with the choice v = ∞ signifying the choice of the normal distribution φ_{µ,τ}. The distribution of Y is determined by ψ = (ψ_1, ..., ψ_m). A completely equivalent way of describing the model is therefore through the two-level hierarchy in which we let θ = (ψ, µ, τ), and stipulate:

    conditional on θ,  Y_j ~indep φ_{ψ_j,σ_j},  j = 1, ..., m,
    (ψ, µ, τ) ~ ν_h,

where ν_h(ψ, µ, τ) = ( ∏_{j=1}^m t_{v,µ,τ}(ψ_j) ) λ_{c_1,c_2,c_3,c_4}(µ, τ). Here, the hyperparameter is h = (v, c_1, c_2, c_3, c_4), which includes the number of degrees of freedom. Estimation of the family of Bayes factors {B(h, h_1), h ∈ H} therefore enables a model selection step.

We now discuss briefly the accuracy of the estimate on the left side of (1.1). When ν_h is nearly singular with respect to ν_{h_1} over the region where the θ_i's are likely to be, the estimate will be unstable. (Formally, the estimate will satisfy a central limit theorem if the chain mixes fast enough and the random variable ν_h(θ)/ν_{h_1}(θ) (where θ ~ ν_{h_1,y}) has a high enough moment. This is discussed in more detail in Section 2.3.) From a practical point of view, this means that there is effectively a radius around h_1 within which one can safely move. In all but the very simplest models, the dimension of H is greater than 1, and therefore estimation of the Bayes factor as h ranges over H raises serious computational difficulties, and it is essential that for each h, the estimate of B(h, h_1) is both accurate and can be computed quickly.

Our approach is to select k hyperparameter points h_1, ..., h_k, and get Markov chain samples from ν_{h_l,y} for each l = 1, ..., k. The prior ν_{h_1} in the denominator of the left side of (1.1) is replaced by a mixture w_1 ν_{h_1} + ⋯ + w_k ν_{h_k}, with appropriately chosen weights. We show how judiciously chosen control variates can be used in conjunction with multiple Markov chain streams to produce accurate estimates even with small samples, so that the net result is a computationally feasible method for producing reliable estimates of the Bayes factors for a wide range of hyperparameter values.

Our approach is motivated by and uses ideas developed in Kong et al. (2003), which deals with the situation where we have independent samples from k unnormalized densities, and we wish to estimate all possible ratios of the k normalizing constants. Owen and Zhou (2000) and Tan (2004) also discuss the use of control variates to increase the accuracy of Monte Carlo estimates. In Section 4 we return to these three papers and discuss in detail how our approach fits in the context of this work.

The paper is organized as follows. Section 2 contains the main methodological development; there, we present our method for estimating the family of Bayes factors and state supporting theoretical results. Section 3 illustrates the methodology through a detailed example that involves a number of issues,

including selection of the parametric family in the model. Section 4 gives a discussion of other possible approaches and related work, and the Appendix gives the proof of the main theoretical result of the paper.

2 Estimation of the Family of Bayes Factors

Suppose that for l = 1, ..., k, we have Markov chain Monte Carlo (MCMC) samples θ_i^(l), i = 1, ..., n_l, from the posterior density of θ given Y = y, assuming that the prior is ν_{h_l}, having the form ν_{h_l,y}(θ) = l_y(θ)ν_{h_l}(θ)/m_{h_l}. We assume that the k sequences are independent of one another. We will not assume we know any of the m_{h_l}'s. However, we now explain how knowledge of the Bayes factors m_{h_l}/m_{h_1}, for l = 2, ..., k, would result in two important benefits. If we knew these Bayes factors we could then form the estimate

    B̂(h, h_1) = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} ν_h(θ_i^(l)) / [ Σ_{s=1}^k (n_s/n) ν_{h_s}(θ_i^(l)) m_{h_1}/m_{h_s} ].        (2.1)

Let n = Σ_{s=1}^k n_s, and assume that n_s/n → a_s, s = 1, ..., k. We then have

    B̂(h, h_1) = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} [ l_y(θ_i^(l)) ν_h(θ_i^(l)) / m_{h_1} ] / [ Σ_{s=1}^k (n_s/n) l_y(θ_i^(l)) ν_{h_s}(θ_i^(l)) / m_{h_s} ]
             = (m_h/m_{h_1}) (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} ν_{h,y}(θ_i^(l)) / [ Σ_{s=1}^k (n_s/n) ν_{h_s,y}(θ_i^(l)) ]
             →a.s. (m_h/m_{h_1}) Σ_{l=1}^k a_l ∫ { ν_{h,y}(θ) / [ Σ_{s=1}^k a_s ν_{h_s,y}(θ) ] } ν_{h_l,y}(θ) dθ = m_h/m_{h_1}.        (2.2)

The almost sure convergence in (2.2) occurs under minimal conditions on the Markov chains θ_i^(l), i = 1, ..., n_l. Asymptotic normality requires more restrictive conditions, and is discussed in Section 2.3. To compute B̂(h, h_1), the quantities Σ_{s=1}^k (n_s/n) ν_{h_s}(θ_i^(l)) m_{h_1}/m_{h_s} are calculated once, and stored. Then, for every new value of h, the computation of B̂(h, h_1) requires taking n ratios and a sum. Since this is to be done for a large number of h's, it is essential that for each l, the sequence θ_i^(l), i = 1, ..., n_l, be as independent as possible, so that the value of n can be made as small as possible.

We now briefly recall the use of control variates in Monte Carlo sampling. Suppose we wish to estimate the expected value of a random variable Y, and we can find a random variable Z that is correlated with Y, and such that E(Z) is known (without loss of generality, E(Z) = 0). Then for any β, the estimate Y − βZ is an unbiased estimate of E(Y), and the value of β minimizing the variance of Y − βZ is β = Cov(Y, Z)/Var(Z). The idea may be used when there are several variables Z_1, ..., Z_r that are correlated with Y.
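Before turning to how such control variates are used here, note that the plain estimate (2.1) is straightforward to program. The following R sketch (R being the language of the functions the author mentions at the end of Section 4) is only an illustration under stated assumptions: `nu` is a user-supplied function evaluating the prior density ν_h(θ); `theta` is a list whose l-th element is a matrix holding the n_l draws from ν_{h_l,y}, one draw per row; `hgrid` is the list of sampled hyperparameter values h_1, ..., h_k; and `d` is the vector of ratios d_s = m_{h_s}/m_{h_1} (so d[1] = 1), taken as known from Step 1. None of these names come from the paper.

```r
## Hedged sketch of the estimate B-hat(h, h_1) in (2.1).
## Assumes: nu(h, th) returns the prior density at th; theta[[l]] is an
## n_l x p matrix of posterior draws under h_l; d[s] = m_{h_s}/m_{h_1}, d[1] = 1.
Bhat <- function(h, theta, hgrid, d, nu) {
  k <- length(theta)
  n_l <- sapply(theta, nrow)
  a <- n_l / sum(n_l)                         # weights n_s / n
  terms <- unlist(lapply(seq_len(k), function(l) {
    apply(theta[[l]], 1, function(th) {
      # denominator: sum_s (n_s/n) * nu_{h_s}(theta) / d_s
      denom <- sum(a * sapply(seq_len(k), function(s) nu(hgrid[[s]], th)) / d)
      nu(h, th) / denom
    })
  }))
  mean(terms)                                 # (1/n) * sum over all pooled draws
}
```

As noted in the text, the per-draw denominators do not depend on h, so in practice they would be computed once and reused across the grid of h values.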

In the present context, we may consider the functions

    Z_j(θ) = [ ν_{h_j}(θ) m_{h_1}/m_{h_j} − ν_{h_1}(θ) ] / [ Σ_{s=1}^k (n_s/n) ν_{h_s}(θ) m_{h_1}/m_{h_s} ],   j = 2, ..., k,

whose expectations under Σ_{s=1}^k (n_s/n) ν_{h_s,y} are 0. The calculation of these functions requires knowledge of the Bayes factors m_{h_s}/m_{h_1}, s = 2, ..., k.

The method proposed in this paper can now be briefly summarized as follows.

1. For each l = 1, ..., k, get Markov chain samples θ_i^(l), i = 1, ..., N_l, from ν_{h_l,y}. Based on these, the Bayes factors m_{h_s}/m_{h_1}, s = 2, ..., k, are estimated. The sample sizes N_l should be very large, so that these estimates are very accurate.

2. For each l = 1, ..., k, we obtain new samples θ_i^(l), i = 1, ..., n_l, from ν_{h_l,y}. Using these, together with the Bayes factors computed in Step 1, we form the estimate B̂_reg(h, h_1), which is similar to (2.1), except that we use the functions Z_j, j = 2, ..., k, as control variates.

The samples in the two steps are used for different purposes. Those in Step 1 are used solely to estimate m_{h_s}/m_{h_1}, s = 2, ..., k, and in fact, once these estimates are formed, the samples may be discarded. The samples in Step 2 are used to estimate the family B(h, h_1). On occasion, special analytical structure enables the use of numerical methods to estimate m_{h_s}/m_{h_1}, s = 2, ..., k, as long as k is not too large, so Step 1 is bypassed. A review of the literature for this approach is given in Kass and Raftery (1995). Ideally, the samples in Step 2 should be independent or nearly so, which may be accomplished by subsampling a very long chain. If we have a Markov transition function that gives rise to a uniformly ergodic chain, it is possible to use this Markov transition function to obtain perfect samples (Hobert and Robert (2004)), although the time it takes to generate a perfect sample of length n_l may be much greater than the time to generate a Markov chain of length n_l.

One may ask what is the point of having two steps of sampling, i.e. why not just use the samples from Step 1 both for estimation of m_{h_s}/m_{h_1}, s = 2, ..., k, and for subsequent estimation of the family B(h, h_1). The reason for having the two stages is that the estimate of B(h, h_1) needs to be computed for a large number of h's, and for every h the amount of computation is linear in n, so this precludes a large value of n. Therefore, given that a relatively modest sample size must be used, we need to reduce the variance of the estimate as much as possible, and this is the reason for carrying out Step 1. The amount of computation needed to generate the Step 1 samples is typically one or two orders of magnitude less than the amount of computation needed to calculate the estimates of B(h, h_1) from the Step 2 samples (see the discussion at the end of Section 3). To summarize, the benefit of the two-step approach is a better tradeoff between statistical efficiency and computational time.

To see this, it is helpful to consider a very simple example in which the variances of various estimators can actually be computed. Consider the unnormalized density q_h(t) = t^{h−1} I(t ∈ (0, 1)), and let m_h be the normalizing constant. Now suppose we wish to estimate m_h/m_1 as h ranges over a grid of 4000 points in the interval (1.5, 2.5), and that we are able to generate iid observations from q_1/m_1 and q_3/m_3. We may use the estimator in Kong et al. (2003) (discussed later in this paper), which estimates both m_h/m_1 and m_3/m_1 from the same sample. Given one minute of computer time, using the machine whose specifications

are described in Section 3, the requirement that we calculate such a large number of ratios of normalizing constants limits the total sample size to a certain value, n say. A formula for the asymptotic variance ρ²(h) of the Kong et al. (2003) estimate is given in Tan (2004, equation (8)), and in this situation all quantities that are needed in the formula are available explicitly. Now if we take the minute and divide it into two parts, 3 seconds and 57 seconds, then with the 3 seconds we can estimate m_3/m_1 with essentially perfect accuracy, and with the remaining 57 seconds, if we use the estimate B̂(h, 1), we can handle a sample size of (57/60)n. A formula for the asymptotic variance τ²(h) of this estimator, which uses the value of m_3/m_1 calculated in the first stage, is given in Theorem 1 of the present paper, and can also be evaluated explicitly. The ratio τ²(h)/ρ²(h) is bounded above by 1.12 over the entire grid, and so with the same computer resources, the variance of the two-stage estimator is uniformly at most 1.12 × 60/57 ≈ 1.2 that of the one-stage estimator. (The gains if we use B̂_reg instead of B̂ can be far greater; see Section 3 for an illustration.)

In Section 2.1 we show how the MCMC approach to Step 1 may be implemented. In Section 2.2 we show how estimation in Step 2 may be implemented, and also discuss the benefits of using the control variates. In Section 2.3 we give a result regarding asymptotic normality of the estimates of the Bayes factors.

2.1 Estimation of the Bayes Factors m_{h_s}/m_{h_1}

We now assume that for l = 1, ..., k, we have a sequence θ_i^(l), i = 1, ..., N_l, from a Markov chain corresponding to the posterior ν_{h_l,y}. Also, these k sequences are independent of one another. Let N = Σ_{l=1}^k N_l, and a_l = N_l/N. We wish to estimate m_{h_l}/m_{h_1}, l = 2, ..., k. Meng and Wong (1996) considered this problem and, to understand their method, it is helpful to consider first the case where k = 2 and we wish to estimate d = m_{h_2}/m_{h_1}. For any function α defined on the common support of ν_{h_1,y} and ν_{h_2,y} such that ∫ α(θ) ν_{h_1}(θ) l_y(θ) ν_{h_2}(θ) dθ < ∞, we have

    ∫ α(θ) ν_{h_2}(θ) ν_{h_1,y}(θ) dθ / ∫ α(θ) ν_{h_1}(θ) ν_{h_2,y}(θ) dθ
        = (m_{h_2}/m_{h_1}) [ ∫ α(θ) ν_{h_2}(θ) l_y(θ) ν_{h_1}(θ) dθ / ∫ α(θ) ν_{h_1}(θ) l_y(θ) ν_{h_2}(θ) dθ ] = m_{h_2}/m_{h_1}.

Therefore,

    d̂ = [ (1/N_1) Σ_{i=1}^{N_1} α(θ_i^(1)) ν_{h_2}(θ_i^(1)) ] / [ (1/N_2) Σ_{i=1}^{N_2} α(θ_i^(2)) ν_{h_1}(θ_i^(2)) ]        (2.3)

is a consistent estimate of d, under the minimal assumption of ergodicity of the two chains. Meng and Wong (1996) show that when {θ_i^(j)}_{i=1}^{N_j} are independent draws from ν_{h_j,y}, the optimal α to use is

    α_opt(θ) = 1 / [ a_1 ν_{h_1}(θ) + a_2 ν_{h_2}(θ)/d ],        (2.4)

which involves the quantity we wish to estimate. This suggests the iterative scheme

    d̂^(t+1) = [ (1/N_1) Σ_{i=1}^{N_1} ν_{h_2}(θ_i^(1)) / ( a_1 ν_{h_1}(θ_i^(1)) + a_2 ν_{h_2}(θ_i^(1))/d̂^(t) ) ]
              / [ (1/N_2) Σ_{i=1}^{N_2} ν_{h_1}(θ_i^(2)) / ( a_1 ν_{h_1}(θ_i^(2)) + a_2 ν_{h_2}(θ_i^(2))/d̂^(t) ) ],        (2.5)

for t = 1, 2, ....

For the general case where k ≥ 2, let d = (m_{h_2}/m_{h_1}, ..., m_{h_k}/m_{h_1}), but it is more convenient to work with the vector of component-wise reciprocals of d, call it r. For i = 2, ..., k, and j = 1, ..., k, j ≠ i, let α_ij be known functions defined on the common support of ν_{h_i} and ν_{h_j} satisfying ∫ α_ij(θ) ν_{h_i}(θ) l_y(θ) ν_{h_j}(θ) dθ < ∞. Let

    b_ii = Σ_{j≠i} E_{ν_{h_j,y}}( α_ij(θ) ν_{h_i}(θ) ),   2 ≤ i ≤ k,
    b_ij = −E_{ν_{h_i,y}}( α_ij(θ) ν_{h_j}(θ) ),   i ≠ j,        (2.6)

and

    B = ( b_22  b_23  ...  b_2k
          b_32  b_33  ...  b_3k
          ...
          b_k2  b_k3  ...  b_kk ),        b = ( −b_21, −b_31, ..., −b_k1 )'.

Then assuming that B is nonsingular, we have r = B^{−1} b. If B̂_α and b̂_α are the natural estimates of B and b based on the functions α_ij and the samples {θ_i^(j)}_{i=1}^{N_j}, j = 1, ..., k, then r may be estimated via

    r̂ = B̂_α^{−1} b̂_α.        (2.7)

Meng and Wong (1996) consider the functions

    α_ij = a_i a_j / [ Σ_{s=1}^k a_s r_s ν_{h_s} ],        (2.8)

which involve the unknown r. The natural extension of (2.5) is r̂^(t+1) = B̂_{α_t}^{−1} b̂_{α_t}, with the vector of functions α_t given by (2.8), where we use r̂^(t) instead of r.

2.2 Using Control Variates

The use of control variates has had many successes in Monte Carlo sampling, and a particularly important paper is Owen and Zhou (2000). This paper considers the use of control variates in conjunction with importance sampling, when the importance sampling density is a mixture, and the paper motivates some of the ideas below. We now assume that we have samples θ_i^(l), i = 1, ..., n_l, from ν_{h_l,y}, l = 1, ..., k, with independence across samples, and that we know the constants d_2, ..., d_k. For unity of notation, we define d_1 = 1. As before, n = Σ_{l=1}^k n_l and n_l/n = a_l. The estimate B̂(h, h_1) in (2.1) is an average of draws from the mixture distribution p_a = Σ_{s=1}^k a_s ν_{h_s,y}. However, these are not

independent and identically distributed, since they form a stratified sample: we have exactly n_s draws from ν_{h_s,y}, s = 1, ..., k, a fact which causes no problems. We wish to estimate the integral

    I_h = ∫ [ l_y(θ) ν_h(θ) / m_{h_1} ] dθ = B(h, h_1).

Define the functions

    H_j(θ) = l_y(θ) ν_{h_j}(θ)/m_{h_j} − l_y(θ) ν_{h_1}(θ)/m_{h_1},   j = 2, ..., k.

We have ∫ H_j(θ) dθ = 0, or equivalently E_{p_a}( H_j(θ)/p_a(θ) ) = 0, where the subscript indicates that the expectation is taken with respect to the mixture distribution p_a. Therefore, for every β = (β_2, ..., β_k), the estimate

    Î_{h,β} = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} { l_y(θ_i^(l)) ν_h(θ_i^(l))/m_{h_1} − Σ_{j=2}^k β_j [ l_y(θ_i^(l)) ( ν_{h_j}(θ_i^(l))/m_{h_j} − ν_{h_1}(θ_i^(l))/m_{h_1} ) ] } / [ Σ_{s=1}^k a_s ν_{h_s,y}(θ_i^(l)) ]

is unbiased. As written, this estimate is not computable, because it involves the normalizing constants m_{h_j}, which are unknown, and also the likelihood l_y(θ), which may not be available. We rewrite it in computable form as

    Î_{h,β} = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} { ν_h(θ_i^(l)) − Σ_{j=2}^k β_j [ ν_{h_j}(θ_i^(l))/d_j − ν_{h_1}(θ_i^(l)) ] } / [ Σ_{s=1}^k a_s ν_{h_s}(θ_i^(l))/d_s ].        (2.9)

We would like to use the value of β, call it β_opt, that minimizes the variance of Î_{h,β}, but this β_opt is generally unknown. As in Owen and Zhou (2000), we can do ordinary linear regression of Y_{i,l}^(h) on predictors Z_{i,l}^(j), where

    Y_{i,l}^(h) = ν_h(θ_i^(l)) / [ Σ_{s=1}^k a_s ν_{h_s}(θ_i^(l))/d_s ],
    Z_{i,l}^(j) = [ ν_{h_j}(θ_i^(l))/d_j − ν_{h_1}(θ_i^(l)) ] / [ Σ_{s=1}^k a_s ν_{h_s}(θ_i^(l))/d_s ],   j = 2, ..., k,        (2.10)

and all required quantities are available. We then use the least squares estimate β̂, i.e. the estimate of I_h is Î_{h,β̂}. It is easy to see that Î_{h,β̂} is simply β̂_0, the estimate of the intercept term in the bigger regression problem where we include the intercept term, i.e.

    Î_{h,β̂} = β̂_0.        (2.11)

One can show that if the k sequences are all iid sequences, then β̂ converges to β_opt, and Î_{h,β̂} is guaranteed to be at least as efficient as the naive estimator. But when we have Markov chains this is not the case, especially if the chains mix at different rates. In Section 2.3 we consider the estimates β̂ and Î_{h,β̂} directly. In particular, we give a precise definition of the nonrandom value β that β̂ is estimating (it is β_{lim}^(h) in equation (A.3)), and show that the effect of using β̂ instead of β is asymptotically negligible.
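A hedged R sketch of the regression form (2.9)-(2.11), under the same assumed objects (`nu`, `theta`, `hgrid`, `d`) as in the earlier fragment: it builds the responses and predictors of (2.10) from the pooled draws and reads the estimate off the intercept of an ordinary least-squares fit.

```r
## Hedged sketch of B-hat_reg(h, h_1) via (2.10)-(2.11); d[1] = 1 as in Section 2.2.
Bhat_reg <- function(h, theta, hgrid, d, nu) {
  k <- length(theta)
  n_l <- sapply(theta, nrow)
  a <- n_l / sum(n_l)
  draws <- do.call(rbind, theta)              # pool the k chains
  dens <- sapply(seq_len(k), function(s) apply(draws, 1, function(th) nu(hgrid[[s]], th)))
  denom <- as.vector(dens %*% (a / d))        # sum_s a_s * nu_{h_s}(theta) / d_s
  Y <- apply(draws, 1, function(th) nu(h, th)) / denom                  # responses in (2.10)
  Z <- sapply(2:k, function(j) (dens[, j] / d[j] - dens[, 1]) / denom)  # predictors in (2.10)
  unname(coef(lm(Y ~ Z))[1])                  # intercept = estimate (2.11)
}
```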

It is natural to consider the problem of estimating β_opt in the Markov chains setting. Actually, before thinking about minimizing the variance of (2.9) with respect to β, one should first note the following. The constants a_s = n_s/n, s = 1, ..., k, used in forming the values Y_{i,l}^(h) are sensible in the iid setting, but when dealing with Markov chains one would want to replace n_s with an effective sample size, as discussed by Meng and Wong (1996). Therefore, the real problem is two-fold: How do we find optimal (or good) values to use in place of the a_s's in the Y_{i,l}^(h)'s? Using the Y_{i,l}^(h)'s based on these values, how do we estimate the value of β that minimizes the variance of (2.9)? Both problems appear to be very difficult. Intuitively at least, the method described here should perform well if the mixing rates of the Markov chains are not very different. But in any case, the results in Section 2.3 show that, whether or not Î_{h,β̂} is optimal, it is a consistent and asymptotically normal estimator whose variance can be estimated consistently. Note that if we do not use control variates, our estimate is just

    (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} ν_h(θ_i^(l)) / [ Σ_{s=1}^k a_s ν_{h_s}(θ_i^(l))/d_s ],

which is exactly (2.1).

Reduction in Variance from Using the Control Variates

Consider the linear combination of the responses Y_{i,l}^(h) and predictors Z_{i,l}^(j) given by

    L_1 = Σ_{j=2}^k a_j Z^(j) + Y^(h).

(We are dropping the subscripts i, l.) A calculation shows that if h = h_1 then L_1 = 1, meaning that we have an estimate with zero variance. Similarly, for t = 2, ..., k, let L_t be the linear combination given by

    L_t = Σ_{j=2}^k a_j Z^(j) + (1/d_t) Y^(h) − Z^(t).

If h = h_t, then L_t = 1. Thus if h ∈ {h_1, ..., h_k}, our estimate of the Bayes factor B(h, h_1) has zero variance. This is not surprising since, after all, we are assuming that we know B(h_j, h_1), for j = 1, ..., k; however, this does indicate that if we use these control variates, our estimate will be very precise as long as h is close to at least one of the h_j's. This advantage does not exist if we use the plain estimate (2.1).

The intercept term in the regression of the Y_{i,l}^(h)'s on the Z_{i,l}^(j)'s is simply a linear combination of the form

    β̂_0 = Σ_{l=1}^k Σ_{i=1}^{n_l} w_{i,l} Y_{i,l}^(h).        (2.12)
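Since the predictors Z_{i,l}^(j) in (2.10) do not involve h, the weights w_{i,l} in (2.12) can be computed once and reused for every new h. A minimal sketch, continuing the assumptions of the earlier R fragments (here `Z` is the n × (k−1) matrix of predictor values):

```r
## Weights w_{i,l} of (2.12): the first row of (X'X)^{-1} X' for the design
## matrix X = [1, Z].  They depend only on Z, and hence not on h.
cv_weights <- function(Z) {
  X <- cbind(1, Z)
  (solve(crossprod(X)) %*% t(X))[1, ]
}
## For each new h, only the responses Y^(h) need to be recomputed:
##   w <- cv_weights(Z);  estimate <- sum(w * Y_h)
```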

The w_{i,l}'s need to be computed just once, so for every new value of h the calculation of B̂_reg(h, h_1) requires n operations, which is the same as the number of operations needed to compute B̂(h, h_1) given by (2.1). To summarize, using control variates can greatly improve the accuracy of the estimates, at no (or trivial) increase in computational cost.

2.3 Asymptotic Normality and Estimation of the Variance

Here we state a result that says that under certain regularity conditions B̂_reg(h, h_1) and B̂(h, h_1) are asymptotically normal, and we show how to estimate the variance. As discussed in Section 2.2, we typically prefer that θ_i^(l), i = 1, ..., n_l, be an iid sample for each l. Nevertheless, our results pertain to the more general case where these samples arise from Markov chains. (As before, we assume that n_l/n → a_l ∈ (0, 1) and, when dealing with the asymptotics, strictly speaking we need to make a distinction between n_l/n and its limit; however we write a_l for both, as this makes the bookkeeping easier, and blurring the distinction never creates a problem.) Recall that Y_{i,l}^(h) and Z_{i,l}^(j), j = 2, ..., k, are defined in (2.10) and, for economy of notation, we define Z_{i,l}^(1) to be 1 for all i, l. Let R be the k × k matrix defined by

    R_{j,j'} = E( Σ_{l=1}^k a_l Z_{1,l}^(j) Z_{1,l}^(j') ),   j, j' = 1, ..., k.

We assume that for the Markov chains a strong law of large numbers holds (sufficient conditions are given, for example, in Theorem 2 of Athreya, Doss and Sethuraman (1996)), and we refer to the following conditions.

A1 For each l = 1, ..., k, the chain {θ_i^(l)}_{i=1}^∞ is geometrically ergodic.

A2 For each l = 1, ..., k, there exists ε > 0 such that E( |Y_{1,l}^(h)|^{2+ε} ) < ∞.

A3 The matrix R is nonsingular.

Theorem 1 Under conditions A1 and A2,

    n^{1/2}( B̂(h, h_1) − B(h, h_1) ) →d N( 0, τ²(h) ),

and under conditions A1-A3,

    n^{1/2}( B̂_reg(h, h_1) − B(h, h_1) ) →d N( 0, σ²(h) ),

with τ²(h) and σ²(h) given by equations (A.9) and (A.7) below.

The proof is given in the Appendix, which also explains how one can estimate the variances. Theorem 1 assumes that the vector d is known, either because it can be computed analytically or because the sample sizes from Stage 1 sampling are so large that this is effectively true. Buta (2009) has obtained a version of Theorem 1 that takes into account the variability from the first stage. Very briefly, if N is the total sample size from the first stage, and if N → ∞ and n → ∞ in such a way that n/N → q ∈ [0, ∞), then

    n^{1/2}( B̂(h, h_1) − B(h, h_1) ) →d N( 0, qτ_S²(h) + τ²(h) ),

where τ_S²(h) is a correction term that inflates the variance when the sample sizes in Stage 1 are finite. Also, she has a similar result for the estimate that uses control variates.

The variances of B̂_reg(h, h_1) and B̂(h, h_1) depend on the choice of the points h_1, ..., h_k, and finding good values of k and h_1, ..., h_k is in general a very difficult problem. In our experience, we have found that the following method works reasonably well. Having specified the range H, we select trial values h_1, ..., h_k, and in pilot runs plot the variance function τ²(h), or σ²(h); then if we find a region where this is unacceptably large, we cover this region by moving some h_l's closer to the region, or by simply adding new h_l's in that region, which increases k.

3 Illustration

There are many classes of models to which the methodology developed in Section 2 applies. These include the usual parametric models, and also Bayesian nonparametric models involving mixtures of Dirichlet processes (Antoniak (1974)), in which one of the hyperparameters is the so-called total mass parameter; very briefly, this hyperparameter controls the extent to which the nonparametric model differs from a purely parametric model. Another application involves some problems in Bayesian variable selection, and this is described in Doss (2007). In this section we give an example involving the hierarchical Bayesian model described in Section 1. While models of much greater complexity can be considered, this relatively simple example has the advantage that the data can be visualized quickly, and the hyperparameters have a straightforward interpretation, so that our analysis can be easily understood.

Meta-Analysis of Data on Non-Steroidal Anti-Inflammatory Drugs and Cancer Risk

Over the last decade, a large number of epidemiological studies have reported a link between intake of nonsteroidal anti-inflammatory drugs (NSAIDs) and cancer risk. The studies, which involve different cancers and different NSAIDs, strongly suggest that long-term intake of NSAIDs results in a significant reduction in cancer risk for all the major types: colon, breast, lung, and prostate cancer. In Harris et al. (2005) we carry out a comprehensive review of the published scientific literature on NSAIDs and cancer. Our review spans 90 papers, which investigate several NSAIDs and ten cancers, including the four major types. We have extracted data from these papers to make tables such as Table 1 below, which pertains to aspirin and colon cancer. The table gives, for each of 15 studies, the dose, reported risk ratio (for NSAID use vs. no-NSAID use), and the log reported risk ratio together with a standard error. (Harris et al. (2005) does not give these standard errors; it gives 95% confidence intervals for the risk ratios, which can be used to form 95% confidence intervals for the log risk ratios, which in turn can be used to determine the standard errors.) See Harris et al. (2005) for more information on this table and references for the 15 studies.

As can be seen from the table, there is some inconsistency in the studies, with some indicating a large reduction in cancer risk, while others indicate a smaller reduction, in spite of a large dose. This is not surprising, since there is heterogeneity in the patient and control pools (characteristics such as age, ethnicity, and health status vary greatly across the studies). It is

[Table 1, with columns Publication, PPW, RR, LRR, SE(LRR), lists the fifteen studies: Coogan; Friedman; Garcia-Rod.; Giovannucci; Giovannucci; LaVecchia; Muscat; Paganini-Hill; Peleg; Reeves; Rosenberg; Rosenberg; Schr. & Ev.; Suh; Thun. The numeric entries are not recoverable from this copy.]

Table 1: Fifteen studies on aspirin and colon cancer. Here, PPW represents the dose (number of 325 mg pills per week), RR is the observed risk ratio for aspirin vs. no aspirin, LRR is its logarithm, and SE(LRR) is an estimate of the standard error of LRR.

therefore of interest to carry out a meta-analysis of these studies. Although there have been a few meta-analyses in the literature, these have been rather informal: all of them have used fixed effects models, and none have taken into account the dose information.

Assume temporarily that all studies involved the same dose. In a random-effects meta-analysis, for each study j there is a latent variable, say ψ_j, that gives the true log risk ratio that would be obtained if the sample sizes for that study were infinite. One is then led to a model such as (1.2), in which the distribution of the study-specific effect is the normal distribution in (1.2b). Two modelling issues now arise. The first is that whereas the first normality assumption (line (1.2a)) is supported by a theoretical result (the approximate normality of functions of binomial estimates), the second normality assumption (line (1.2b)) is not, but is typically made for the sake of convenience. In fact, data for several of the other cancers include outliers (see Harris et al. (2005)), and therefore one may wish to use a t distribution instead, this decision being made prior to looking at the colon cancer data. An important modelling issue is then to decide on the number of degrees of freedom. The second issue is to determine the parameters of the normal / inverse gamma prior λ_c in (1.2c). Here c = (c_1, c_2, c_3, c_4), where c_1, c_2, c_4 > 0 and c_3 ∈ R and, under this prior, the distribution of (µ, τ) is as follows: γ = 1/τ² ~ Gamma(c_1, c_2) and, conditional on τ, µ ~ N(c_3, c_4τ²). This prior is commonly used because it is conjugate to the family N(µ, τ²). With appropriate hyperparameters, λ can be made to be a flat ("noninformative") prior, and common recommendations are to take c_1 and c_2 to be very small (so that the gamma distribution on γ is an approximation to dγ/γ, the improper Jeffreys prior), and to take c_3 = 0 and c_4 to be very large. Indeed, this is the recommendation made in the examples in the BUGS documentation and tutorials. Nevertheless, such a set of hyperparameter values is now sometimes criticized, because for small values of c_1 and c_2 the gamma distribution gives high probability to large values of γ (equivalently, small values of τ), which greatly encourages the ψ_j's to all be equal to µ. In other words, this causes excessive shrinkage. See for example Gelman (2006).

We wish to address both these issues and now also would like to take into account the dose. Let L_j be the log of the observed risk ratio for study j. Let x_j be the dose, defined as number of pills per day (PPW/7), for study j. Consider the linear model

    L_j = α_j + ψ_j x_j + ε_j,   j = 1, ..., m,        (3.1)

where α_j and ψ_j are parameters specific to study j, and ε_j is normally distributed with mean 0 and standard deviation σ_j (given in Column 5 of Table 1). Note that α_j = 0, since x_j = 0 implies that the treatment and control groups are identical, so that L_j has mean 0. Thus, (3.1) is rewritten as L_j = ψ_j x_j + ε_j, from which we see that ψ_j has the interpretation as the true log risk ratio if the treatment group had taken 1 pill per day. Thus if we let Y_j = L_j/x_j, we have Y_j = ψ_j + ε̃_j, j = 1, ..., m, where ε̃_j is normal with mean 0 and standard deviation σ̃_j = σ_j/x_j. We now consider the hierarchical model

    Y_j ~indep φ_{ψ_j, σ̃_j},   j = 1, ..., m,        (3.2)

with the distribution of ψ determined by the following:

    conditional on µ, τ,  ψ_j ~iid t_{v,µ,τ},   j = 1, ..., m,        (3.3a)
    (µ, τ) ~ λ_c.        (3.3b)

Letting θ = (ψ, µ, τ), the likelihood of Y = (Y_1, ..., Y_m) is given by (3.2), and the prior on θ is given by (3.3), which is indexed by h = (v, c). Loosely speaking, the value of v determines the choice of the model, and the c's determine the prior. We may therefore fix some value h_1 and consider the family of Bayes factors B(h, h_1) as h varies. We can estimate the family if, for values h_j, j = 1, ..., k, of the hyperparameter h, we have samples from the posterior distributions ν_{h_j,y} of the entire vector θ.

We considered four different values of c in which c_3 = 0, c_4 = 1000 were fixed (since there does not seem to be any controversy about these two parameters), and we took c_1 = c_2 and let the common value, denoted ε, start at .005 and increase by factors of 5 up to .625. We took the values of the degrees of freedom parameter to be v = 1, 4, 12, for a total of 12 values of the hyperparameter h. For each of these 12 values we ran a Markov chain of length about 1 million and used these to calculate the vector of ratios of normalizing constants, via the method of Meng and Wong (1996) reviewed in Section 2.1. We then ran new Markov chains to produce a sample of size 100 from each of the 12 posteriors. These samples, which were actually subsamples from longer chains (burn-in of 1000, then taking every 50th value), can be considered iid for practical purposes, and were used to calculate the estimate B̂_reg(h, h_1) of Section 2.2. We took h_1 to be the specification corresponding to v = 4 and ε = .125, since preliminary experiments indicated that this value of h gave a relatively high value of m_h. Figure 1 shows B̂_reg(h, h_1) as v and ε vary. The maximum standard error over the range of the graph was less than .01.

The two plots in Figure 1 show different views of the same graph. From the left plot we see that a t distribution works better than does a normal, with the optimal number of degrees of freedom being about 3 or 4. The plot also shows clearly that a very small number of degrees of freedom is not appropriate. The right plot shows that as ε → 0, the Bayes factor converges to 0 rapidly (in particular, fixing v = 4, the recommendation in the BUGS literature to use ε = .001 gives a Bayes factor of about .036, and for ε = .0001 it is .0037), giving strong evidence that very small values of ε should not be used.

For some models the improper prior dγ/γ gives rise to a proper posterior, and for others, including model (3.3b), it is possible to prove that the posterior is improper (Berger (1985,

[Figure 1: two perspective views of the Bayes factor surface, plotted against df (the degrees of freedom v) and epsilon (ε); the vertical axis is the Bayes factor.]

Figure 1: Model assessment for the aspirin and colon cancer data. The Bayes factor as a function of v, the number of degrees of freedom in (3.3a), and ε, the common value of c_1 and c_2 in the gamma prior in (3.3b), is shown from two different angles. Here the baseline value of the hyperparameter corresponds to v = 4 and ε = .125.

p. 87)), so that the pathological behavior resulting from ε → 0 should be expected. For some more complicated models, whether the posterior is proper or not is unknown (posterior propriety may even depend on the data values), and in these cases, plots such as those in Figure 1 may be useful because they may lead one to investigate a possible posterior impropriety.

The choice of hyperparameter h does have an influence on our inference. Let ψ_new denote the latent variable for a future study, a quantity of interest in meta-analysis. We considered two specifications of h: (v = ∞, ε = .001) and (v = 4, ε = .625). The first choice may be considered a default choice, and the second a choice guided by consideration of the plot of Bayes factors. For the choice (v = ∞, ε = .001), we have E(ψ_new) = −.95 and P(ψ_new > 0) = .04, whereas for (v = 4, ε = .625), we have E(ψ_new) = −.87 and P(ψ_new > 0) = .08. In other words, the t model suggests a stronger aspirin effect, but the inference is more tentative.

Remarks on Computation and Accuracy

We now give an idea of how the computational effort is distributed. The Stage 1 samples (12 chains, each of length 10^6) took 83 seconds to generate on a 3.8 GHz dual core P4 running Linux. By contrast, the plot in Figure 1, which involves a grid of 4000 points, took one hour to compute, in spite of the fact that it is based on a total sample size of only 1200, for what must be considered a rather simple model. Clearly, using a very large value of n is not feasible, and this is why we need to run the preliminary chains in order to get a very accurate estimate of d.

We now illustrate the extent to which B̂_reg(h, h_1) is more efficient than B̂(h, h_1). Figure 2 gives a plot of the ratio of the variances of the two estimates as h varies. Both B̂_reg(h, h_1) and B̂(h, h_1) use the design discussed earlier, which involves a total sample size of 1200. This figure is obtained by generating 100 Monte Carlo replicates of B̂_reg(h, h_1) and B̂(h, h_1) for

each h in a grid somewhat more coarse than the one used in Figure 1. As can be seen from the figure, the ratio is about .01 over most of the grid, and is less than .1 over the entire grid, with the exception of the values of h for which df = 1.5 (for those values, the Bayes factor itself is very small, and the two estimates each have minuscule variances). We also note that the ratio is exactly 0 at the design points.

[Figure 2: perspective plot of the ratio of variances against df and epsilon; the vertical axis runs up to about 0.3.]

Figure 2: Improvement in accuracy that results when we use control variates. The plot gives Var( B̂_reg(h, h_1) ) / Var( B̂(h, h_1) ) as h ranges over the same region as in Figure 1.

4 Discussion

When faced with uncertainty regarding the choice of hyperparameters, one approach is to put a prior on the hyperparameters, that is, add one layer to the hierarchical model. This approach, which goes under the general name of Bayesian model averaging, can be very useful. On the other hand, there are several good reasons why one may want to avoid it. First, the choice of prior on the hyperparameters can have a great influence on the analysis. One is tempted to use a flat prior but, as is well known, for certain parameters such a prior can in fact be very informative. In the illustration of Section 3, a flat prior on the degrees of freedom parameter in effect skews the results in favor of the normal distribution. Second, one may wish to do Bayesian model selection, as opposed to Bayesian model averaging, because the subsequent inference is then more parsimonious and interpretable. These points are discussed more fully in George and Foster (2000) and Robert (2001, Chapter 7).

There are a number of papers that deal with estimation of Bayes factors via MCMC. Chen, Shao and Ibrahim (2000, Chapter 5) and Han and Carlin (2001) give an overview of much of this work, and we mention also the more recent paper by Meng and Schilling (2002), which is directly relevant. Most of these papers deal with the case of a single Bayes factor, whereas the present paper is concerned with estimation of large families of Bayes factors. Nevertheless, in principle, any of the methods in this literature can be applied to estimate the vector d.

Especially important is Kong et al. (2003), whose work we describe in the notation of the present paper. The situation considered there has k known unnormalized densities q_{h_1}, ..., q_{h_k}, with unknown normalizing constants m_{h_1}, ..., m_{h_k}, respectively, and for l = 1, ..., k, there is an iid sample θ_1^(l), ..., θ_{n_l}^(l) from q_{h_l}/m_{h_l}. The problem is the simultaneous estimation of all ratios m_{h_l}/m_{h_s}, l, s = 1, ..., k, or equivalently, all ratios d_l = m_{h_l}/m_{h_1}, l = 1, ..., k. In a certain framework, they show that the maximum likelihood estimate (MLE) of d is obtained by solving the system of k equations

    d̂_r = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} q_{h_r}(θ_i^(l)) / [ Σ_{s=1}^k a_s q_{h_s}(θ_i^(l)) / d̂_s ],   r = 1, ..., k.        (4.1)

To put this in our context, let q_{h_l}(θ) = l_y(θ) ν_{h_l}(θ), l = 1, ..., k, and suppose we have iid samples from the normalized q_{h_l}'s. We may imagine that we have k + 1 unnormalized densities q_{h_1}, ..., q_{h_k}, q_h, with a sample of size 0 from the normalized q_h. The estimate of m_h/m_{h_1} then becomes

    (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} ν_h(θ_i^(l)) / [ Σ_{s=1}^k a_s ν_{h_s}(θ_i^(l)) / d̂_s ].

We recognize this as precisely B̂(h, h_1) in (2.1), except that d̂_1, ..., d̂_k are formed by solving (4.1), i.e., are estimated from the sequences θ_1^(l), ..., θ_{n_l}^(l), l = 1, ..., k. Thus, B̂(h, h_1) is the same as the estimate of Kong et al. (2003), except that the vector d is precomputed based on previously run very long chains. Therefore, it is perhaps natural to consider estimating d on the basis of these very long Markov chains using the method of Kong et al. (2003) (as opposed to the method discussed in Section 2.1), and we now discuss this possibility.

In their approach, Kong et al. (2003) assume that the q_{h_l}'s are densities with respect to a dominating measure µ, and they obtain the MLE µ̂ of µ (µ̂ is given up to a multiplicative constant). They can then estimate the ratios m_{h_l}/m_{h_s}, since the normalizing constants are known functions of µ. Their approach works if for each l, θ_1^(l), ..., θ_{n_l}^(l) is an iid sample. Although they extend it to the case where these are a Markov chain, in the extension q_{h_l} is replaced by the Markov transition functions P_{h_l}(θ_i^(l), ·), i = 0, ..., n_l, assumed absolutely continuous with respect to a sigma-finite measure µ (precluding Metropolis-Hastings chains), and if each of these is known only up to a normalizing constant, as is typically the case, then the system (4.1) becomes a system of nk equations. This is prohibitively difficult to solve.

Tan (2004) shows how control variates can be incorporated in the likelihood framework of Kong et al. (2003). When there are r functions H_j, j = 1, ..., r, for which we know that ∫ H_j dµ = 0, the parameter space is restricted to the set of all sigma-finite measures satisfying these r constraints. For the case where θ_i^(l), i = 1, ..., n_l, are iid for each l = 1, ..., k, he obtains the MLE of µ in this reduced parameter space, and therefore a corresponding estimate of m_h/m_{h_1}, and shows that this approach gives estimates that are asymptotically equivalent to estimates that use control variates via regression. His estimate can still be used when we have Markov chain draws, but is no longer optimal, for the same reason that the estimate in the present paper is not optimal (see the discussion in the middle of Section 2.2). The optimal estimator is obtained by using the likelihood that arises from the Markov chain structure, and in the case of general Markov chains its calculation is computationally very demanding. See

Tan (2006, 2008) for advances in this direction. Tan (2004) also obtains results on asymptotic normality of his estimators that are valid when we have the iid structure, but it should be possible to obtain versions for Markov chain draws, under regularity conditions such as those of the present paper.

Owen and Zhou (2000) use control variates in conjunction with importance sampling. In the notation above, they assume that the q_{h_l}'s are normalized densities, and that for every l, they have an iid sample of size n_l from q_{h_l}. As before, let a_l = n_l / Σ_{s=1}^k n_s. Because these are normalized densities, each of the k variables q_{h_l}(θ) / ( Σ_{s=1}^k a_s q_{h_s}(θ) ) has expectation 1 under the distribution Σ_{s=1}^k a_s q_{h_s}, and so can be used as control variates. Their method does not work directly in our situation, because the q_{h_l} = l_y(θ) ν_{h_l}(θ) are unnormalized densities. It is therefore natural to consider estimating the normalizing constants of q_{h_l}, l = 1, ..., k, from the Stage 1 runs. Indeed, there are methods for doing this from Markov chain output (Chib (1995), Chib and Jeliazkov (2001)). However, estimation of ratios of normalizing constants tends to be far more stable than estimation of the normalizing constants themselves. For example, if we wish to estimate m_h/m_{h_1}, then a procedure that involves estimating m_h and m_{h_1} separately and then taking the ratio is not guaranteed to provide accurate estimates even when h = h_1, whereas in this case the simple estimate (1.1) gives an unbiased estimate with zero variance. Moreover, if we run Markov chains for models indexed by h_1, ..., h_k, then the estimate of a single ratio m_{h_s}/m_{h_1} using the method of Section 2.1 makes use of all the chains, providing greater stability. The control variates that we use are essentially equivalent to those used by Owen and Zhou (2000), but their computation requires only knowledge of the vector d.

R functions for producing the estimates B̂(h, h_1) and B̂_reg(h, h_1), and plots such as those in Figure 1, for the hierarchical model (3.2)-(3.3) and relatives, are available from the author upon request.

Acknowledgements

I thank two referees for their careful reading and Eugenia Buta for helpful comments. I am especially grateful to an associate editor for a very insightful and thorough report, and for suggestions that led to several improvements in the paper.

Appendix: Proof of Theorem 1

Under Conditions A1 and A2 we have a central limit theorem for the averages (1/n_l) Σ_{i=1}^{n_l} Y_{i,l}^(h) and (1/n_l) Σ_{i=1}^{n_l} Z_{i,l}^(j) Y_{i,l}^(h), for l = 1, ..., k and j = 2, ..., k (corollary to Theorem 1 of Ibragimov and Linnik (1971)); however, there are other sets of conditions that could be used. For example, the ε > 0 is not needed, i.e. a finite second moment suffices, if the chain is reversible (Roberts and Rosenthal (1997)), for instance if the chain is a Metropolis algorithm or if it is a two-cycle Gibbs sampler, or if it is uniformly ergodic (Cogburn (1972)). These are the most commonly used assumptions, but for a fuller discussion of central limit theorems for Markov chains see Chan and Geyer (1994).

We first prove the assertion regarding B̂_reg(h, h_1). Let Z be the n × k matrix whose transpose is

    Z' = ( 1            ...  1            1            ...  1            ...  1            ...  1
           Z_{1,1}^(2)  ...  Z_{n_1,1}^(2)  Z_{1,2}^(2)  ...  Z_{n_2,2}^(2)  ...  Z_{1,k}^(2)  ...  Z_{n_k,k}^(2)
           ...
           Z_{1,1}^(k)  ...  Z_{n_1,1}^(k)  Z_{1,2}^(k)  ...  Z_{n_2,2}^(k)  ...  Z_{1,k}^(k)  ...  Z_{n_k,k}^(k) ),

and let

    Y = Y^(h) = ( Y_{1,1}^(h), ..., Y_{n_1,1}^(h), Y_{1,2}^(h), ..., Y_{n_2,2}^(h), ..., Y_{1,k}^(h), ..., Y_{n_k,k}^(h) )'.

Note: we sometimes suppress the superscript h in order to lighten the notation. The least squares estimate is ( β̂_0^(h), β̂^(h) ) = (Z'Z)^{−1} Z'Y, assuming that Z'Z is nonsingular. (Here, β̂^(h) = ( β̂_2^(h), ..., β̂_k^(h) ).) Note that

    (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Z_{i,l}^(j) Z_{i,l}^(j') = Σ_{l=1}^k (n_l/n) [ (1/n_l) Σ_{i=1}^{n_l} Z_{i,l}^(j) Z_{i,l}^(j') ] →a.s. R_{j,j'}

by the strong law of large numbers (clearly the Z_{i,l}^(j)'s are bounded random variables). Therefore Z'Z/n →a.s. R, so by A3 we have (Z'Z/n)^{−1} →a.s. R^{−1} and, in particular, with probability one, Z'Z is nonsingular for large n. We have

    (1/n) Z'Y = ( (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Z_{i,l}^(1) Y_{i,l}, ..., (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Z_{i,l}^(k) Y_{i,l} )'        (A.1)
              →a.s. ( Σ_{l=1}^k a_l E( Z_{1,l}^(1) Y_{1,l} ), ..., Σ_{l=1}^k a_l E( Z_{1,l}^(k) Y_{1,l} ) )'.        (A.2)

Let v = (v_1, ..., v_k)' be the vector on the right side of (A.2). From (A.1) and (A.2) we have

    ( β̂_0^(h), β̂^(h) ) →a.s. ( β_{0,lim}^(h), β_{lim}^(h) ) = R^{−1} v.        (A.3)

Consider (2.9), using β_{lim}^(h) for β. We have

    Î_{h,β_{lim}^(h)} = (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} ( Y_{i,l} − Σ_{j=2}^k β_{j,lim}^(h) Z_{i,l}^(j) ) = Σ_{l=1}^k a_l [ (1/n_l) Σ_{i=1}^{n_l} U_{i,l} ],        (A.4)

where U_{i,l} = Y_{i,l} − Σ_{j=2}^k β_{j,lim}^(h) Z_{i,l}^(j). Let µ_l(h) = E(U_{1,l}). By A2, E( |U_{1,l}|^{2+ε} ) < ∞ and therefore, by A1, we have

    n_l^{1/2} [ (1/n_l) Σ_{i=1}^{n_l} U_{i,l} − µ_l(h) ] →d N( 0, σ_l²(h) ),

where

    σ_l²(h) = Var(U_{1,l}) + 2 Σ_{g=1}^∞ Cov(U_{1,l}, U_{1+g,l}).        (A.5)

Since the Markov chains are independent, this implies that

    n^{1/2} [ Î_{h,β_{lim}^(h)} − Σ_{l=1}^k a_l µ_l(h) ] →d N( 0, σ²(h) ),        (A.6)

where

    σ²(h) = Σ_{l=1}^k a_l σ_l²(h).        (A.7)

Note that (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Y_{i,l} →a.s. B(h, h_1) and (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Z_{i,l}^(j) →a.s. 0, j = 2, ..., k. Therefore, from the first equality in (A.4), Î_{h,β_{lim}^(h)} →a.s. B(h, h_1). This proves that Σ_{l=1}^k a_l µ_l(h) = B(h, h_1), which, together with (A.6), shows that n^{1/2}( Î_{h,β_{lim}^(h)} − B(h, h_1) ) →d N( 0, σ²(h) ).

To conclude the proof, we consider the difference between Î_{h,β̂^(h)} and Î_{h,β_{lim}^(h)}. Let e(j, l) = E( Z_{1,l}^(j) ). We have

    n^{1/2} ( Î_{h,β̂^(h)} − Î_{h,β_{lim}^(h)} ) = Σ_{j=2}^k ( β_{j,lim} − β̂_j ) n^{1/2} [ (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Z_{i,l}^(j) ]
        = Σ_{j=2}^k ( β_{j,lim} − β̂_j ) ( Σ_{l=1}^k a_l^{1/2} { (1/n_l^{1/2}) Σ_{i=1}^{n_l} [ Z_{i,l}^(j) − e(j, l) ] } ),        (A.8)

where the second equality in (A.8) follows from the fact that Σ_{l=1}^k a_l e(j, l) = 0. Now, for each l = 1, ..., k, and j = 2, ..., k, by A1, (1/n_l^{1/2}) Σ_{i=1}^{n_l} [ Z_{i,l}^(j) − e(j, l) ] is asymptotically normal, so in particular is bounded in probability. Together with (A.3), this implies that the right side of (A.8) converges in probability to 0. We conclude that

    n^{1/2} ( B̂_reg(h, h_1) − B(h, h_1) ) →d N( 0, σ²(h) ).

The proof for B̂(h, h_1) is simpler. Let f_l = E(Y_{1,l}), and note that Σ_{l=1}^k a_l f_l = B(h, h_1). We have

    n^{1/2} ( B̂(h, h_1) − B(h, h_1) ) = n^{1/2} [ (1/n) Σ_{l=1}^k Σ_{i=1}^{n_l} Y_{i,l} − Σ_{l=1}^k a_l f_l ]
        = Σ_{l=1}^k a_l^{1/2} [ (1/n_l^{1/2}) Σ_{i=1}^{n_l} ( Y_{i,l} − f_l ) ] →d N( 0, τ²(h) ),

in which

    τ²(h) = Σ_{l=1}^k a_l τ_l²(h),   where τ_l²(h) = Var(Y_{1,l}) + 2 Σ_{g=1}^∞ Cov(Y_{1,l}, Y_{1+g,l}).        (A.9)

The variance term σ_l²(h) in (A.5) is the asymptotic variance of the standardized version of the average (1/n_l) Σ_{i=1}^{n_l} U_{i,l}. If we knew the U_{i,l}'s, we could estimate σ_l²(h) by estimating the initial segment of the series in (A.5) using standard methods from time series (see Geyer (1992)) or via batching. Now the U_{i,l}'s involve β_{lim}^(h), which is unknown, but our proof indicates that the effect of using β̂^(h) instead of β_{lim}^(h) in the expression for U_{i,l} is asymptotically negligible.
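For concreteness, here is a small R sketch of the batching approach mentioned above for estimating σ_l²(h) from one chain's values U_{1,l}, ..., U_{n_l,l} (in practice computed with β̂^(h) in place of β_{lim}^(h)); the function and its default number of batches are illustrative choices, not taken from the paper.

```r
## Hedged batch-means sketch for the asymptotic variance in (A.5):
## split the series u into batches and scale the sample variance of the
## batch means by the batch length.
batch_means_var <- function(u, n_batches = 20) {
  b <- floor(length(u) / n_batches)                       # batch length
  means <- sapply(seq_len(n_batches),
                  function(j) mean(u[((j - 1) * b + 1):(j * b)]))
  b * var(means)                                          # estimates sigma_l^2(h)
}
```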

References

Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics.

Athreya, K. B., Doss, H. and Sethuraman, J. (1996). On the convergence of the Markov chain simulation method. The Annals of Statistics.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (Second Edition). Springer-Verlag, New York.

Buta, E. (2009). Computational Methods in Bayesian Sensitivity Analysis. Ph.D. thesis, University of Florida.

Chan, K. S. and Geyer, C. J. (1994). Comment on "Markov chains for exploring posterior distributions." The Annals of Statistics.

Chen, M.-H., Shao, Q.-M. and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. Springer-Verlag, New York.

Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association.

Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association.

Cogburn, R. (1972). The central limit theorem for Markov processes. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2. University of California Press, Berkeley.

Doss, H. (2007). Bayesian model selection: Some thoughts on future directions. Statistica Sinica.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis.

George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika.

Geyer, C. J. (1992). Practical Markov chain Monte Carlo (with discussion). Statistical Science.

Han, C. and Carlin, B. P. (2001). Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. Journal of the American Statistical Association.

Harris, R., Beebe-Donk, J., Doss, H. and Burr, D. (2005). Aspirin, Ibuprofen and other nonsteroidal anti-inflammatory drugs in cancer prevention: A critical review of non-selective COX-2 blockade. Oncology Reports.

Hobert, J. P. and Robert, C. P. (2004). A mixture representation of π with applications in Markov chain Monte Carlo and perfect sampling. The Annals of Applied Probability.

Ibragimov, I. A. and Linnik, Y. V. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association.

Kong, A., McCullagh, P., Meng, X.-L., Nicolae, D. and Tan, Z. (2003). A theory of statistical models for Monte Carlo integration (with discussion). Journal of the Royal Statistical Society, Series B.

Meng, X.-L. and Schilling, S. (2002). Warp bridge sampling. Journal of Computational and Graphical Statistics.

Meng, X.-L. and Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica.

Owen, A. and Zhou, Y. (2000). Safe and effective importance sampling. Journal of the American Statistical Association.

Robert, C. P. (2001). The Bayesian Choice: from Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, New York.

Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability.

Tan, Z. (2004). On a likelihood approach for Monte Carlo integration. Journal of the American Statistical Association.

Tan, Z. (2006). Monte Carlo integration with acceptance-rejection. Journal of Computational and Graphical Statistics.

Tan, Z. (2008). Monte Carlo integration with Markov chain. Journal of Statistical Planning and Inference.


Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable. Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

Department of Mathematics

Department of Mathematics Departmet of Mathematics Ma 3/103 KC Border Itroductio to Probability ad Statistics Witer 2017 Lecture 19: Estimatio II Relevat textbook passages: Larse Marx [1]: Sectios 5.2 5.7 19.1 The method of momets

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions We have previously leared: KLMED8004 Medical statistics Part I, autum 00 How kow probability distributios (e.g. biomial distributio, ormal distributio) with kow populatio parameters (mea, variace) ca give

More information

Probability, Expectation Value and Uncertainty

Probability, Expectation Value and Uncertainty Chapter 1 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

AAEC/ECON 5126 FINAL EXAM: SOLUTIONS

AAEC/ECON 5126 FINAL EXAM: SOLUTIONS AAEC/ECON 5126 FINAL EXAM: SOLUTIONS SPRING 2015 / INSTRUCTOR: KLAUS MOELTNER This exam is ope-book, ope-otes, but please work strictly o your ow. Please make sure your ame is o every sheet you re hadig

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

On an Application of Bayesian Estimation

On an Application of Bayesian Estimation O a Applicatio of ayesia Estimatio KIYOHARU TANAKA School of Sciece ad Egieerig, Kiki Uiversity, Kowakae, Higashi-Osaka, JAPAN Email: ktaaka@ifokidaiacjp EVGENIY GRECHNIKOV Departmet of Mathematics, auma

More information

1.010 Uncertainty in Engineering Fall 2008

1.010 Uncertainty in Engineering Fall 2008 MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Monte Carlo Integration

Monte Carlo Integration Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State Bayesia Cotrol Charts for the Two-parameter Expoetial Distributio if the Locatio Parameter Ca Take o Ay Value Betwee Mius Iity ad Plus Iity R. va Zyl, A.J. va der Merwe 2 Quitiles Iteratioal, ruaavz@gmail.com

More information

U8L1: Sec Equations of Lines in R 2

U8L1: Sec Equations of Lines in R 2 MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie

More information

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

Estimation of a population proportion March 23,

Estimation of a population proportion March 23, 1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Chapter 2 The Monte Carlo Method

Chapter 2 The Monte Carlo Method Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

n n i=1 Often we also need to estimate the variance. Below are three estimators each of which is optimal in some sense: n 1 i=1 k=1 i=1 k=1 i=1 k=1

n n i=1 Often we also need to estimate the variance. Below are three estimators each of which is optimal in some sense: n 1 i=1 k=1 i=1 k=1 i=1 k=1 MATH88T Maria Camero Cotets Basic cocepts of statistics Estimators, estimates ad samplig distributios 2 Ordiary least squares estimate 3 3 Maximum lielihood estimator 3 4 Bayesia estimatio Refereces 9

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

There is no straightforward approach for choosing the warmup period l.

There is no straightforward approach for choosing the warmup period l. B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation Metodološki zvezki, Vol. 13, No., 016, 117-130 Approximate Cofidece Iterval for the Reciprocal of a Normal Mea with a Kow Coefficiet of Variatio Wararit Paichkitkosolkul 1 Abstract A approximate cofidece

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

Basis for simulation techniques

Basis for simulation techniques Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information