Applied Bayesian Nonparametric Mixture Modeling Session 1 Introduction/Dirichlet process priors


1 Applied Bayesian Nonparametric Mixture Modeling
Session 1: Introduction/Dirichlet process priors

Athanasios Kottas (thanos@ams.ucsc.edu)
Abel Rodriguez (abel@ams.ucsc.edu)
University of California, Santa Cruz

2014 ISBA World Meeting, Cancún, Mexico
Sunday July 13, 2014

2 Outline

1. Introduction and motivation
2. Definition of the Dirichlet process
3. Constructive definition of the Dirichlet process
4. Pólya urn representation of the Dirichlet process
5. Posterior inference
6. Applications

3 Motivating example: ROC curves for biomarker evaluation

- Individuals are drawn from one of two populations (call them Healthy and Diseased).
- We measure a univariate biomarker $Y$ in order to diagnose a disease (e.g., prostate-specific antigen, PSA, to detect prostate cancer).
- Individuals with an elevated biomarker (i.e., $Y > t$ for some threshold $t$) are declared Diseased.
  - Sensitivity (or true positive rate): $\Pr(Y > t \mid \text{Diseased})$.
  - Specificity (or true negative rate): $\Pr(Y \le t \mid \text{Healthy})$.
- We assume that a gold standard is available: disease status can be assumed known for the training set used to estimate the reliability of the diagnostic test.

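4 Motivating example: ROC curves for biomarker evaluation

Let $F_H$ and $F_D$ be the c.d.f.s of the biomarker for the healthy and diseased groups.

- $\mathrm{FPR}(t) = 1 - F_H(t) = \bar{F}_H(t)$.
- $\mathrm{FNR}(t) = F_D(t) = 1 - \bar{F}_D(t)$.

Selecting $t$ implies a tradeoff between FPR and FNR!

[Figure: biomarker densities for the Healthy and Diseased groups, with the FNR and FPR regions marked; horizontal axis: Biomarker.]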

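5 Motivating example: ROC curves for biomarker evaluation

That tradeoff is captured by the ROC curve, implicitly defined as

$$\{(\mathrm{FPR}(t), \mathrm{TPR}(t)) : t \in (-\infty, \infty)\},$$

where $\mathrm{TPR}(t) = 1 - \mathrm{FNR}(t) = 1 - F_D(t)$. Alternatively,

$$\mathrm{ROC}(u) = 1 - F_D\big(F_H^{-1}(1 - u)\big), \quad u \in [0, 1].$$

Hence, estimating the ROC curve requires estimation of two distribution functions.

[Figure: ROC curve; axes: FPR (horizontal) vs. TPR (vertical).]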
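As a rough illustration of the formula $\mathrm{ROC}(u) = 1 - F_D(F_H^{-1}(1-u))$, the sketch below plugs in empirical c.d.f.s, in Python with the standard library only. The function names (`ecdf`, `quantile`, `roc`) and the toy samples are illustrative assumptions, not part of the original slides, and the generalized inverse used for $F_H^{-1}$ is one common convention among several.

```python
import math
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical c.d.f. t -> F_n(t) of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def quantile(sample, p):
    """Generalized inverse F_n^{-1}(p) = min{t : F_n(t) >= p}, for 0 < p <= 1."""
    xs = sorted(sample)
    n = len(xs)
    # Smallest 1-based rank i with i/n >= p (small tolerance for rounding).
    i = min(n, max(1, math.ceil(p * n - 1e-12)))
    return xs[i - 1]

def roc(healthy, diseased, u):
    """Empirical ROC(u) = 1 - F_D(F_H^{-1}(1 - u)) for a false positive rate u."""
    if u >= 1.0:
        return 1.0  # threshold below every observation: everyone is flagged
    t = quantile(healthy, 1.0 - u)
    return 1.0 - ecdf(diseased)(t)

# Toy (made-up) biomarker values, for illustration only.
healthy = [1.0, 2.0, 3.0, 4.0]
diseased = [3.0, 4.0, 5.0, 6.0]
curve = [(u, roc(healthy, diseased, u)) for u in (0.0, 0.25, 0.5, 1.0)]
```

With these toy samples the threshold $t = 3$ gives a false positive rate of 1/4 and a true positive rate of 3/4, which is the point the function returns for $u = 0.25$.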

6 Motivating example: ROC curves for biomarker evaluation

Blood PSA levels for 454 healthy and 229 diseased males, with ages ranging from 47 to 81 years (Etzioni et al., 1999).

[Figure: density plots for the Healthy and Diseased groups; axes: Log total PSA vs. Density.]

Two simple (frequentist) analyses for the data involve:

- Parametric: assume normality of the log-scores (skewness?).
- Nonparametric: estimate the c.d.f.s through the empirical c.d.f.s.

7 Motivating example: ROC curves for biomarker evaluation

ROC curves and bootstrap-based pointwise confidence intervals for both simple analyses.

[Figure: three ROC panels; axes: Specificity vs. Sensitivity.]

As a Bayesian, obtaining an analog to the left-hand picture is easy. But how do we get a picture that is analogous to the one in the center?

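8 Parametric vs. nonparametric Bayes: A simple example

Let $y_i \in \mathcal{Y}$, $y_i \mid F \overset{iid}{\sim} F$, $F \in \mathcal{F}^*$, where

$$\mathcal{F}^* = \{N(y \mid \mu, \tau^2) : \mu \in \mathbb{R},\ \tau \in \mathbb{R}^+\}.$$

- In this parametric specification a prior on $F$ boils down to a prior on $(\mu, \tau^2)$.
- However, $\mathcal{F}^*$ is tiny compared to $\mathcal{F} = \{\text{all distributions on } \mathcal{Y}\}$.
- Nonparametric Bayes involves priors on much larger subsets of $\mathcal{F}$ (infinite-dimensional spaces).
- One handy way to do this is to use stochastic processes.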

9 Bayesian nonparametrics

Bayesian nonparametrics $\Leftrightarrow$ priors on spaces of functions, $\{g(\cdot) : g \in \mathcal{G}\}$ (infinite-dimensional spaces). An oxymoron?

Even though we focus on distribution functions, the methods are much more widely useful: hazard or cumulative hazard function, link function, calibration function...

More generally, enriching usual parametric models, typically leading to semiparametric models. Wandering nonparametrically near a standard class.

What makes a nonparametric model good? (e.g., Ferguson, 1973)

- The model should be tractable, i.e., it should be easily computed, either analytically or through simulations.
- The model should be rich, in the sense of having large support.
- The hyperparameters in the model should be easily interpretable.

10 Some references

- General review papers on Bayesian nonparametrics: Walker, Damien, Laud and Smith (1999); Müller and Quintana (2004); Hanson, Branscum and Johnson (2005).
- Review papers on specific application areas of Bayesian nonparametric and semiparametric methods: Hjort (1996); Sinha and Dey (1997); Gelfand (1999).
- Books: Dey, Müller and Sinha (1998) (edited volume with a collection of papers, mainly on applications of Bayesian nonparametrics); Ghosh and Ramamoorthi (2003) (emphasis on theoretical development of Bayesian nonparametric priors); Hjort, Holmes, Müller and Walker (2010) (edited volume with an overview of theory, methods, and some applications of Bayesian nonparametrics); Müller and Rodriguez (2013) (lecture notes from a week-long course at UCSC).
- Software: functions for nonparametric Bayesian inference are spread over various R packages. The only comprehensive package we are aware of is DPpackage.

11 The Dirichlet process as a model for random distributions

- A Bayesian nonparametric approach to modeling, say, distribution functions, requires priors for spaces of distribution functions.
- Formally, it requires stochastic processes with sample paths that are distribution functions defined on an appropriate sample space $\mathcal{X}$ (e.g., $\mathcal{X} = \mathbb{R}$, or $\mathbb{R}^+$, or $\mathbb{R}^d$), equipped with a $\sigma$-field $\mathcal{B}$ of subsets of $\mathcal{X}$ (e.g., the Borel $\sigma$-field for $\mathcal{X} \subseteq \mathbb{R}^d$).
- The Dirichlet process (DP), anticipated in the work of Freedman (1963) and Fabius (1964), and formally developed by Ferguson (1973, 1974), is the first prior defined for spaces of distribution functions.
- The DP is, formally, a (random) probability measure on the space of probability measures (distributions) on $(\mathcal{X}, \mathcal{B})$.
- Hence, the DP generates random distributions on $(\mathcal{X}, \mathcal{B})$, and thus, for $\mathcal{X} \subseteq \mathbb{R}^d$, equivalently, random c.d.f.s on $\mathcal{X}$.

12 Motivating the construction of the Dirichlet process

Suppose you are dealing with a sample space with only two outcomes, say, $\mathcal{X} = \{0, 1\}$, and you are interested in estimating $x$, the probability of observing 1. A natural prior for $x$ is a beta distribution,

$$p(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}, \quad 0 \le x \le 1.$$

More generally, if $\mathcal{X}$ is finite with $q$ elements, the probability distribution over $\mathcal{X}$ is given by $q$ numbers $x_1, \ldots, x_q$ such that $\sum_{i=1}^{q} x_i = 1$. A natural prior for $(x_1, \ldots, x_q)$, which generalizes the beta distribution, is the Dirichlet distribution (see the next two slides).

With the Dirichlet process we further generalize to infinite but countable spaces.

13 Properties of the Dirichlet distribution

Start with independent random variables $Z_j \sim \mathrm{Gam}(a_j, 1)$, $j = 1, \ldots, k$, with $a_j > 0$. Define

$$Y_j = \frac{Z_j}{\sum_{l=1}^{k} Z_l}, \quad j = 1, \ldots, k.$$

Then $(Y_1, \ldots, Y_k) \sim \mathrm{Dirichlet}(a_1, \ldots, a_k)$.

This distribution is singular w.r.t. Lebesgue measure on $\mathbb{R}^k$, since $\sum_{j=1}^{k} Y_j = 1$.
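The gamma-normalization construction above translates almost directly into code. The following is a minimal Python sketch (standard library only); the function name `rdirichlet` and the example parameter vector are assumptions made for illustration, and very small shape parameters could make every gamma draw underflow to zero, which this sketch does not guard against.

```python
import random

def rdirichlet(a, rng=random):
    """Draw (Y_1, ..., Y_k) ~ Dirichlet(a_1, ..., a_k) by normalizing gammas.

    Z_j ~ Gamma(shape=a_j, scale=1) independently, then Y_j = Z_j / sum_l Z_l.
    """
    z = [rng.gammavariate(a_j, 1.0) for a_j in a]
    total = sum(z)
    return [z_j / total for z_j in z]

# Example: one draw with (hypothetical) parameters a = (2, 3, 5).
rng = random.Random(0)
y = rdirichlet([2.0, 3.0, 5.0], rng)
```

Averaging many such draws should give values close to the mean vector $a_j / \sum_l a_l$, here $(0.2, 0.3, 0.5)$.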

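14 Properties of the Dirichlet distribution

$(Y_1, \ldots, Y_{k-1})$ has density

$$\frac{\Gamma\big(\sum_{j=1}^{k} a_j\big)}{\prod_{j=1}^{k} \Gamma(a_j)} \Big(1 - \sum_{j=1}^{k-1} y_j\Big)^{a_k - 1} \prod_{j=1}^{k-1} y_j^{a_j - 1}.$$

Note that for $k = 2$, $\mathrm{Dirichlet}(a_1, a_2) \equiv \mathrm{Beta}(a_1, a_2)$.

The moments of the Dirichlet distribution are:

$$E(Y_j) = \frac{a_j}{\sum_{l=1}^{k} a_l}, \qquad E(Y_j^2) = \frac{a_j (a_j + 1)}{\sum_{l=1}^{k} a_l \big(1 + \sum_{l=1}^{k} a_l\big)}, \qquad E(Y_i Y_j) = \frac{a_i a_j}{\sum_{l=1}^{k} a_l \big(1 + \sum_{l=1}^{k} a_l\big)} \ (i \ne j).$$

We can think about the Dirichlet as having two parameters:

- $g = \{a_j / \sum_{l=1}^{k} a_l\}$, the mean vector.
- $\alpha = \sum_{l=1}^{k} a_l$, a concentration parameter controlling its variance.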

15 Definition of the Dirichlet process

The DP is characterized by two parameters:

- A positive scalar parameter $\alpha$.
- A specified probability measure on $(\mathcal{X}, \mathcal{B})$, $Q_0$ (or, equivalently, a distribution function on $\mathcal{X}$, $G_0$).

DEFINITION (Ferguson, 1973): The DP generates random probability measures (random distributions) $Q$ on $(\mathcal{X}, \mathcal{B})$ such that for any finite measurable partition $B_1, \ldots, B_k$ of $\mathcal{X}$,

$$(Q(B_1), \ldots, Q(B_k)) \sim \mathrm{Dirichlet}(\alpha Q_0(B_1), \ldots, \alpha Q_0(B_k)).$$

Here, $Q(B_i)$ (a random variable) and $Q_0(B_i)$ (a constant) denote the probability of set $B_i$ under $Q$ and $Q_0$, respectively.

Also, the $B_i$, $i = 1, \ldots, k$, define a measurable partition if $B_i \in \mathcal{B}$, they are pairwise disjoint, and their union is $\mathcal{X}$.

16 Interpreting the parameters of the Dirichlet process

For any measurable subset $B$ of $\mathcal{X}$, we have from the definition that $Q(B) \sim \mathrm{beta}(\alpha Q_0(B), \alpha Q_0(B^c))$, and thus

$$E\{Q(B)\} = Q_0(B), \qquad \mathrm{Var}\{Q(B)\} = \frac{Q_0(B)\{1 - Q_0(B)\}}{\alpha + 1}.$$

- $Q_0$ plays the role of the center of the DP (also referred to as the baseline probability measure, or baseline distribution).
- $\alpha$ can be viewed as a precision parameter: for large $\alpha$ there is small variability in DP realizations; the larger $\alpha$ is, the closer we expect a realization $Q$ from the process to be to $Q_0$.
- See Ferguson (1973) for the role of $Q_0$ in more technical properties of the DP (e.g., Ferguson shows that the support of the DP contains all probability measures on $(\mathcal{X}, \mathcal{B})$ that are absolutely continuous w.r.t. $Q_0$).

17 Interpreting the parameters of the Dirichlet process

Analogous definition for the random distribution function $G$ on $\mathcal{X}$ generated from a DP with parameters $\alpha$ and $G_0$, a specified distribution function on $\mathcal{X}$.

For example, with $\mathcal{X} = \mathbb{R}$, $B = (-\infty, x]$, $x \in \mathbb{R}$, and $Q(B) = G(x)$, we have $G(x) \sim \mathrm{beta}(\alpha G_0(x), \alpha\{1 - G_0(x)\})$, and thus

$$E\{G(x)\} = G_0(x), \qquad \mathrm{Var}\{G(x)\} = \frac{G_0(x)\{1 - G_0(x)\}}{\alpha + 1}.$$

Notation: depending on the context, $G$ will denote either the random probability measure or the random distribution function. $G \sim DP(\alpha, G_0)$ will indicate that a DP prior is placed on $G$.

18 Simulating c.d.f. realizations from a Dirichlet process

Consider any grid of points $x_1 < x_2 < \ldots < x_k$ in $\mathcal{X} \subseteq \mathbb{R}$. Then, the random vector

$$(G(x_1),\ G(x_2) - G(x_1),\ \ldots,\ G(x_k) - G(x_{k-1}),\ 1 - G(x_k))$$

follows a Dirichlet distribution with parameter vector

$$(\alpha G_0(x_1),\ \alpha(G_0(x_2) - G_0(x_1)),\ \ldots,\ \alpha(G_0(x_k) - G_0(x_{k-1})),\ \alpha(1 - G_0(x_k))).$$

Hence, if $(u_1, u_2, \ldots, u_{k+1})$ is a draw from this Dirichlet distribution, then $(u_1, \ldots, \sum_{j=1}^{i} u_j, \ldots, \sum_{j=1}^{k} u_j)$ is a draw from the distribution of $(G(x_1), \ldots, G(x_i), \ldots, G(x_k))$.

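19 Simulating c.d.f. realizations from a Dirichlet process

    library(MCMCpack)                  # provides rdirichlet()
    alpha = 1                          # DP precision parameter
    x = seq(0.005, 0.995, by=0.005)    # grid x_1 < ... < x_k
    k = length(x)
    # increments of G_0 = Uniform(0, 1) over the grid, including both end cells
    dg0 = diff(c(0, punif(x, 0, 1), 1))
    dg = rdirichlet(1, alpha*dg0)      # one draw of the k+1 increments of G
    G = cumsum(dg)[1:k]                # G(x_1), ..., G(x_k)

[Figure: one simulated c.d.f. realization; axes: x vs. Distribution.]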

20 Simulating c.d.f. realizations from a Dirichlet process

[Figure panels: $\alpha = 0.1$, $\alpha = 10$, $\alpha = 100$; axes: x vs. Distribution.]

Figure 1.1: Realizations from a Dirichlet process with different values for the precision parameter. The solid black line corresponds to the baseline measure (a uniform on the interval [0, 1] in this case), while the dashed colored lines represent multiple realizations.

21 Constructive definition of the DP

Due to Sethuraman and Tiwari (1982) and Sethuraman (1994).

Let $\{z_r : r = 1, 2, \ldots\}$ and $\{\vartheta_l : l = 1, 2, \ldots\}$ be independent sequences of i.i.d. random variables:

- $z_r \sim \mathrm{beta}(1, \alpha)$, $r = 1, 2, \ldots$
- $\vartheta_l \sim G_0$, $l = 1, 2, \ldots$

Define $\omega_1 = z_1$ and $\omega_l = z_l \prod_{r=1}^{l-1} (1 - z_r)$, for $l = 2, 3, \ldots$

Then, a realization $G$ from $DP(\alpha, G_0)$ is (almost surely) of the form

$$G(\cdot) = \sum_{l=1}^{\infty} \omega_l\, \delta_{\vartheta_l}(\cdot),$$

where $\delta_z(\cdot)$ denotes a point mass at $z$.

Note this definition implies $\sum_{l=1}^{\infty} \omega_l = 1$ (almost surely).

22 More on the constructive definition of the DP

The DP generates distributions that have an (almost sure) representation as countable mixtures of point masses:

- The locations $\vartheta_l$ are i.i.d. draws from the base distribution.
- The associated weights $\omega_l$ are defined using the stick-breaking (sometimes called residual allocation) construction.

This is not as restrictive as it might sound: any distribution on $\mathbb{R}^d$ can be approximated arbitrarily well using a countable mixture of point masses.

The realizations we showed before already hinted at this fact.

23 The stick-breaking construction

- Start with a stick of length 1 (representing the total probability to be distributed among the different atoms).
- Draw a random $z_1 \sim \mathrm{beta}(1, \alpha)$, which defines the portion of the original stick assigned to atom 1, so that $\omega_1 = z_1$; then, the remaining part of the stick has length $1 - z_1$.
- Draw a random $z_2 \sim \mathrm{beta}(1, \alpha)$ (independently of $z_1$), which defines the portion of the remaining stick assigned to atom 2; therefore, $\omega_2 = z_2 (1 - z_1)$; now, the remaining part of the stick has length $(1 - z_2)(1 - z_1)$.
- Continue ad infinitum...

We denote the joint distribution on the vector of weights by $(\omega_1, \omega_2, \ldots) \sim SB(\alpha)$.
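The stick-breaking steps above can be sketched in a few lines of Python (standard library only). The function name `stick_breaking` and its arguments are illustrative assumptions; the sketch stops after a fixed number of atoms and also returns the length of stick not yet assigned.

```python
import random

def stick_breaking(alpha, n_atoms, rng=random):
    """Return the first `n_atoms` stick-breaking weights and the leftover stick.

    z_r ~ beta(1, alpha) independently; omega_l = z_l * prod_{r<l} (1 - z_r).
    """
    weights = []
    remaining = 1.0  # length of the stick still to be distributed
    for _ in range(n_atoms):
        z = rng.betavariate(1.0, alpha)
        weights.append(z * remaining)  # portion of the remaining stick
        remaining *= 1.0 - z
    return weights, remaining

rng = random.Random(1)
weights, leftover = stick_breaking(alpha=5.0, n_atoms=50, rng=rng)
```

By construction the returned weights plus the leftover always add up to 1, which is a convenient sanity check when experimenting with different values of $\alpha$.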

24 The stick-breaking construction

25 More on the constructive definition of the DP

The DP constructive definition yields another method to simulate from DP priors; in fact, it provides (up to a truncation approximation) the entire distribution $G$, not just c.d.f. sample paths.

For example, a possible approximation is $G_J = \sum_{j=1}^{J} p_j\, \delta_{\vartheta_j}$, with $p_j = \omega_j$ for $j = 1, \ldots, J-1$, and

$$p_J = 1 - \sum_{j=1}^{J-1} \omega_j = \prod_{r=1}^{J-1} (1 - z_r).$$

To specify $J$, note that

$$E\Big(\sum_{j=1}^{J} \omega_j\Big) = 1 - \prod_{r=1}^{J} E(1 - z_r) = 1 - \prod_{r=1}^{J} \frac{\alpha}{\alpha + 1} = 1 - \Big(\frac{\alpha}{\alpha + 1}\Big)^{J}.$$

Hence, $J$ could be chosen such that $(\alpha/(\alpha + 1))^J = \varepsilon$, for small $\varepsilon$.
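A minimal Python sketch of the truncation approximation follows (standard library only). The helper names `truncation_level` and `truncated_dp_draw`, the choice of taking the smallest integer $J$ with $(\alpha/(\alpha+1))^J \le \varepsilon$, and the N(0, 1) base sampler are assumptions made for illustration.

```python
import math
import random

def truncation_level(alpha, eps):
    """Smallest integer J with (alpha / (alpha + 1))**J <= eps."""
    return math.ceil(math.log(eps) / math.log(alpha / (alpha + 1.0)))

def truncated_dp_draw(alpha, base_sampler, J, rng):
    """Draw G_J = sum_j p_j * delta_{theta_j}; returns (atoms, probs).

    p_j = omega_j for j < J, and p_J is the remaining stick, so the
    probabilities add up to one.
    """
    atoms, probs = [], []
    remaining = 1.0
    for j in range(J):
        atoms.append(base_sampler(rng))          # theta_j ~ G_0
        if j < J - 1:
            z = rng.betavariate(1.0, alpha)
            probs.append(z * remaining)
            remaining *= 1.0 - z
        else:
            probs.append(remaining)              # last atom takes what is left
    return atoms, probs

rng = random.Random(2)
J = truncation_level(alpha=20.0, eps=0.01)
atoms, probs = truncated_dp_draw(20.0, lambda r: r.gauss(0.0, 1.0), J, rng)
```

Evaluating the c.d.f. of such a draw at a point $t$ amounts to summing the probabilities of the atoms that do not exceed $t$.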

26 More on the constructive definition of the DP

[Figure panels: left, axes x vs. w; right, axes x vs. P(X<x).]

Figure 1.2: Illustration for a DP with $G_0 = N(0, 1)$ and $\alpha = 20$. In the left panel, the spiked lines are located at 1000 $N(0, 1)$ draws with heights given by the (truncated) stick-breaking weights. These spikes are then summed to generate one c.d.f. sample path. The right panel shows 8 such sample paths indicated by the lighter jagged lines. The heavy smooth line indicates the $N(0, 1)$ c.d.f.

27 Generalizing the DP

Many random probability measures can be defined by means of a stick-breaking construction in which the $z_r$ are drawn independently from a distribution on [0, 1].

- For example, the Beta two-parameter process (Ishwaran and Zarepour, 2000) is defined by choosing $z_r \sim \mathrm{beta}(a, b)$.
- If $z_r \sim \mathrm{beta}(1 - a, b + ra)$, for $r = 1, 2, \ldots$ and some $a \in [0, 1)$ and $b \in (-a, \infty)$, we obtain the two-parameter Poisson-Dirichlet process (e.g., Pitman and Yor, 1997).
- The general case, $z_r \sim \mathrm{beta}(a_r, b_r)$ (Ishwaran and James, 2001).
- The probit stick-breaking process, where $z_r = \Phi(x_r)$ with $x_r \sim N(\mu, \sigma^2)$ and $\Phi$ denoting the standard normal c.d.f. (Rodríguez and Dunson, 2011).

28 Further extensions based on the DP constructive definition

The constructive definition of the DP has motivated several of its extensions, including:

- $\epsilon$-DP (Muliere and Tardella, 1998); generalized DPs (Hjort, 2000); general stick-breaking priors (Ishwaran and James, 2001).
- Dependent DP priors (MacEachern, 1999, 2000; De Iorio et al., 2004; Griffin and Steel, 2006).
- Hierarchical DPs (Tomlinson and Escobar, 1999; Teh et al., 2006).
- Spatial DP models (Gelfand, Kottas and MacEachern, 2005; Kottas, Duan and Gelfand, 2008; Duan, Guindani and Gelfand, 2007).
- Nested DPs (Rodriguez, Dunson and Gelfand, 2008).

29 Pólya urn characterization of the DP

If, for $i = 1, \ldots, n$, $x_i \mid G$ are i.i.d. from $G$, and $G \sim DP(\alpha, G_0)$, the joint distribution for the $x_i$ induced by marginalizing $G$ over its DP prior is given by

$$p(x_1, \ldots, x_n) = G_0(x_1) \prod_{i=2}^{n} \Big\{ \frac{\alpha}{\alpha + i - 1}\, G_0(x_i) + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{x_j}(x_i) \Big\}$$

(Blackwell and MacQueen, 1973).

That is, the sequence of the $x_i$ follows a generalized Pólya urn scheme such that:

- $x_1 \sim G_0$, and
- for any $i = 2, \ldots, n$, $x_i \mid x_1, \ldots, x_{i-1}$ follows the mixed distribution that places point mass $(\alpha + i - 1)^{-1}$ at $x_j$, $j = 1, \ldots, i - 1$, and continuous mass $\alpha(\alpha + i - 1)^{-1}$ on $G_0$.

30 The Chinese restaurant process

The Pólya urn characterization of the DP can be visualized using the Chinese restaurant analogy:

- A customer arriving at the restaurant joins a table that already has some customers, with probability proportional to the number of people at the table, or takes the first seat at a new table with probability proportional to $\alpha$.
- All customers sitting at the same table share a dish.
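The urn scheme (equivalently, the restaurant analogy) can be simulated directly, without ever representing $G$. Below is a minimal Python sketch using only the standard library; the function name `polya_urn_sample` and the N(0, 1) base sampler are illustrative assumptions.

```python
import random

def polya_urn_sample(n, alpha, base_sampler, rng):
    """Draw x_1, ..., x_n from the Polya urn induced by G ~ DP(alpha, G_0)."""
    xs = []
    for i in range(n):  # i = number of draws already made
        # New value from G_0 with probability alpha / (alpha + i);
        # for i = 0 this probability is 1, so x_1 ~ G_0.
        if rng.random() < alpha / (alpha + i):
            xs.append(base_sampler(rng))
        else:
            # Otherwise copy a previous draw chosen uniformly at random, which
            # gives each earlier x_j probability 1 / (alpha + i).
            xs.append(rng.choice(xs))
    return xs

rng = random.Random(3)
xs = polya_urn_sample(200, alpha=1.0, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_tables = len(set(xs))  # number of distinct values ("occupied tables")
```

In the restaurant picture, each distinct value in `xs` is a dish, and the customers sharing it form one table; small $\alpha$ tends to produce few tables, large $\alpha$ many.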

31 Prior to posterior updating with DP priors

Let $G$ denote the random distribution function for the following results.

Ferguson (1973) has shown that if the observations $y_i \mid G$ are i.i.d. from $G$, $i = 1, \ldots, n$, and $G \sim DP(\alpha, G_0)$, then the posterior distribution of $G$ is a $DP(\tilde{\alpha}, \tilde{G}_0)$, with $\tilde{\alpha} = \alpha + n$, and

$$\tilde{G}_0(t) = \frac{\alpha}{\alpha + n}\, G_0(t) + \frac{1}{\alpha + n} \sum_{i=1}^{n} 1_{[y_i, \infty)}(t).$$

Hence, the DP is a conjugate prior. All the results and properties developed for DPs can be used directly for the posterior distribution of $G$.

32 Prior to posterior updating with DP priors

For example, the posterior mean estimate for $G(t)$ is

$$E\{G(t) \mid y_1, \ldots, y_n\} = \frac{\alpha}{\alpha + n}\, G_0(t) + \frac{n}{\alpha + n}\, G_n(t),$$

where $G_n(t) = n^{-1} \sum_{i=1}^{n} 1_{[y_i, \infty)}(t)$ is the empirical distribution function of the data (the standard classical nonparametric estimator).

- For small $\alpha$ relative to $n$, little weight is placed on the prior guess $G_0$.
- For large $\alpha$ relative to $n$, little weight is placed on the data.

Hence, $\alpha$ can be viewed as a measure of faith in the prior guess $G_0$ measured in units of number of observations (thus, $\alpha = 1$ indicates strength of belief in $G_0$ worth one observation).

However, taking $\alpha$ very small in order to be noninformative is very dangerous.
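The posterior mean is simply a weighted average of the prior guess and the empirical c.d.f., which the short Python sketch below makes concrete (standard library only). The function name `dp_posterior_mean_cdf`, the uniform base c.d.f., and the toy data are assumptions chosen for illustration.

```python
from bisect import bisect_right

def dp_posterior_mean_cdf(data, alpha, base_cdf):
    """Return t -> E{G(t) | data} under a DP(alpha, G_0) prior.

    E{G(t) | data} = alpha/(alpha+n) * G_0(t) + n/(alpha+n) * G_n(t).
    """
    ys = sorted(data)
    n = len(ys)

    def post_mean(t):
        ecdf_t = bisect_right(ys, t) / n  # G_n(t): proportion of y_i <= t
        return (alpha * base_cdf(t) + n * ecdf_t) / (alpha + n)

    return post_mean

def uniform_cdf(t):
    """c.d.f. of the Uniform(0, 1) distribution, used here as G_0."""
    return min(1.0, max(0.0, t))

data = [0.1, 0.4, 0.4, 0.9]  # toy (made-up) observations
post = dp_posterior_mean_cdf(data, alpha=4.0, base_cdf=uniform_cdf)
value = post(0.5)
```

Here $\alpha = n = 4$, so the prior guess $G_0(0.5) = 0.5$ and the empirical value $G_n(0.5) = 0.75$ are weighted equally, giving 0.625.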

33 C.d.f. estimation using DP priors

[Figure panels: $\alpha = 5$, $n = 10$ (left) and $\alpha = 5$, $n = 50$ (right); axes: x vs. F(x); legend: True, Baseline E(F), ECDF, Posterior E(F | y).]

Figure 1.3: Estimating the distribution function under a DP prior, using simulated data. Both the true distribution generating the data and the baseline distribution are Gaussian. The left panel corresponds to a sample of n = 10 observations while the right panel corresponds to a sample of n = 50 observations.

34 A Bayesian nonparametric approach to ROC estimation

Let $y_{i,H} \mid F_H \sim F_H$ for $i = 1, \ldots, n_H$, and $y_{j,D} \mid F_D \sim F_D$ for $j = 1, \ldots, n_D$.

Assign independent Dirichlet process priors to each of $F_H$ and $F_D$:

$$F_H \sim DP(\alpha_H, G_{0,H}), \qquad F_D \sim DP(\alpha_D, G_{0,D}).$$

(Neither $\alpha_H$ and $\alpha_D$ nor $G_{0,H}$ and $G_{0,D}$ need to be the same.)

Questions about the PSA example:

- How would you choose the baseline measure?
- How would you assess the impact of these choices in your analysis?

35 A Bayesian nonparametric approach to ROC estimation

How would you choose the baseline measure?

- In this case, it would make sense to take $G_{0,D} = G_{0,H} = N(0, 1)$ (both from our knowledge of common PSA levels and from our descriptive analysis of the data).
- It is less clear how to choose $\alpha_D$ and $\alpha_H$. Maybe $\alpha_D = \alpha_H = 5$? Recall the equivalent sample size interpretation!

How would you assess the impact of these choices in your analysis?

- We can simulate from this prior using the truncation approximation.
- Since $G_{0,D} = G_{0,H}$, the mean ROC curve is a diagonal line (ignoring reflections). Larger values of $\alpha_H$ and $\alpha_D$ imply a prior more concentrated around this mean.

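36 A Bayesian nonparametric approach to ROC estimation

Posterior distributions:

$$F_H \mid \{y_{i,H}\} \sim DP\Big(\alpha_H + n_H,\ \frac{\alpha_H}{\alpha_H + n_H}\, G_{0,H} + \frac{1}{\alpha_H + n_H} \sum_{i=1}^{n_H} 1_{[y_{i,H}, \infty)}\Big)$$

$$F_D \mid \{y_{j,D}\} \sim DP\Big(\alpha_D + n_D,\ \frac{\alpha_D}{\alpha_D + n_D}\, G_{0,D} + \frac{1}{\alpha_D + n_D} \sum_{j=1}^{n_D} 1_{[y_{j,D}, \infty)}\Big)$$

Inferences about the ROC curve can be obtained by sampling posterior realizations of $F_H$ and $F_D$ using a truncation.

Note that $E(\mathrm{ROC}(u) \mid \text{data}) \ne 1 - \hat{F}_D(\hat{F}_H^{-1}(1 - u))$, where $\hat{F}_H$ and $\hat{F}_D$ denote the posterior mean c.d.f.s!

Because sample sizes are fairly large, results are similar to those from the (frequentist) nonparametric model we presented in the introduction.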
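One way to sample posterior ROC realizations with a truncation is sketched below in Python (standard library only). The posterior base distribution is a mixture, so each atom is drawn from $G_0$ with probability $\alpha/(\alpha+n)$ and otherwise resampled uniformly from the data. The function names, the simulated data, the truncation level `J = 500`, and the N(0, 1) base sampler are assumptions for illustration, not the authors' implementation.

```python
import random

def posterior_dp_draw(data, alpha, base_sampler, J, rng):
    """Truncated stick-breaking draw from the posterior DP; returns (atoms, probs)."""
    n = len(data)
    a_post = alpha + n                      # posterior precision alpha + n
    atoms, probs, remaining = [], [], 1.0
    for j in range(J):
        if rng.random() < alpha / a_post:   # atom from the prior base G_0
            atoms.append(base_sampler(rng))
        else:                               # atom from the empirical distribution
            atoms.append(rng.choice(data))
        z = rng.betavariate(1.0, a_post) if j < J - 1 else 1.0
        probs.append(z * remaining)         # last atom takes the leftover stick
        remaining *= 1.0 - z
    return atoms, probs

def roc_from_draws(fh, fd, u):
    """ROC(u) = 1 - F_D(F_H^{-1}(1 - u)) for discrete draws given as (atoms, probs)."""
    if u >= 1.0:
        return 1.0
    pairs = sorted(zip(*fh))
    t, cum = pairs[-1][0], 0.0
    for atom, p in pairs:                   # smallest atom t with F_H(t) >= 1 - u
        cum += p
        if cum >= 1.0 - u - 1e-12:
            t = atom
            break
    return sum(p for atom, p in zip(*fd) if atom > t)

rng = random.Random(5)
healthy = [rng.gauss(0.0, 1.0) for _ in range(30)]    # simulated, not the PSA data
diseased = [rng.gauss(1.5, 1.0) for _ in range(30)]
base = lambda r: r.gauss(0.0, 1.0)
fh = posterior_dp_draw(healthy, 5.0, base, 500, rng)
fd = posterior_dp_draw(diseased, 5.0, base, 500, rng)
roc_draw = [roc_from_draws(fh, fd, u) for u in (0.1, 0.5, 0.9, 1.0)]
```

Repeating the last three steps many times and summarizing the resulting curves pointwise (for instance with quantiles at each $u$) would give posterior bands for the ROC curve.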

37 Dose-response modeling with Dirichlet process priors

Study potency of a stimulus by administering it at $k$ dose levels to a number of subjects at each level.

- $x_i$: dose levels (with $x_1 < x_2 < \ldots < x_k$).
- $n_i$: number of subjects at dose level $i$.
- $y_i$: number of positive responses at dose level $i$.

$F(x) = \Pr(\text{positive response at dose level } x)$ (i.e., the potency of level $x$ of the stimulus).

$F$ is referred to as the potency curve, or dose-response curve, or tolerance distribution.

Standard assumption in bioassay settings: the probability of a positive response increases with the dose level, i.e., $F$ is a non-decreasing function, i.e., $F$ can be modeled as a c.d.f. on $\mathcal{X} \subseteq \mathbb{R}$.

38 Dose-response modeling with Dirichlet process priors

Questions of interest:

1. Inference for $F(x)$ for specified dose levels $x$.
2. Inference for unobserved dose level $x_0$ such that $F(x_0) = \gamma$ for specified $\gamma \in (0, 1)$.
3. Optimal selection of $\{x_i, n_i\}$ to best accomplish goals 1 and 2 above (design problem).

Parametric modeling: $F$ is assumed to be a member of a parametric family of c.d.f.s (e.g., logit or probit models).

Bayesian nonparametric modeling: uses a nonparametric prior for the infinite-dimensional parameter $F$, i.e., a prior for the space of c.d.f.s on $\mathcal{X}$. Work based on a DP prior for $F$: Antoniak (1974), Bhattacharya (1981), Disch (1981), Kuo (1983), Gelfand and Kuo (1991), Mukhopadhyay (2000).

39 Dose-response modeling with Dirichlet process priors

Assuming (conditionally) independent outcomes at different dose levels, the likelihood is given by

$$\prod_{i=1}^{k} p_i^{y_i} (1 - p_i)^{n_i - y_i},$$

where $p_i = F(x_i)$ for $i = 1, \ldots, k$.

If the prior for $F$ is a DP with precision parameter $\alpha > 0$ and base c.d.f. $F_0$ (the prior guess for the potency curve), then a priori

$$(p_1,\ p_2 - p_1,\ \ldots,\ p_k - p_{k-1},\ 1 - p_k)$$

follows a Dirichlet distribution with parameters

$$(\alpha F_0(x_1),\ \alpha(F_0(x_2) - F_0(x_1)),\ \ldots,\ \alpha(F_0(x_k) - F_0(x_{k-1})),\ \alpha(1 - F_0(x_k))).$$

The posterior for $F$ is a mixture of Dirichlet processes (Antoniak, 1974).

40 Mixtures of Dirichlet processes

Extension of the DP to a hierarchical version: set

$$G \mid \alpha, \psi \sim DP(\alpha, G_0(\cdot \mid \psi)),$$

where (parametric) priors are added to the precision parameter $\alpha$ and/or the parameters of the base distribution, $\psi$.

Not surprisingly, mixtures of Dirichlet processes are also conjugate priors.

Mixtures of Dirichlet processes are different from Dirichlet process mixture models (which we will study in Session 2). However, there are important connections between the two.

41 Bayesian nonparametric modeling for cytogenetic dosimetry

Cytogenetic dosimetry (in vitro setting): samples of cell cultures exposed to a range of doses of a given agent; in each sample, at each dose level, a measure of cell disability is recorded.

Dose-response modeling framework, where dose is the form of exposure to radiation, and response is the measure of genetic aberration (in vivo setting, human exposures), or cell disability (in vitro setting, cell cultures of human lymphocytes).

Focus on (ordered) polytomous categorical responses:

- $x_i$: dose levels (with $x_1 < x_2 < \ldots < x_k$).
- $n_i$: number of cells at dose level $i$.
- $y_i = (y_{i1}, \ldots, y_{ir})$: response vector ($r \ge 2$ classifications) at dose level $i$.

Hence, now $y_i \mid p_i \sim \mathrm{Mult}(n_i, p_i)$, where $p_i = (p_{i1}, \ldots, p_{ir})$.

42 Bayesian nonparametric modeling for cytogenetic dosimetry

Bayesian nonparametric modeling for polytomous response (Kottas, Branco and Gelfand, 2002).

- Consider the simple case with $r = 3$: a model for $p_{i1}$ and $p_{i2}$ is needed.
- Model $p_{i1} = F_1(x_i)$ and $p_{i1} + p_{i2} = F_2(x_i)$, and thus $F_1(\cdot) \le F_2(\cdot)$.
- The Bayesian nonparametric model requires a prior on the space $\{(F_1, F_2) : F_1(\cdot) \le F_2(\cdot)\}$ of stochastically ordered pairs of c.d.f.s $(F_1, F_2)$.
- Constructive approach: $F_1(\cdot) = G_1(\cdot) G_2(\cdot)$, and $F_2(\cdot) = G_1(\cdot)$, with independent $DP(\alpha_l, G_{0l})$ priors for $G_l$, $l = 1, 2$.
- Induced prior for $q_l = (q_{l,1}, \ldots, q_{l,k})$, $l = 1, 2$, where $q_{l,i} = G_l(x_i)$.

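43 Bayesian nonparametric modeling for cytogenetic dosimetry

Combining with the likelihood, the posterior for $(q_1, q_2)$ is

$$p(q_1, q_2 \mid \text{data}) \propto \prod_{i=1}^{k} \Big\{ q_{1i}^{\,y_{i1} + y_{i2}} (1 - q_{1i})^{y_{i3}}\, q_{2i}^{\,y_{i1}} (1 - q_{2i})^{y_{i2}} \Big\}$$
$$\times\ q_{11}^{\gamma_1 - 1} (q_{12} - q_{11})^{\gamma_2 - 1} \cdots (q_{1k} - q_{1,k-1})^{\gamma_k - 1} (1 - q_{1k})^{\gamma_{k+1} - 1}$$
$$\times\ q_{21}^{\delta_1 - 1} (q_{22} - q_{21})^{\delta_2 - 1} \cdots (q_{2k} - q_{2,k-1})^{\delta_k - 1} (1 - q_{2k})^{\delta_{k+1} - 1},$$

where

$$\gamma_i = \alpha_1 (G_{01}(x_i) - G_{01}(x_{i-1})), \qquad \delta_i = \alpha_2 (G_{02}(x_i) - G_{02}(x_{i-1})),$$

for $i = 1, \ldots, k + 1$, with $G_{0l}$ taken as 0 to the left of $x_1$ and as 1 to the right of $x_k$ in the boundary terms $i = 1$ and $i = k + 1$.

Posteriors for $G_l(x_i)$ for $l = 1, 2$ provide posteriors for $F_1(x_i)$ and $F_2(x_i)$, for all $x_i$.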

44 Bayesian nonparametric modeling for cytogenetic dosimetry

Inference is based on an MCMC algorithm.

For any unobserved dose level $x_0$, the posterior (predictive) distribution for $q_{l,0} = G_l(x_0)$, for $l = 1, 2$, is given by

$$p(q_{l,0} \mid \text{data}) = \int p(q_{l,0} \mid q_l)\, p(q_l \mid \text{data})\, dq_l,$$

where $p(q_{l,0} \mid q_l)$ is a rescaled beta distribution.

The inversion problem (inference for an unknown $x_0$ for specified response values $y_0 = (y_{01}, y_{02}, y_{03})$) can be handled by extending the MCMC algorithm to the augmented posterior that includes the additional parameter vector $(x_0, q_{10}, q_{20})$.

For the data illustrations, we compare with a parametric logit model,

$$\log \frac{p_{ij}}{p_{i3}} = \beta_{1j} + \beta_{2j} x_i, \quad i = 1, \ldots, k, \ j = 1, 2$$

(model fitting, prediction, and inversion are straightforward under this model).

45 Cytogenetic dosimetry: simulated data examples

Figure 1.4: Two data sets generated from the parametric model. Posterior inference for $F_1$ (upper panels) and $F_2$ (lower panels) under the parametric (dashed lines) and nonparametric (solid lines) model. "o" denotes the observed data. The left and right panels correspond to the data sets with the smaller and larger sample sizes, respectively.

46 Cytogenetic dosimetry: simulated data examples

Figure 1.5: Two data sets generated using non-standard (bimodal) shapes for $F_1$ and $F_2$. Posterior inference for $F_1$ (upper panels) and $F_2$ (lower panels) under the parametric (dashed lines) and nonparametric (solid lines) model. "o" denotes the observed data. The left and right panels correspond to the data sets with the smaller and larger sample sizes, respectively.

47 Semiparametric regression for categorical responses

Application of DP-based modeling to semiparametric regression with categorical responses.

- Categorical responses $y_i$, $i = 1, \ldots, N$ (e.g., counts or proportions).
- Covariate vector $x_i$ for the $i$-th response, comprising either categorical predictors or quantitative predictors with a finite set of possible values.
- $K \le N$ predictor profiles (cells), where each cell $k$ ($k = 1, \ldots, K$) is a combination of observed predictor values; $k(i)$ denotes the cell corresponding to the $i$-th response.
- Assume that all responses in a cell are exchangeable with distribution $F_k$, $k = 1, \ldots, K$.

48 Semiparametric regression for categorical responses

Product of mixtures of Dirichlet processes prior (Cifarelli and Regazzini, 1978) for the cell-specific random distributions $F_k$, $k = 1, \ldots, K$:

- Conditionally on hyperparameters $\alpha_k$ and $\theta_k$, the $F_k$ are assigned independent $DP(\alpha_k, F_{0k}(\cdot; \theta_k))$ priors, where, in general, $\theta_k = (\theta_{1k}, \ldots, \theta_{Dk})$.
- The $F_k$ are related by modeling the $\alpha_k$ ($k = 1, \ldots, K$) and/or the $\theta_{dk}$ ($k = 1, \ldots, K$; $d = 1, \ldots, D$) as linear combinations of the predictors (through specified link functions $h_d$, $d = 0, 1, \ldots, D$):

$$h_0(\alpha_k) = x_k^T \gamma, \quad k = 1, \ldots, K$$
$$h_d(\theta_{dk}) = x_k^T \beta_d, \quad k = 1, \ldots, K; \ d = 1, \ldots, D$$

- (Parametric) priors for the vectors of regression coefficients $\gamma$ and $\beta_d$.

DP-based prior model that induces dependence in the finite collection of distributions $\{F_1, \ldots, F_K\}$, though a weaker type of dependence than dependent DP priors (MacEachern, 2000). [Dependent nonparametric prior models will be studied in Session 4.]

49 Semiparametric regression for categorical responses

Semiparametric structure centered around a parametric backbone defined by the $F_{0k}(\cdot; \theta_k)$: useful interpretation and connections with parametric regression models.

Example: regression model for counts (Carota and Parmigiani, 2002)

$$\{y_i\} \mid \{F_1, \ldots, F_K\} \sim \prod_{i=1}^{N} F_{k(i)}(y_i)$$
$$F_k \mid \alpha_k, \theta_k \overset{ind.}{\sim} DP(\alpha_k, \mathrm{Poisson}(\cdot; \theta_k)), \quad k = 1, \ldots, K$$
$$\log(\alpha_k) = x_k^T \gamma, \qquad \log(\theta_k) = x_k^T \beta, \quad k = 1, \ldots, K$$

with priors for $\beta$ and $\gamma$.

Related work for: change-point problems (Mira and Petrone, 1996); dose-response modeling for toxicology data (Dominici and Parmigiani, 2001); variable selection in survival analysis (Giudici, Mezzetti and Muliere, 2003).
