Beta-Stacy Bandit Problems with Censored Data

Stefano Peluso (corresponding author, Università Cattolica del Sacro Cuore, Department of Statistical Sciences, stefano.peluso@unicatt.it, Largo Gemelli 1, 20136 Milan, Italy), Antonietta Mira (Università della Svizzera Italiana, InterDisciplinary Institute of Data Science; Università degli Studi dell'Insubria, Department of Sciences and High Technology), Pietro Muliere (Università Commerciale Luigi Bocconi, Department of Decision Sciences; Bocconi Institute for Data Science and Analytics, BIDSA)

December 4, 2016

Abstract

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial. We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined by Walker and Muliere (1997). We prove the existence of optimal stay-with-a-winner and switch-on-a-loser strategies, by imposing non-restrictive conditions on the parameters of the beta-Stacy processes, including the special cases of the homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state space settings are presented to illustrate the performance and flexibility of our method.

1 Introduction

In a discrete-time two-armed bandit problem, there are two stochastic processes (the two arms) and a sequential decision process (a strategy) that selects, at each time, which one of the two processes to observe. This selection is made on the basis of the previous observations, and it balances two conflicting benefits: the immediate payoff coming from the exploitation of an arm so far known to be better, and the information concerning future payoffs coming from the exploration of a less known arm. A strategy is said to be optimal if it yields the maximal expected payoff, and an arm is said to be optimal if it is selected at the beginning of an optimal strategy. A strategy can be seen as a function that assigns, to each partial history of observations, the integer 1 or 2 indicating the arm to be observed at the next stage (Berry and Fristedt 1985). With the exception of the simplest cases, explicit specifications of optimal strategies are hindered by computational issues. As a consequence, following Chattopadhyay (1994), optimal strategies can only be partially characterized in terms of break-even observations. We will consider two kinds of optimal strategies: stay-with-a-winner and switch-on-a-loser strategies. Assuming, without loss of generality, that a higher realized value of a random variable gives a higher payoff in the bandit problem, the break-even observation in a stay-with-a-winner strategy is that value at which the expected advantage of choosing arm 1 over arm 2 is null.

Then the expected advantage is positive (negative) for values observed from arm 1 greater (lower) than the break-even, and at these values arm 1 (arm 2) is chosen at the next stage as optimal. On the other hand, the break-even observation in a switch-on-a-loser strategy is that realized value at which the expected advantage remains unaltered at the next stage. For values observed from arm 1 higher (lower) than the break-even, the expected advantage increases (decreases) relative to its initial value, and arm 1 (arm 2) is optimally chosen at the next stage.

More formally, let $X_j$ and $Y_j$ be random variables generated from, respectively, arms 1 and 2 at stage $j$. For any positive integer $k$, $X_1, X_2, \dots, X_k$ given $F_1$ are i.i.d. with probability measure $F_1$, and $Y_1, Y_2, \dots, Y_k$ given $F_2$ are i.i.d. with probability measure $F_2$. The (possibly infinite) bandit horizon is $n$, and $A_n = (a_1, a_2, \dots, a_n)$ is a nonincreasing sequence of discount factors.

Early examples of bandit problems are treated in Robbins (1952), Bellman (1956) and Bradt et al. (1956). Among later works, Chernoff (1968) focuses on two Gaussian arms, $F_i = N(\mu_i, \sigma^2)$, $i = 1, 2$, with unknown drifts and known constant variance; Berry (1972) gives sufficient conditions for optimal selection in a Bernoulli two-armed bandit, $F_i = \mathrm{Bern}(p_i)$, $i = 1, 2$, proving a stay-with-a-winner strategy; Berry and Fristedt (1979) characterize optimal strategies for Bernoulli one-armed bandits ($F_1 = \mathrm{Bern}(p_1)$ and $F_2$ known) with regular discount sequences; Gittins (1979) introduces dynamic allocation indices for optimal strategies in multi-armed bandits. Clayton and Berry (1985) is the first paper that extends the bandit problem to a Bayesian nonparametric framework, considering a random $F_1 \sim DP(\alpha_1)$ and a known $F_2$: the probability measure associated to the random variables in one of the two arms is random and extracted from the Dirichlet process introduced in Ferguson (1973). Dirichlet bandits are generalized to two-armed problems, $F_i \sim DP(\alpha_i)$, $i = 1, 2$, in Chattopadhyay (1994), where the existence of stay-with-a-winner and switch-on-a-loser optimal strategies is proven. Some other properties of Dirichlet bandits are studied in Yu (2011).

In this paper we extend Bayesian nonparametric bandits to problems where each arm generates an infinite sequence of exchangeable random variables (de Finetti 1937) having, as de Finetti measure, the beta-Stacy process of Walker and Muliere (1997). In our framework the two arms are random, with $F_i \sim BS(\alpha_i, \beta_i)$, $i = 1, 2$, where $\alpha_i$ and $\beta_i$, discussed in Section 2, characterize the two beta-Stacy processes. The Dirichlet bandit of Clayton and Berry (1985) and Chattopadhyay (1994) is an important special case of our setting, as is the bandit with arms following the homogeneous process of Susarla and Van Ryzin (1976) and Ferguson and Phadia (1979). Our main result is that, under constraints on the parameters of the beta-Stacy processes (constraints that include the cases of the homogeneous process and the Dirichlet process), optimal stay-with-a-winner and switch-on-a-loser strategies exist and can be used for dealing with right-censored or exact observations. As specified in Phadia (2013), the beta-Stacy process belongs to the class of neutral to the right (NTR) processes introduced by Doksum (1974), and it generalizes the Dirichlet process in two respects: more flexible prior information may be represented and, unlike the Dirichlet process, it is conjugate to right-censored data.
Also, when the prior process is assumed to be Dirichlet, the posterior distribution given right-censored observations is a beta-Stacy process. Beta-Stacy bandit problems are motivated by the importance of dealing with censored observations in typical bandit applications: the two arms can be two treatments available for a certain disease (Berry and Fristedt 1985); patients arrive one at a time and a treatment is assigned. The patient returns information on the effectiveness of the treatment, but this response can be censored, e.g. if the patient interrupts the treatment.

Another classical example of a setting where censored observations may arise is managing a team of industrial scientists working on several research projects, and deciding which sequence of project executions would maximize the expected total value (Nash 1973): the observed project value is censored if the project is interrupted due, for instance, to reduced financial support. A final example of a bandit problem with censored data is that of an industrial processor choosing which jobs to process so as to minimize the processing time (Gittins et al. 2011 and references therein): jobs may return censored observations if the task is left unfinished due to system breakdowns. The arms introduced in the present paper have random distribution functions generated by beta-Stacy processes, permitting the analysis of bandit problems with censored observations while retaining mathematical tractability.

In Section 2 we introduce the beta-Stacy process and the bandit problem. Then we show the existence of stay-with-a-winner and switch-on-a-loser strategies, for $\alpha_1$ and $\alpha_2$ having discrete (Section 3) and continuous (Section 4) support. We apply our methods to simulation studies in Section 5, and conclude with examples of potential further applications and research directions in Section 6.

2 Preliminaries

2.1 Beta-Stacy process

Let $\mathcal{F}$ be the space of cumulative distribution functions (cdfs) on $[0, \infty)$. A probability distribution is placed on $\mathcal{F}$ through the definition of a stochastic process $F$ on $([0, \infty), \mathcal{A})$, where $\mathcal{A}$ is the Borel $\sigma$-field of subsets, such that, with probability 1, the sample paths of $F$ are cdfs. Let the right-continuous measure $\alpha$, with $\alpha\{0\} = 0$, and the positive function $\beta$ both be defined on $[0, \infty)$, and let $\{t_k\}$ be the countable set of discontinuity points of $\alpha$, such that $\alpha\{t_k\} = \alpha(t_k) - \alpha(t_k-) > 0$ for all $k$, and let $S_k$ be the jump of $Z$ at $t_k$. Let $\alpha_c(t) = \alpha(t) - \sum_{t_k \le t} \alpha\{t_k\}$, so that $\alpha_c$ is a continuous measure. Note that, in the rest of the paper, whenever clear, the explicit dependence on $\alpha$ and on $\beta$ of all quantities of interest is omitted for notational clarity.

Definition 2.1. $F$ is a beta-Stacy process on $([0, \infty), \mathcal{A})$ with parameters $\alpha$ and $\beta$, that is $F \sim BS(\alpha, \beta)$, if for all $t \ge 0$, $F(t) = 1 - \exp\{-Z(t)\}$, where $Z$ is a Lévy process with Lévy measure for $Z(t)$ given, for $v > 0$, by
$$dN_t(v) = \frac{dv}{1 - e^{-v}} \int_0^t e^{-v(\beta(s) + \alpha\{s\})} \, d\alpha_c(s)$$
and with moment generating function given by
$$\log E[e^{-\phi Z(t)}] = \sum_{t_k \le t} \log E[e^{-\phi S_k}] - \int_0^{+\infty} (1 - e^{-\phi v}) \, dN_t(v),$$
where $1 - e^{-S_k} \sim \mathrm{Beta}(\alpha\{t_k\}, \beta(t_k))$.

We now state the two theorems from Walker and Muliere (1997) that we will use in the sequel, on the conjugacy of the beta-Stacy process, distinguishing between discrete and continuous beta-Stacy parameters.

Theorem 2.2 (Walker and Muliere 1997). The posterior distribution of a beta-Stacy process with discrete parameters $\{\alpha_j, \beta_j\}$, $j \in \mathbb{N}$, is also a beta-Stacy process, with parameters $\alpha_j + n_j$ and $\beta_j + m_j$, where $n_j$ is the number of exact observations at $j$ and $m_j$ is the sum of the number of exact observations in $\{l : l > j\}$ and of censored observations in $\{l : l \ge j\}$.

The theorem above clarifies an important property of the beta-Stacy process: its conjugacy under sampling, possibly with right censoring. Furthermore, to have a.s. a cdf, the discrete parameters $\alpha$ and $\beta$ in Walker and Muliere (1997) are required to satisfy the condition
$$\prod_{j \in \mathbb{N}} \left(1 - \frac{\alpha_j}{\beta_j + \alpha_j}\right) = 0. \quad (1)$$

In Theorem 4 of Walker and Muliere (1997) the conjugacy under sampling with possible right censorship is formalized in the continuous case:

Theorem 2.3 (Walker and Muliere 1997). The posterior distribution of a beta-Stacy process with continuous parameters $\{\alpha, \beta\}$ is also a beta-Stacy process, with parameters $\alpha(t) + N(t)$ and $\beta(t) + M(t)$, where $N\{s\}$ is the counting process for uncensored observations at $s$, and $M(s)$ is the sum of the number of exact observations in $\{t : t > s\}$ and of censored observations in $\{t : t \ge s\}$.

A posteriori, the beta-Stacy process with continuous $\alpha$ and $\beta$ can be represented as $F(t) = 1 - e^{-Z_c(t) - Z_f(t)}$ (Ferguson 1974; Walker and Damien 1998), where the component $Z_f$ incorporates the fixed points of discontinuity at the locations of the exact observations, and $Z_c$ is the residual part of the Lévy process. In particular, the corresponding jump $S_i$ at the location $t_i$ where an exact observation occurred is such that $1 - e^{-S_i} \sim \mathrm{Beta}(N\{t_i\}, \beta(t_i) + M(t_i))$. Furthermore, from Walker and Muliere (1997), continuous $\alpha$ and $\beta$ satisfy the condition $\int_0^{+\infty} d\alpha(s)/\beta(s) = +\infty$. In the rest of the paper we only consider beta-Stacy processes, but it is reasonable to assume that the results can be generalized to the class of NTR processes. The NTR process of Doksum (1974) may be viewed in terms of a process with independent non-negative increments, via the parameterization $F(t) = 1 - e^{-Z(t)}$, $t \in \mathbb{R}^+$, where $Z$ is a process with independent non-negative increments. The beta-Stacy process is an NTR process where $Z$ is a log-beta process, which keeps the conjugacy property under sampling of exact or right-censored observations. When $\beta(t) = \alpha((t, \infty))$ for all $t$, we obtain the Dirichlet process of Ferguson (1973). Another important special case is the homogeneous process of Susarla and Van Ryzin (1976) and Ferguson and Phadia (1979), arising when $\beta(t) = \beta$ constant for all $t$.
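The conjugate update of Theorem 2.2 is mechanical enough to sketch in code. The following is a minimal illustration of the discrete update with plain-list parameters; the function name is ours, and the snippet is not taken from any implementation accompanying the paper.

```python
def posterior_params(alpha, beta, exact, censored):
    """Theorem 2.2: given discrete beta-Stacy parameters (alpha_j, beta_j),
    j = 1..J, return the posterior parameters alpha_j + n_j and beta_j + m_j,
    where n_j counts exact observations equal to j and m_j counts exact
    observations > j plus right-censored observations >= j."""
    alpha_post, beta_post = list(alpha), list(beta)
    for j in range(1, len(alpha) + 1):
        n_j = sum(1 for x in exact if x == j)
        m_j = sum(1 for x in exact if x > j) + sum(1 for c in censored if c >= j)
        alpha_post[j - 1] += n_j
        beta_post[j - 1] += m_j
    return alpha_post, beta_post

# Example on support {1,...,5}: Dirichlet special case beta_j = sum_{i>j} alpha_i,
# updated with one exact observation at 2 and one censoring time at 3.
alpha = [1.0] * 5
beta = [4.0, 3.0, 2.0, 1.0, 0.0]
print(posterior_params(alpha, beta, exact=[2], censored=[3]))
# ([1.0, 2.0, 1.0, 1.0, 1.0], [6.0, 4.0, 3.0, 1.0, 0.0])
```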

2.2 Bandit problem

In the proposed framework, $(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ denotes the two-armed bandit problem where arm $i$ has a beta-Stacy prior with parameters $(\alpha_i, \beta_i)$, for $i = 1, 2$, and $A_n = (a_1, a_2, \dots, a_n)$ is a nonincreasing discount sequence. Therefore $F_1 \sim BS(\alpha_1, \beta_1)$, and the observation at stage 1 from arm 1 is a realization of the random variable $X_1 \mid F_1 \sim F_1$. At the generic $k$-th stage, if the sample $x$ has been collected in the past stages from the observation of arm 1, the new observation will be the realized value of $X_k \mid F_1, x \sim F_1$, where $F_1$ is still a random distribution from a beta-Stacy process with updated parameters, thanks to the conjugacy theorems provided in the previous subsection. Note that the sample $x$ can be partitioned into censored and exact observations: $x = [x_{\text{exact}}, x_{\text{cens}}]$. Equivalent definitions hold for $Y_1, Y_2, \dots, Y_k$ and the observed sample $y$ from arm 2. By a censored observation we mean that the observed value at stage $k$ from arm 1 is the realized value of $\min\{X_k, C_k\}$, the minimum between the true exact value $X_k$ at stage $k$ and a censoring time $C_k$, and equivalently for $Y_k$ from arm 2. A special choice of $\beta_i$, detailed below, and the absence of censored observations reduce our setting to the Dirichlet bandit problem $(\{\alpha_1\}, \{\alpha_2\}; A_n)$ of Chattopadhyay (1994).

Similarly to Berry and Fristedt (1985), we let $W(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ be the expected payoff under an optimal strategy; $W^i(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ the expected payoff of a strategy starting from arm $i$ and proceeding optimally; $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ the expected advantage of initially choosing arm 1 over arm 2 (assuming optimal continuation) in both stay-with-a-winner and switch-on-a-loser strategies; and finally
$$\Delta^+(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \max(0, \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)),$$
$$\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \min(0, \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)).$$
More generally, we define the discount sequence $A_n^{(k)} = (a_{k+1}, a_{k+2}, \dots, a_n)$, and we now introduce the two strategies that we study: stay-with-a-winner and switch-on-a-loser strategies.

Definition 2.4. For $X_k \mid F_1 \sim F_1$, $Y_k \mid F_2 \sim F_2$, $F_1 \sim BS(\alpha_1, \beta_1)$ and $F_2 \sim BS(\alpha_2, \beta_2)$, at the generic stage $k$, the stay-with-a-winner strategy selects arm 1 at stage $k + 1$ if one of the following two conditions holds:

(i) at stage $k$, $X_k = x$ (exact or censored) from arm 1 is observed and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$;

(ii) at stage $k$, $Y_k = y$ (exact or censored) from arm 2 is observed and $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$;

and selects arm 2 otherwise. The switch-on-a-loser strategy selects arm 1 at stage $k + 1$ if one of the following two conditions holds:

(i) at stage $k$, $X_k = x$ (exact or censored) from arm 1 is observed and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$;

(ii) at stage $k$, $Y_k = y$ (exact or censored) from arm 2 is observed and $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_n) \ge 0$;

and selects arm 2 otherwise. Both strategies at stage $k = 1$ select arm 1 if $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) > 0$ and arm 2 otherwise. The break-even point of a strategy is that realized value of $X$ from the first arm (or $Y$ from the second arm) at which the strategy is indifferent in the choice between the two arms, and a strategy is said to exist if its break-even point exists.
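As a sketch of how Definition 2.4 operates, the two rules can be phrased as small functions taking an oracle `delta` for the expected advantage and an `update` routine implementing the conjugacy theorems (both assumed supplied by the user; the names are illustrative, not from the paper):

```python
def stay_with_a_winner(delta, update, arm1, arm2, discounts, obs, from_arm):
    """Return the arm (1 or 2) to select at the next stage."""
    current = delta(arm1, arm2, discounts)
    if from_arm == 1:
        updated = delta(update(arm1, obs), arm2, discounts)
    else:
        updated = delta(arm1, update(arm2, obs), discounts)
    return 1 if updated >= current else 2

def switch_on_a_loser(delta, update, arm1, arm2, discounts, obs, from_arm):
    """Select arm 1 whenever the updated expected advantage is nonnegative."""
    if from_arm == 1:
        updated = delta(update(arm1, obs), arm2, discounts)
    else:
        updated = delta(arm1, update(arm2, obs), discounts)
    return 1 if updated >= 0.0 else 2
```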

3 Break-Even Observations: Discrete Case

The optimal strategy is given in terms of a break-even observation: the first (second) arm is observed until it yields a value higher (lower) than the break-even one. In this section we prove that break-even observations for a stay-with-a-winner and a switch-on-a-loser optimal strategy exist for the discrete beta-Stacy two-armed bandit problem, under certain conditions on the parameters of the processes governing the arms. Let $F_i$ be the random distribution function corresponding to arm $i$, with $X_k$ and $Y_k$ having supports in $\mathbb{N}$ for all $k$. Let $\alpha^i = (\alpha^i_1, \alpha^i_2, \dots)$ and $\beta^i = (\beta^i_1, \beta^i_2, \dots)$ for $i = 1, 2$. From the construction of the discrete-time beta-Stacy process, Walker and Muliere (1997) show that, for $j \in \mathbb{N}$,
$$P(X_1 = j \mid \{\alpha_1, \beta_1\}) = \frac{\alpha^1_j}{\alpha^1_j + \beta^1_j} \prod_{i=1}^{j-1} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i}\right),$$
and similarly for $Y_1$. Then the prior means of the two arms are, respectively,
$$E[X_1 \mid \{\alpha_1, \beta_1\}] = \sum_{j=1}^{+\infty} P(X_1 \ge j \mid \{\alpha_1, \beta_1\}) = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i}\right) =: \mu_1$$
and
$$E[Y_1 \mid \{\alpha_2, \beta_2\}] = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^2_i}{\alpha^2_i + \beta^2_i}\right) =: \mu_2.$$
Given observations $x = (x_1, \dots, x_k)$ from arm 1 and $y = (y_1, \dots, y_k)$ from arm 2, the conditional expectation of any function $h(X)$ can be computed using Theorem 2.2, and it is denoted by $E_x[h(X)]$, where we recall that we omit in all quantities of interest the dependence on $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$. Then, for $j \in \mathbb{N}$,
$$P_x(X = j) = \frac{\alpha^{1,x}_j}{\alpha^{1,x}_j + \beta^{1,x}_j} \prod_{i=1}^{j-1} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right),$$
where $\alpha^{1,x}_j := \alpha^1_j + n_j$ and $\beta^{1,x}_j := \beta^1_j + m_j$, with $n_j$ the number of exact observations equal to $j$ and $m_j$ the number of exact observations in $\{l : l > j\}$ plus the number of censored observations in $\{l : l \ge j\}$, and with the dependence of $n_j$ and $m_j$ on $x$ neglected for ease of notation.
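These product formulas translate directly into code. The sketch below evaluates the predictive law and the mean of a single arm on a truncated support $\{1, \dots, J\}$, a truncation we introduce only for computability; the function names are ours.

```python
def predictive_pmf(alpha, beta):
    """P(X = j) = h_j * prod_{i<j} (1 - h_i), hazards h_j = alpha_j/(alpha_j + beta_j)."""
    pmf, surv = [], 1.0
    for a, b in zip(alpha, beta):
        h = a / (a + b) if a + b > 0 else 1.0
        pmf.append(surv * h)
        surv *= 1.0 - h
    return pmf

def mean(alpha, beta):
    """mu = sum_{j>=1} prod_{i<j} (1 - h_i), truncated at J = len(alpha)."""
    mu, surv = 0.0, 1.0
    for a, b in zip(alpha, beta):
        mu += surv                                     # P(X >= j)
        surv *= b / (a + b) if a + b > 0 else 0.0
    return mu
```

Applying `posterior_params` from the earlier sketch before calling these functions yields $P_x(X = j)$ and $\mu_{1,x}$.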

The posterior mean is therefore
$$E_x[X] = \sum_{j=1}^{+\infty} P_x(X \ge j) = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right) =: \mu_{1,x}.$$
Following the notation introduced above,
$$W(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \max\{W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n),\, W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)\},$$
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) - W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n).$$
Therefore
$$W^1 = W + \Delta^-, \qquad W^2 = W - \Delta^+,$$
and
$$W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_1 + E_X[W(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] = a_1 \mu_1 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})],$$
$$W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_2 + E_Y[W(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] = a_1 \mu_2 + E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] - E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})],$$
so that
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_1 - a_1 \mu_2 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] - E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})].$$
Using arguments similar to those in Berry and Fristedt (1985) and Chattopadhyay (1994),
$$a_1 \mu_1 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})]$$
is the expected payoff of first selecting arm 1, followed by arm 2, and then continuing optimally. Similarly,
$$a_1 \mu_2 + E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})]$$
is the expected payoff of selecting arm 2 first and arm 1 second and then continuing optimally.
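The recursions above can be run directly for small horizons. Below is a hedged dynamic-programming sketch, assuming exact observations only and a truncated support whose last $\beta_j$ is zero (so that the predictive law is proper); it reuses `predictive_pmf` and `mean` from the previous sketch and enumerates all observation paths, so it is only meant for tiny $n$ and $J$.

```python
from functools import lru_cache

def update_exact(arm, j):
    """Theorem 2.2 for a single exact observation at j (1-based)."""
    a, b = list(arm[0]), list(arm[1])
    a[j - 1] += 1.0               # n_j = 1
    for i in range(j - 1):        # m_i = 1 for i < j
        b[i] += 1.0
    return tuple(a), tuple(b)

@lru_cache(maxsize=None)
def W1(arm1, arm2, discounts):
    pmf = predictive_pmf(arm1[0], arm1[1])
    payoff = discounts[0] * mean(arm1[0], arm1[1])
    if len(discounts) > 1:
        payoff += sum(p * W(update_exact(arm1, j), arm2, discounts[1:])
                      for j, p in enumerate(pmf, start=1))
    return payoff

@lru_cache(maxsize=None)
def W2(arm1, arm2, discounts):
    pmf = predictive_pmf(arm2[0], arm2[1])
    payoff = discounts[0] * mean(arm2[0], arm2[1])
    if len(discounts) > 1:
        payoff += sum(p * W(arm1, update_exact(arm2, j), discounts[1:])
                      for j, p in enumerate(pmf, start=1))
    return payoff

def W(arm1, arm2, discounts):
    return max(W1(arm1, arm2, discounts), W2(arm1, arm2, discounts))

def delta(arm1, arm2, discounts):
    return W1(arm1, arm2, discounts) - W2(arm1, arm2, discounts)

# Tiny example: two arms on {1,2,3}, horizon 3, A_3 = (1, 0.9, 0.8).
arm1 = ((1.0, 1.0, 1.0), (2.0, 1.0, 0.0))
arm2 = ((0.5, 0.5, 0.5), (1.0, 0.5, 0.0))
print(delta(arm1, arm2, (1.0, 0.9, 0.8)))
```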

Subtracting the second payoff from the first one we obtain $(a_1 - a_2)(\mu_1 - \mu_2)$. From this fact,
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = (a_1 - a_2)(\mu_1 - \mu_2) + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})]. \quad (2)$$
In the next proposition we show that, given an exact or a right-censored observation $X = x$ from arm 1, the advantage of choosing arm 1 over arm 2 increases as $x$ increases.

Proposition 3.1. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ such that $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$, for all $j \in \mathbb{N}$, and for all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$.

Proof. By induction, for $n = 1$, we have
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1 \left(\sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right) - \mu_2\right) = a_1(\mu_{1,x} - \mu_2). \quad (3)$$
Fix $x' = x + 1$. We first prove that $\mu_{1,x'} \ge \mu_{1,x}$. For this purpose, we study separately the $j$-terms in the sums defining $\mu_{1,x}$ and $\mu_{1,x'}$ when $j \le x$, $j = x'$ and $j > x'$. When $x$ is an exact observation: the $j$-terms with $j \le x$ are the same in $\mu_{1,x}$ and $\mu_{1,x'}$. For $j = x'$, in $\mu_{1,x}$ we have
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x}{\alpha^1_x + \beta^1_x + 1},$$
whilst in $\mu_{1,x'}$,
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x + 1}{\alpha^1_x + \beta^1_x + 1},$$
and the $x'$-term of $\mu_{1,x'}$ is weakly higher. For $j > x'$, the $j$-term of $\mu_{1,x}$ is
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x}{\alpha^1_x + \beta^1_x + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'}} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i},$$
whilst the $j$-term of $\mu_{1,x'}$ is
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x + 1}{\alpha^1_x + \beta^1_x + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'} + 1} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i};$$

the $j$-term of $\mu_{1,x'}$ is weakly higher if
$$\frac{\beta^1_x + 1}{\alpha^1_{x'} + \beta^1_{x'} + 1} \ge \frac{\beta^1_x}{\alpha^1_{x'} + \beta^1_{x'}},$$
equivalent to $\beta^1_x \le \alpha^1_{x'} + \beta^1_{x'}$, for all $x$ and for all $x' > x$. Similarly, the monotonicity of $\mu_{1,x}$ can be proved when $x$ is a right-censored observation: the $j$-terms with $j \le x'$ are the same in $\mu_{1,x}$ and $\mu_{1,x'}$, whilst for $j > x'$ the two terms in, respectively, $\mu_{1,x}$ and $\mu_{1,x'}$ are
$$\prod_{i \le x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'}} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i} \quad\text{and}\quad \prod_{i \le x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_{x'} + 1}{\alpha^1_{x'} + \beta^1_{x'} + 1} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i},$$
where the term in $\mu_{1,x'}$ is weakly higher. Then, for $n = 1$ the statement is true, since $\mu_{1,x}$ is nondecreasing in $x$ and $a_1 \ge 0$. From the induction hypothesis, we assume the monotonicity property for $n = m - 1$. By (2),
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})]. \quad (4)$$
The first term on the right-hand side of (4) is nondecreasing in $x$, since $\mu_{1,x}$ is nondecreasing in $x$ and $a_1 \ge a_2$. The second and third terms are nondecreasing in $x$ from the induction hypothesis.

Remark 3.2. The constraints $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$ are needed for the monotonicity of $\mu_{1,x}$. The condition is not required if all observations are censored, but is necessary if some observations are exact. The constraint is naturally verified in the Dirichlet two-armed problem, obtained from the beta-Stacy one in the special case $\beta^1_j = \beta^1_{j+1} + \alpha^1_{j+1}$, $j \in \mathbb{N}$. Also, a bandit problem with simple homogeneous processes (Susarla and Van Ryzin 1976; Ferguson and Phadia 1979) for each arm, corresponding to the case $\beta^1_{j+1} = \beta^1_j$ for all $j \in \mathbb{N}$, satisfies the constraints.

The following propositions will be used in Theorems 3.5 and 3.6.

Proposition 3.3. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1 and for all nonincreasing discount sequences $A_n$, if the condition
$$\prod_{i<+\infty} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i + 1}\right) > 0 \quad (5)$$
is verified, then
$$\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = +\infty$$
and
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = \min_x \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n).$$

Proof. The result for $x \to 0$ is a direct consequence of the monotonicity property shown in Proposition 3.1. We are left to prove the limit to $+\infty$. Consider $x$ increasing to $\infty$. By induction, for $n = 1$, $\mu_{1,x}$ diverges to $+\infty$ as $x \to +\infty$ since
$$\lim_{x\to+\infty} \mu_{1,x} \ge \lim_{x\to+\infty} \sum_{j=1}^{x+1} \prod_{i<j} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i + 1}\right) = +\infty. \quad (6)$$
Then $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2)$ goes to $+\infty$, since $\mu_{1,x}$ is divergent and $a_1 > 0$. Assume now that the statement is true for $n = m - 1$. By (4),
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})].$$
For the first term $(a_1 - a_2)(\mu_{1,x} - \mu_2)$ on the right-hand side of the formula above there are two possible cases: $a_1 - a_2 > 0$ or $a_1 - a_2 = 0$. In the latter case the term is zero, while when $a_1 - a_2 > 0$ it diverges to $+\infty$. For the second term, note that $\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ is a nondecreasing sequence in $x$ by Proposition 3.1, bounded below by 0 by definition and divergent to $+\infty$ by the induction hypothesis. We can then apply the monotone convergence theorem and obtain
$$\lim_{x\to+\infty} E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] = E_X\left[\lim_{x\to+\infty} \Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})\right] = +\infty.$$
For the third term, notice that
$$\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_m^{(1)}) = -\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)}).$$
Furthermore, $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})$ converges to 0 as $x$ diverges, and it is bounded above by $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x=0}, \beta_{1,x=0}\}; A_m^{(1)})$. By the dominated convergence theorem we have
$$\lim_{x\to+\infty} E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})] = -\lim_{x\to+\infty} E_Y[\Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})] = -E_Y\left[\lim_{x\to+\infty} \Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})\right] = 0.$$

Remark 3.4. In Proposition 3.3 condition (5) is required, and the beta-Stacy process in the discrete case is defined such that condition (1) is verified. Both conditions are satisfied when their ratio diverges, that is when
$$\lim_{j\to+\infty} \prod_{i<j} \frac{(\beta^1_i + 1)(\alpha^1_i + \beta^1_i)}{\beta^1_i(\alpha^1_i + \beta^1_i + 1)} = +\infty.$$
This constraint does not pose restrictions, and it is satisfied, as expected, in the special cases of the homogeneous process and the Dirichlet process.
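A quick numerical sanity check of Proposition 3.1 (in its $n = 1$ form, where $\Delta$ reduces to $a_1(\mu_{1,x} - \mu_2)$) can be run in the Dirichlet boundary case $\beta_j = \beta_{j+1} + \alpha_{j+1}$, reusing `mean` and `update_exact` from the sketches above; this is an illustration, not a proof:

```python
J = 8
alpha = tuple(1.0 for _ in range(J))
beta = tuple(float(J - j - 1) for j in range(J))   # beta_j = sum_{i>j} alpha_i
post_means = [mean(*update_exact((alpha, beta), x)) for x in range(1, J + 1)]
assert all(m2 >= m1 - 1e-12 for m1, m2 in zip(post_means, post_means[1:]))
print(post_means)   # nondecreasing in the exact observation x
```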

We finally state the following theorems, showing that there exist break-even points determining, respectively, a stay-with-a-winner and a switch-on-a-loser optimal strategy. The theorems generalize Theorem 2.1 and Theorem 2.2 of Chattopadhyay (1994), proving the existence of the break-even observations in a context more general than the Dirichlet arms, at the cost of some restrictions on the choice of the parameters of the beta-Stacy process.

Theorem 3.5. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if the condition
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \quad (7)$$
holds, there exists a break-even point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \ge b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \le b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. From Proposition 3.1, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$, starting from a value lower than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ (condition (7)) and going to infinity (Proposition 3.3). This is enough to claim that there exists a break-even point $b$ which satisfies the properties in the theorem.

Theorem 3.6. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if the condition
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0 \quad (8)$$
holds, there exists a break-even point $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$ if $x \ge d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0$ if $x \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. As in the proof of Theorem 3.5, there exists a point $d$ satisfying the properties.

Remark 3.7. The domain of $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$, as a function of $x$, is $\mathbb{N}$, equipped with the discrete topology, for which every subset is open, and all functions from a discrete topological space to any topological space are continuous. In the two theorems above it is not possible to apply the intermediate value theorem for continuous functions, since $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$, seen as a function of $x$, has a domain that is not a connected topological space. As a consequence, it is not guaranteed that an observation exactly equal to the break-even can be observed, since it could be that $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \notin \mathbb{N}$ or $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \notin \mathbb{N}$. Still, the break-evens give two reference points for optimal choices between the arms.
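Since the support is discrete, the break-even points of Theorems 3.5 and 3.6 can be bracketed by a plain scan over $x$, using the `delta` and `update_exact` sketches from above (a simplified search that, for illustration, considers only exact updates):

```python
def break_even_points(arm1, arm2, discounts, J):
    """Return (b, d): the smallest x with Delta(updated) >= Delta(current),
    and the smallest x with Delta(updated) >= 0, scanning x = 1..J."""
    base = delta(arm1, arm2, discounts)
    b = d = None
    for x in range(1, J + 1):
        dx = delta(update_exact(arm1, x), arm2, discounts)
        if b is None and dx >= base:
            b = x
        if d is None and dx >= 0.0:
            d = x
    return b, d
```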

Remark 3.8. Conditions (7) and (8) are needed because the support of the base measure of the beta-Stacy process is limited to $[0, +\infty)$. Both Clayton and Berry (1985) and Chattopadhyay (1994) notice that, when the support is bounded, the existence of break-even observations requires additional conditions at the boundaries. In particular, both conditions intuitively mean that if a very bad observation from arm 1 is extracted (close to 0), the expected advantage of choosing that arm at the next time instant reduces, and the alternative arm is preferred under the current strategy.

We now study the two-armed bandit problem when the base measures of the beta-Stacy processes are continuous measures.

4 Break-Even Observations: Continuous Case

In the present section we treat the continuous beta-Stacy two-armed problem. $X_k$ and $Y_k$, respectively from arm 1 and arm 2 at stage $k$, can assume values in $\mathbb{R}^+ = (0, +\infty)$, and $\alpha_1$ and $\beta_1$ are, respectively, a continuous measure and a positive function, both defined on $\mathbb{R}^+$. $\alpha_1$ is assumed, without loss of generality, to have no discontinuity points. Equation (8) in Walker and Muliere (1997) says that, for $t \in \mathbb{R}^+$,
$$P(X_1 > t) = \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} =: \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s)}{\beta_1(s) + \alpha_1\{s\}}\right),$$
where $\prod_{[0,t]}$ denotes the product integral, an operator commonly used in the survival analysis literature. For any partition $a_1 = z_0 < z_1 < \dots < z_m = a_2$, if $l_m = \max_{i=1,\dots,m} (z_i - z_{i-1})$, the product integral is defined as
$$\prod_{(a_1,a_2]} \{1 + d\Gamma(z)\} := \lim_{l_m \to 0} \prod_j \{1 + [\Gamma(z_j) - \Gamma(z_{j-1})]\}.$$
See Gill and Johansen (1990) for a survey of applications of product integrals to survival analysis. Then we can compute, in analogy with the discrete case,
$$E[X_1] = \int_0^{+\infty} P(X_1 > t)\,dt = \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s)}{\beta_1(s)}\right) dt =: \mu_1$$
and, similarly,
$$E[Y_1] = \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_2(s)}{\beta_2(s)}\right) dt =: \mu_2,$$
assuming, without loss of generality, that $\mu_1 \le \mu_2$.
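Numerically, for continuous $\alpha_1$ the product integral collapses to the exponential above, so the survival function and the prior mean reduce to ordinary quadrature. A rough sketch, with illustrative step sizes and truncation (the exponential specification anticipates the one used in Section 5.3):

```python
import math

def survival(alpha_prime, beta, t, step=1e-3):
    """P(X > t) = exp(-int_0^t alpha'(s)/beta(s) ds), left-rule quadrature."""
    acc, s = 0.0, 0.0
    while s < t:
        acc += alpha_prime(s) / beta(s) * step
        s += step
    return math.exp(-acc)

def prior_mean(alpha_prime, beta, horizon=200.0, step=1e-2):
    """mu = int_0^inf P(X > t) dt, truncated at `horizon`."""
    mu, cum, t = 0.0, 0.0, 0.0
    while t < horizon:
        cum += alpha_prime(t) / beta(t) * step
        mu += math.exp(-cum) * step
        t += step
    return mu

b = lambda s: math.exp(-s / 10.0)
a = lambda s: math.exp(-s / 10.0) / 10.0   # hazard ds/10: exponential base law
print(survival(a, b, 10.0))   # about exp(-1)
print(prior_mean(a, b))       # about 10
```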

The conditional expectation of a function $h_1(X)$, given observations $x = (x_1, \dots, x_k)$ from the first arm, is denoted $E_x[h_1(X)]$, removing the explicit dependence on $\alpha_1$ and $\beta_1$. We further define
$$\alpha_{1,x}(s) = \alpha_1(s) + N(s) \qquad\text{and}\qquad \beta_{1,x}(s) = \beta_1(s) + M(s)$$
for all $s \in (0, \infty)$. Analogous notation is used for $h_2(Y)$, given $y = (y_1, \dots, y_k)$ from the second arm. Therefore, from Theorem 2.3,
$$P_x(X > t) = \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s) + N\{s\} + M(s)}\right\} \left(1 - \frac{N\{t\}}{\beta_1(t) + N\{t\} + M(t)}\right) =: \prod_{[0,t]} \left(1 - \frac{d\alpha_{1,x}(s)}{\beta_{1,x}(s) + \alpha_{1,x}\{s\}}\right)$$
and, partitioning $x = [x_{\text{exact}}, x_{\text{cens}}]$ into respectively exact and censored observations, the posterior mean is
$$E_x[X] = P_x(X \notin x_{\text{exact}}) \int_0^{+\infty} P_x(X > t \mid X \notin x_{\text{exact}})\,dt + P_x(X \in x_{\text{exact}}) \sum_{\tilde{x} \in x_{\text{exact}}} \tilde{x}\, P_x(X = \tilde{x} \mid X \in x_{\text{exact}})$$
$$= P_x(X \notin x_{\text{exact}}) \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_{1,x_{\text{cens}}}(s)}{\beta_{1,x_{\text{cens}}}(s) + \alpha_{1,x_{\text{cens}}}\{s\}}\right) dt + P_x(X \in x_{\text{exact}}) \sum_{\tilde{x} \in x_{\text{exact}}} \tilde{x}\, P_x(X = \tilde{x} \mid X \in x_{\text{exact}}) =: \mu_{1,x}.$$
The function $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ can be expressed as in (2). In the following propositions we study the properties of $\Delta$, with the aim of proving the existence of break-even observations determining optimal stay-with-a-winner and switch-on-a-loser strategies.

Proposition 4.1. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ such that $\beta_1'(x) \ge -\alpha_1'(x)$, and for all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$, for all $x \in (0, \infty)$.

Proof. By induction, for $n = 1$, and $x$ censored to the right,
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2) = a_1\left(\int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s) + dN(s)}{\beta_1(s) + N\{s\} + M(s)}\right) dt - \mu_2\right),$$
where $N(s) = N((0, s])$. We first show that $\mu_{1,x}$ is nondecreasing in $x$, for $x$ censored to the right.

Notice that $\mu_{1,x}$ can be written as
$$\mu_{1,x} = \int_0^x \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \int_x^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt.$$
The integrand in $\mu_{1,x}$, as a function of $t$, has a discontinuity point when $t = x$, but its value at this point is ignored since it does not contribute to the evaluation of $\mu_{1,x}$. Take now any $x' > x$, and separate the cases $t \le x$, $x \le t \le x'$, and $t \ge x'$:

When $t \le x$, the integrands in $\mu_{1,x}$ and $\mu_{1,x'}$ are the same.

When $x \le t \le x'$, the integrand in $\mu_{1,x}$ is
$$\exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
whilst the integrand in $\mu_{1,x'}$ is
$$\exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\},$$
with the integrand in $\mu_{1,x'}$ always greater than or equal to the one in $\mu_{1,x}$.

Finally, when $t \ge x'$, the integrands in $\mu_{1,x}$ and $\mu_{1,x'}$ are, respectively,
$$\exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} \quad\text{and}\quad \exp\left\{-\int_0^{x'} \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_{x'}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
proving that $\mu_{1,x'} \ge \mu_{1,x}$ and that the statement is true for $n = 1$. On the other hand, when $x$ is not censored,
$$\mu_{1,x} = P_x(X \neq x)\,\tilde\mu + P_x(X = x)\,x = \left(1 - \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \frac{1}{\beta_1(x)+1}\right)\tilde\mu + x \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \frac{1}{\beta_1(x)+1},$$
and $\mu_{1,x'} \ge \mu_{1,x}$ for all $x' > x$ if and only if $P_x(X = x)$, the probability of $X$ from arm 1 being equal to the previous exact observation, is nondecreasing in $x$. This condition is equivalent to $\beta_1'(x) \ge -\alpha_1'(x)$, as required in the proposition. By induction, assuming the monotonicity property for $n = m - 1$, the proof is completed along the lines of Proposition 3.1.

Remark 4.2. As in the discrete case, monotonicity of the posterior mean is recovered under a condition on the parameters of the beta-Stacy process. The condition $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$, $j \in \mathbb{N}$, in Proposition 3.1 finds its continuous analogue $\beta_1'(x) \ge -\alpha_1'(x)$, $x \in \mathbb{R}^+$, in Proposition 4.1. It is important to note that the condition is not required if only censored observations are extracted from the arms, but it is necessary in case of exact observations. As in the discrete section, the special cases of the Dirichlet and homogeneous processes are included, and they correspond, respectively, to $\beta_1'(x) = -\alpha_1'(x)$ and to $\beta_1'(x) = 0$.

Proposition 4.3. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.1 and such that $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$, and for all nonincreasing discount sequences $A_n$,
$$\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = +\infty$$
and
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = \min_x \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n).$$

Proof. The case $x \to 0$ is an immediate consequence of Proposition 4.1. To study the case where $x$ diverges, we proceed by induction. Note that, for $n = 1$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2)$ and
$$\lim_{x\to+\infty} \mu_{1,x} \ge \lim_{x\to+\infty} \int_0^x \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt \ge \int_0^{+\infty} \exp\left\{-\int_0^{+\infty} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt = +\infty,$$
where the last equality is true since $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$ is equivalent to
$$\exp\left\{-\int_0^{+\infty} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} > 0.$$
This proves that $\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = +\infty$. The rest of the proof follows the same lines as the proof of Proposition 3.3.

Remark 4.4. In the above proposition the additional condition $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$ is required. Note that the beta-Stacy process is defined such that $\int_0^{+\infty} d\alpha_1(s)/\beta_1(s) = +\infty$. These two improper integrals should have a different asymptotic behavior, a condition that is verified when, from the limit comparison test for integrals, the limit of the ratio of the two integrands is different from 1, that is when $\lim_{s\to+\infty} \beta_1(s) < +\infty$. For finite $\beta_1(s)$ this is satisfied and, as expected, it includes the special cases of the homogeneous process and the Dirichlet process. In short, the additional constraint rules out cases of exploding $\beta_1(s)$. Usually, for some base distribution $F$, $\beta_1(s) = M F((s, \infty))$, converging to 0 (see Walker and Muliere 1997).

Proposition 4.5. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ and all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is a continuous function of $x$, for $x \in (0, \infty)$ censored to the right.

Proof. It is enough to show that, for any increasing or decreasing sequence $\{x_l\}$ converging to $x$, $\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_n) \to \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$. We provide the proof only for an increasing sequence $\{x_l\}$, since the decreasing-sequence case is similar. By induction, first fix $n = 1$, so that $\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x_l} - \mu_2)$. The continuity of $\Delta$ in $x$ is shown through the continuity of $\mu_{1,x}$. Taking any increasing sequence $\{x_l\}$ converging to $x$, then
$$\lim_{x_l\to x} \mu_{1,x_l} = \lim_{x_l\to x} \left[\int_0^{x_l} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^{x_l} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \int_{x_l}^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt\right]$$
$$= \int_0^{x} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^{x} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \lim_{x_l\to x} \int_{x_l}^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt,$$
where the last equality is justified by the continuity in $x$ of
$$\int_0^{x} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt \quad\text{and}\quad \exp\left\{-\int_0^{x} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\}.$$
To finally see that $\mu_{1,x}$ is continuous, we need to prove the continuity in $x$ of the function
$$H(x) := \int_x^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt.$$
Note that the function $H$ is a parameterized Riemann integral, whose integration extremes are also dependent on the parameter. $H$ is given by the composition of two functions:
$$H_2(h, x) := \int_h^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt$$
and $h(x) = x$. The latter is obviously continuous. For the continuity of $H_2$, note that
$$\exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} \le \exp\left\{-\int_{x}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
and we can apply the dominated convergence theorem to the sequence of functions in $x_l$,
$$\exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
for any given value of $h$. Then
$$\lim_{x_l\to x} H_2(h, x_l) = \lim_{x_l\to x} \int_h^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt = \int_h^{+\infty} \exp\left\{-\int_{x}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt = H_2(h, x).$$

Assume now that the statement is true for $n = m - 1$. By (4),
$$\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x_l} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})].$$
The first term $(a_1 - a_2)(\mu_{1,x_l} - \mu_2)$ on the right-hand side of the formula is continuous in $x$: this follows from the continuity of $\mu_{1,x}$. For the second term, note that $\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ is a nondecreasing sequence in $x_l$ by Proposition 4.1, bounded below by 0 by its definition and convergent to $\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ by the induction hypothesis. We can then apply the monotone convergence theorem:
$$\lim_{x_l\to x} E_X[\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] = E_X\left[\lim_{x_l\to x} \Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})\right].$$
For the third term, notice that
$$\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_m^{(1)}) = -\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)}).$$
Furthermore, $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})$ converges to $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})$ as $x_l$ converges to $x$ by the induction hypothesis, and it is bounded above by $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x=x_1}, \beta_{1,x=x_1}\}; A_m^{(1)})$. By the dominated convergence theorem,
$$\lim_{x_l\to x} E_Y[\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})] = -\lim_{x_l\to x} E_Y[\Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})]$$
$$= -E_Y\left[\lim_{x_l\to x} \Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})\right] = E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})],$$
proving continuity for the generic bandit horizon $n$.
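Given the monotonicity and continuity just established, the break-even observations guaranteed by the next two theorems can be located numerically by bisection. The sketch below assumes a user-supplied callable `delta_of_x` returning $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ for a right-censored $x$ (for instance, a Monte Carlo estimate):

```python
def bisect_break_even(delta_of_x, target, lo, hi, tol=1e-6):
    """Smallest x in [lo, hi] with delta_of_x(x) >= target, for nondecreasing delta_of_x."""
    if delta_of_x(hi) < target:
        return None                      # no crossing on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_of_x(mid) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# d: crossing of 0 (switch-on-a-loser); b: crossing of the pre-observation
# level Delta({alpha_1, beta_1}, {alpha_2, beta_2}; A_n) (stay-with-a-winner).
```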

Theorem 4.6. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.3, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if condition (7) holds, there exists a break-even point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \ge b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \le b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. From Propositions 4.1 and 4.5, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$ and continuous (the latter only for censored $x$), starting from a value lower than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ and growing to infinity (Proposition 4.3). Then there exists a point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ which satisfies the properties of the theorem.

Theorem 4.7. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.3, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if condition (8) holds, there exists a break-even point $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$ if $x \ge d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0$ if $x \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. As in Theorem 4.6, there exists a point $d$ satisfying the properties.

Remark 4.8. For Theorems 4.6 and 4.7, Remark 3.8 on the boundary conditions is still valid. Remark 3.7 can be applied if there are exact observations, whilst the intermediate value theorem for continuous functions guarantees that an observation exactly equal to the break-even can be observed in case of censored data.

5 Applications

5.1 Geometric beta-Stacy parameters

Fix $\beta^1_j = M_1\,0.9^j$ and $\alpha^1_j = 0.1\,M_1\,0.9^j$ for the first arm, and $\beta^2_j = M_2\,0.92^j$ and $\alpha^2_j = 0.08\,M_2\,0.92^j$ for the second arm, for $j = 1, 2, \dots$, so that $\mu_1 < \mu_2$ a priori. Let $M_i = \alpha^i(\mathbb{N})$, $i = 1, 2$, be the total masses of the measures $\alpha^1$ and $\alpha^2$. Note that different values of $M_1$ and $M_2$ do not affect the prior means $\mu_1$ and $\mu_2$, but only the posterior means. The observations are right-censored; we fix the bandit horizon to $n = 3$ and the discount sequence to $A_3 = (1, 0.9, 0.8)$. Choosing higher values for $n$ is feasible, at the cost of higher processing times. We generate $X_1^l$ and $Y_1^l$ from the two arms, for $l = 1, \dots, T$. Then, for each $X_1^l$, we generate $T$ copies of $X_2$ from the first arm and, for each $Y_1^l$, $T$ copies of $Y_2$ from the second arm. Sampling from prior and posterior beta-Stacy processes is done, respectively, with Algorithms A and B in Al Labadi and Zarepour (2013). See also De Blasi (2017) for an alternative way of simulating from the beta-Stacy process. For each scenario we evaluate $\mu_{1,x_1}$, $\mu_{1,x_1,x_2}$, $\mu_{2,y_1}$ and $\mu_{2,y_1,y_2}$; we then evaluate $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, reported in Table 1 for different values of $M_1$ and $M_2$.

Holding everything else constant, there is a tendency for $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ to increase in $M_2$ and decrease in $M_1$. This result is coherent with the exploitation-exploration trade-off mentioned in the Introduction, and suggests that the less is known about an arm, the more appealing it is to select that arm, since more information can be gained from its exploration. Furthermore, as both $M_1$ and $M_2$ increase, $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ approaches $\mu_1 - \mu_2 = -2.5$. When $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ is positive the optimal arm is the first one, and vice versa when it is negative. Most of the time, the difference in the prior means makes the second arm the optimal one, except in cases with $M_1 \ll M_2$: there are situations where the higher uncertainty (lower $M_1$) around the base distribution of the first arm, relative to the high confidence in the base distribution of the second arm (larger $M_2$), makes the first arm preferable to explore.

Table 1: Estimated $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, with $\beta^1_j = M_1\,0.9^j$ and $\alpha^1_j = 0.1\,M_1\,0.9^j$ for the first arm, and $\beta^2_j = M_2\,0.92^j$ and $\alpha^2_j = 0.08\,M_2\,0.92^j$ for the second arm, for $j = 1, 2, \dots$ and for different values of $M_i = \alpha^i(\mathbb{N})$, $i = 1, 2$. [Table entries not recoverable from the source.]
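For a concrete check of this specification, note that both arms have constant hazard (0.1/1.1 and 0.08/1.08, respectively), so $\mu_1 = 11$ and $\mu_2 = 13.5$ in closed form; the sketch below recovers these values on a truncated support with the `mean` function from the discrete sketch ($M_1$ and $M_2$ cancel from the prior means, as stated above):

```python
J, M1, M2 = 400, 1.0, 1.0
beta1 = [M1 * 0.9 ** j for j in range(1, J + 1)]
alpha1 = [0.1 * bj for bj in beta1]
beta2 = [M2 * 0.92 ** j for j in range(1, J + 1)]
alpha2 = [0.08 * bj for bj in beta2]
print(mean(alpha1, beta1), mean(alpha2, beta2))   # approx. 11.0 and 13.5
```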

5.2 Discrete uniform beta-Stacy parameters

Consider the beta-Stacy two-armed bandit problem with, for $k = 1, 2$,
$$\alpha^k_i = \frac{M_k}{2h_k+1}\,1_{\{i \in \{\mu_k - h_k, \dots, \mu_k + h_k\}\}}, \quad (9)$$
$$\beta^k_i = M_k\,\frac{h_k + \mu_k - i}{2h_k+1}\,1_{\{i \in \{\mu_k - h_k, \dots, \mu_k + h_k\}\}} + M_k\,1_{\{i < \mu_k - h_k\}}, \quad (10)$$
where $h_k \in \mathbb{N}$ and $h_k < \mu_k$, and with $M_k = \alpha^k(\mathbb{N})$. The parameter $h_k$ is positively related to the variability of the base distribution of the beta-Stacy process of arm $k$, whilst $M_k$ is negatively related to the variability around the base measure. We fix $\mu_1 = \mu_2 = 21$ and $M_1 = h_1 = 10$, to see how the expected advantage of one arm over the other is affected by a change in $h_2$ and in $M_2$, without being affected by different prior means. With $\mu_1$ and $\mu_2$ equal, the choice of the optimal arm is entirely driven by the chance of exploring less-known (more volatile) arms. The rest of the setting is fixed as in the previous example.

In Figure 1 we report the value of $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ for different $h_2$ and $M_2$. The green dotted line, corresponding to the case $h_1 = h_2 = 10$, shows how a lower variability (higher $M_2$) around the base measure of arm 2 makes this arm less interesting to explore, in favor of arm 1. The same effect is caused by a change in the variability of the base measure of arm 2: for $h_2 - h_1 < 0$ and for $M_1 = M_2 = 10$, arm 1 is preferred, up to a $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ of about 16 for $h_2 - h_1 = -10$. Vice versa, higher positive values of $h_2 - h_1$ correspond to a higher preference for arm 2. Furthermore, the effect of a change in $M_2$ seems to dominate: as we increase $M_2$, the distances among the scenarios with different $h_2$ decrease and concentrate on positive expected advantages of the first arm over the second one.

Figure 1: Estimated $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, with parameters as specified in equations (9) and (10), for different variability around the base measure ($M_2$) and different base measure variability ($h_2$) of the beta-Stacy process of the second arm. The prior means are both equal to $\mu_1 = \mu_2 = 21$, and $M_1 = h_1 = 10$. [Axes: $\log M_2$ against the advantage $\Delta$ of arm 1; one line for each $h_2 - h_1 \in \{-10, -5, 0, 5, 10\}$.]
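A short sketch makes the role of (9) and (10) concrete: inside the window the implied hazard is $1/(h_k + \mu_k - i + 1)$, which is exactly the hazard of a uniform distribution on $\{\mu_k - h_k, \dots, \mu_k + h_k\}$, so the base measure is uniform on the window with mean $\mu_k$. This reuses `predictive_pmf` from the discrete sketch; the parameter values follow our reconstruction of the text.

```python
def uniform_params(mu, h, M, J):
    """Equations (9)-(10) on a truncated support {1,...,J}."""
    alpha, beta = [], []
    for i in range(1, J + 1):
        in_win = mu - h <= i <= mu + h
        alpha.append(M / (2 * h + 1) if in_win else 0.0)
        beta.append(M * (h + mu - i) / (2 * h + 1) if in_win
                    else (M if i < mu - h else 0.0))
    return alpha, beta

alpha, beta = uniform_params(mu=21, h=10, M=10.0, J=40)
pmf = predictive_pmf(alpha, beta)
print([round(p, 4) for p in pmf[10:31]])   # constant, about 1/21, on {11,...,31}
```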

5.3 Exponential beta-Stacy parameters

We extend to the two-armed bandit problem a numerical example in Ferguson and Phadia (1979) and Walker and Muliere (1997). For the first arm fix $\beta_1(s) = \exp(-s/10)$ and $d\alpha_1(s) = \exp(-s/10)/10\,ds$, whilst for the second arm $\beta_2(s) = \exp(-s/20)$ and $d\alpha_2(s) = \exp(-s/20)/20\,ds$, for $s \in \mathbb{R}^+$, so that $\mu_1 < \mu_2$ a priori. We fix the bandit horizon to $n = 3$ and the discount sequence $A_3 = (1, 0.9, 0.8)$. As in the discrete example, we generate $X_1^l$ and $Y_1^l$ from the two arms, for $l = 1, \dots, T = 50$. Then, for each $X_1^l$, we generate $T$ copies of $X_2$ from the first arm and, for each $Y_1^l$, $T$ copies of $Y_2$ from the second arm, also using Algorithms A and B in Al Labadi and Zarepour (2013). In the top-left plot of Figure 2, two randomly extracted prior distributions for the two arms are reported. For each scenario we evaluate $\mu_{1,x_1}$, $\mu_{1,x_1,x_2}$, $\mu_{2,y_1}$ and $\mu_{2,y_1,y_2}$; we then evaluate $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$, for $x \in (0, 30)$. In the top-right plot of Figure 2 these $\Delta$s are shown, when the observations in the first and second periods are respectively exact and censored. Monotonicity in $x$ of $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ is numerically verified. Note also that conditions (7) and (8) are satisfied, so that the two break-even points exist. Since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = -2.47 < 0$, $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$. In particular, the break-even observation for the stay-with-a-winner strategy is $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = 5.42$, whilst the break-even $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ for the switch-on-a-loser strategy is at least as large.

Optimal strategies can be completely determined. For instance, arm 2 is optimally selected at the beginning, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) < 0$. If an exact observation from arm 2 is extracted equal, say, to $y_1 = 4$, for the stay-with-a-winner strategy arm 1 is optimal at this stage, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)})$ is greater than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = -2.47$. At this stage arm 1 would not be optimal for the switch-on-a-loser strategy, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)}) < 0$. Suppose now a censored observation from arm 1 equal, say, to $x_2 = 5.5$ is observed, for which $\Delta(\{\alpha_{1,x_2=5.5}, \beta_{1,x_2=5.5}\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(2)})$ is greater than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)})$. Therefore in the last stage arm 1 is again optimal for the stay-with-a-winner strategy.
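The gap between the censored and the (incorrectly) exact treatment of an observation can be illustrated directly with the posterior-mean formulas of Section 4: after one observation at $x$, the two cases differ only by the jump factor $\beta_1(x)/(\beta_1(x)+1)$ on the tail mass. A numerical sketch under the exponential specification above (our own illustration, not the paper's code):

```python
import math

def post_mean_one_obs(x, censored, step=1e-2, horizon=300.0):
    beta = lambda s: math.exp(-s / 10.0)
    aprime = lambda s: math.exp(-s / 10.0) / 10.0
    mu, cum, t = 0.0, 0.0, 0.0
    while t < x:                          # hazard a'/(b+1) before x
        cum += aprime(t) / (beta(t) + 1.0) * step
        mu += math.exp(-cum) * step
        t += step
    tail = math.exp(-cum)
    if not censored:                      # exact observation: jump 1/(beta(x)+1) at x
        tail *= beta(x) / (beta(x) + 1.0)
    while t < horizon:                    # hazard a'/b = 1/10 after x
        tail *= math.exp(-0.1 * step)
        mu += tail * step
        t += step
    return mu

print(post_mean_one_obs(5.5, censored=True))    # treats 5.5 as right-censored
print(post_mean_one_obs(5.5, censored=False))   # treats 5.5 as exact
```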

Figure 2: Top-left: Two distributions sampled from the beta-Stacy process of Walker and Muliere (1997), with Algorithm A in Al Labadi and Zarepour (2013). The black line is from arm 1, the red one from arm 2, with parameters $\beta_1(s) = \exp(-s/10)$, $d\alpha_1(s) = \exp(-s/10)/10\,ds$, $\beta_2(s) = \exp(-s/20)$ and $d\alpha_2(s) = \exp(-s/20)/20\,ds$, for $s \in (0, 30)$. Top-right: in blue $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$, in green $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$. The value 0 is also highlighted in red. The intersections determine the break-even observations of the two strategies outlined in the text. Bottom-left: in blue $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ for the beta-Stacy bandit problem. The green line represents the corresponding quantity for the Dirichlet bandit that ignores the censorship. Bottom-right: Error probability of the stay-with-a-winner strategy implied by the Dirichlet bandit problem, when censorship is neglected, as a function of the first observation $x$ from arm 1.

Furthermore, in the bottom-left plot we report $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ when $x \in (0, 30)$ is right-censored, together with the expected advantage of arm 1 if the data were incorrectly supposed to be exact. In other words, we compare the beta-Stacy bandit problem with the corresponding Dirichlet bandit problem that ignores the censorship, to quantify the difference between the two in the simulation setting and highlight the relevance of properly accounting for right-censored data. There is a range of $x$ values, from 7.97 to 16.95, at which the Dirichlet bandit problem would adopt the wrong strategy, since $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ would be of opposite sign relative to the corresponding beta-Stacy quantity. The break-even for the Dirichlet bandit is too low, since it judges the observations to be exact and therefore does not account for the increased chance of observing higher values in the future. If we repeat the experiment 50 times for each value of $x$ from 0 to 30, we can compute the probability of the Dirichlet bandit being in error in the choice of the optimal arm, after the observation of $x$ from arm 1. The probability is reported in the bottom-right plot of Figure 2.

6 Conclusions and further directions

We have studied Bayesian nonparametric bandit problems with right-censored data, where two independent arms are generated by beta-Stacy processes. The problem extends the one-armed and two-armed Dirichlet bandit problems of Clayton and Berry (1985) and Chattopadhyay (1994), since the beta-Stacy process simplifies to the Dirichlet process for a special choice of the process parameters and in the absence of censored observations. We have shown some properties of the expected advantage of the first arm over the second arm, and the existence of stay-with-a-winner and switch-on-a-loser optimal strategies, under non-restrictive constraints on the process parameters.

Our framework can be further extended in several directions, to different bandit strategies and to multi-armed problems. First, the common formulation of the Bernoulli bandit can be replicated through the choice of Bernoulli base measures, centered on success probabilities that are learnt as observations are collected. Second, semi-uniform strategies with greedy behaviour can be addressed: epsilon-greedy and epsilon-first strategies (Watkins 1989; Sutton and Barto 1998), which dedicate a proportion of phases to, respectively, random and purely exploratory phases, can be derived by randomizing the reinforcement learning mechanism of the arms' parameters (Muliere et al. 2006); epsilon-decreasing and VDBE strategies (Cesa-Bianchi and Fischer 1998; Tokic 2010) would require a beta-Stacy parameter update mechanism dependent on the number of steps or on the values extracted from the arms. Third, probability matching strategies such as Thompson sampling (Thompson 1933) are easy to implement in our framework: in multi-armed bandit problems, beta-Stacy random distributions can be sampled from their posterior distributions at each stage and, conditional on the sample, some reward related to each single arm may be computed and all rewards compared. Fourth, the extension to multi-armed contextual bandits (Langford and Zhang 2008) can be implemented by introducing dependence of the arm parameters on external regressors, or by introducing dependence between Bayesian nonparametric arms through partial exchangeability (de Finetti 1938, 1959), for instance with the mixture of Dirichlet processes of Antoniak (1974), the bivariate Dirichlet process of Walker and Muliere (2003) or the bivariate beta-Stacy process of Muliere et al. (2017). In this direction, Battiston et al. (2016) adopt hierarchical Poisson-Dirichlet processes in multi-armed bandit problems. Fifth, the sequential nature of the Bayesian framework and the flexibility of nonparametric priors permit handling more general cases of nonstationary bandit problems (Garivier and Moulines 2008), where the underlying base distribution of the beta-Stacy processes can change during play: in this context all past observations simply affect the modified priors of the new models. Finally, it could be of interest to apply the proposed framework to study the results in Gittins (1979) and Gittins et al. (2011) in multi-armed Bayesian nonparametric problems: in this case, dynamic allocation indices may be computed from the simulations of the stochastic processes related to each arm.

Acknowledgements

We thank two anonymous referees, the Editor and the Associate Editor for their detailed and insightful comments, which contributed significantly to improving the paper.

References

Al Labadi, L. and Zarepour, M. (2013). A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. The Canadian Journal of Statistics, 41.

Antoniak, C. (1974). Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, 2:1152-1174.

Battiston, M., Favaro, S., and Teh, Y. (2016). Multi-armed bandit for species discovery: a Bayesian nonparametric approach. Journal of the American Statistical Association. Forthcoming.

Bellman, R. (1956). A Problem in the Sequential Design of Experiments. Sankhya, 16.

Berry, D. (1972). A Bernoulli Two-Armed Bandit. The Annals of Mathematical Statistics, 43.

Berry, D. and Fristedt, B. (1979). Bernoulli One-Armed Bandits - Arbitrary Discount Sequences. The Annals of Statistics, 7:1086-1105.

Berry, D. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, New York.

Bradt, R., Johnson, S., and Karlin, S. (1956). On Sequential Designs for Maximizing the Sum of n Observations. The Annals of Mathematical Statistics, 33.

Cesa-Bianchi, N. and Fischer, P. (1998). Finite-time regret bounds for the multiarmed bandit problem. In Proceedings of the 15th International Conference on Machine Learning, pages 100-108, Morgan Kaufmann, San Francisco, CA.

Chattopadhyay, M. (1994). Two-Armed Dirichlet Bandits With Discounting. The Annals of Statistics, 22.

Chernoff, H. (1968). Optimal Stochastic Control. Sankhya, 30.

Clayton, M. and Berry, D. (1985). Bayesian Nonparametric Bandits. The Annals of Statistics, 13.

De Blasi, P. (2017). Simulation of the Beta-Stacy Process with Application to Analysis of Censored Data. In Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. S. Kenett and F. Faltin (eds.), pages 84-89, John Wiley & Sons Ltd., Chichester, UK.

de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l'institut Henri Poincaré, 7:1-68.

de Finetti, B. (1938). Sur la condition d'équivalence partielle, VI Colloque Genève. Acta. Sci. Ind. Paris, 739:5-18.

de Finetti, B. (1959). La probabilità e la statistica nei rapporti con l'induzione, secondo i diversi punti di vista. Atti corso CIME su Induzione e Statistica, Varenna.

Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. Annals of Probability, 2:183-201.

Ferguson, T. (1973). A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1:209-230.

Ferguson, T. (1974). Prior Distributions on Spaces of Probability Measures. The Annals of Statistics, 2:615-629.


More information

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data Mai Zhou and Julia Luan Department of Statistics University of Kentucky Lexington, KY 40506-0027, U.S.A.

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract Power EP Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR-2004-149, October 4, 2004 Abstract This note describes power EP, an etension of Epectation Propagation (EP) that makes the computations

More information

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti 1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

The Multi-Armed Bandit Problem

The Multi-Armed Bandit Problem Università degli Studi di Milano The bandit problem [Robbins, 1952]... K slot machines Rewards X i,1, X i,2,... of machine i are i.i.d. [0, 1]-valued random variables An allocation policy prescribes which

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes?

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Song-Hee Kim and Ward Whitt Industrial Engineering and Operations Research Columbia University

More information

arxiv: v1 [math.st] 29 Nov 2018

arxiv: v1 [math.st] 29 Nov 2018 Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Andrea Arfé Stefano Peluso Pietro Muliere Accepted manuscript, Scandinavian Journal of Statistics, 2018 arxiv:1811.12304v1

More information

The Multi-Armed Bandit Problem

The Multi-Armed Bandit Problem The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist

More information

Bayesian optimization for automatic machine learning

Bayesian optimization for automatic machine learning Bayesian optimization for automatic machine learning Matthew W. Ho man based o work with J. M. Hernández-Lobato, M. Gelbart, B. Shahriari, and others! University of Cambridge July 11, 2015 Black-bo optimization

More information

OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING

OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING CHAPTER 18 OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING Warren B. Powell and Ilya O. Ryzhov Princeton University, University of Maryland 18.1 INTRODUCTION Approimate dynamic programming (ADP) has

More information

The information complexity of sequential resource allocation

The information complexity of sequential resource allocation The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation

More information

The Multi-Arm Bandit Framework

The Multi-Arm Bandit Framework The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94

More information

On Bayesian bandit algorithms

On Bayesian bandit algorithms On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms

More information

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16 EE539R: Problem Set 3 Assigned: 24/08/6, Due: 3/08/6. Cover and Thomas: Problem 2.30 (Maimum Entropy): Solution: We are required to maimize H(P X ) over all distributions P X on the non-negative integers

More information

SUPERPOSITION OF BETA PROCESSES

SUPERPOSITION OF BETA PROCESSES SUPERPOSITION OF BETA PROCESSES Pierpaolo De Blasi 1, Stefano Favaro 1 and Pietro Muliere 2 1 Dipartiento di Statistica e Mateatica Applicata, Università degli Studi di Torino Corso Unione Sovietica 218/bis,

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago December 2015 Abstract Rothschild and Stiglitz (1970) represent random

More information

Finite-time Analysis of the Multiarmed Bandit Problem*

Finite-time Analysis of the Multiarmed Bandit Problem* Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010

More information

Truncation error of a superposed gamma process in a decreasing order representation

Truncation error of a superposed gamma process in a decreasing order representation Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan

More information

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process

More information

Hierarchical Knowledge Gradient for Sequential Sampling

Hierarchical Knowledge Gradient for Sequential Sampling Journal of Machine Learning Research () Submitted ; Published Hierarchical Knowledge Gradient for Sequential Sampling Martijn R.K. Mes Department of Operational Methods for Production and Logistics University

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

State Space and Hidden Markov Models

State Space and Hidden Markov Models State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State

More information

Taylor Series and Series Convergence (Online)

Taylor Series and Series Convergence (Online) 7in 0in Felder c02_online.te V3 - February 9, 205 9:5 A.M. Page CHAPTER 2 Taylor Series and Series Convergence (Online) 2.8 Asymptotic Epansions In introductory calculus classes the statement this series

More information

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Guosheng Yin Department of Statistics and Actuarial Science The University of Hong Kong Joint work with J. Xu PSI and RSS Journal

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known

More information

9 - Markov processes and Burt & Allison 1963 AGEC

9 - Markov processes and Burt & Allison 1963 AGEC This document was generated at 8:37 PM on Saturday, March 17, 2018 Copyright 2018 Richard T. Woodward 9 - Markov processes and Burt & Allison 1963 AGEC 642-2018 I. What is a Markov Chain? A Markov chain

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

Lecture 4 January 23

Lecture 4 January 23 STAT 263/363: Experimental Design Winter 2016/17 Lecture 4 January 23 Lecturer: Art B. Owen Scribe: Zachary del Rosario 4.1 Bandits Bandits are a form of online (adaptive) experiments; i.e. samples are

More information

Statistical Geometry Processing Winter Semester 2011/2012

Statistical Geometry Processing Winter Semester 2011/2012 Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

SOME MEMORYLESS BANDIT POLICIES

SOME MEMORYLESS BANDIT POLICIES J. Appl. Prob. 40, 250 256 (2003) Printed in Israel Applied Probability Trust 2003 SOME MEMORYLESS BANDIT POLICIES EROL A. PEKÖZ, Boston University Abstract We consider a multiarmed bit problem, where

More information

Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis

Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Andrea Arfè 1 Stefano Peluso 2 Pietro Muliere 1 1 Università Commerciale Luigi Bocconi 2 Università Cattolica

More information

Bandit Problems With Infinitely Many Arms

Bandit Problems With Infinitely Many Arms University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 997 Bandit Problems With Infinitely Many Arms Donald A. Berry Robert W. Chen Alan Zame David C. Heath Larry A. Shepp

More information

Two generic principles in modern bandits: the optimistic principle and Thompson sampling

Two generic principles in modern bandits: the optimistic principle and Thompson sampling Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle

More information

Competitive Analysis of M/GI/1 Queueing Policies

Competitive Analysis of M/GI/1 Queueing Policies Competitive Analysis of M/GI/1 Queueing Policies Nikhil Bansal Adam Wierman Abstract We propose a framework for comparing the performance of two queueing policies. Our framework is motivated by the notion

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet

More information

CSCI-6971 Lecture Notes: Probability theory

CSCI-6971 Lecture Notes: Probability theory CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

A parametric approach to Bayesian optimization with pairwise comparisons

A parametric approach to Bayesian optimization with pairwise comparisons A parametric approach to Bayesian optimization with pairwise comparisons Marco Co Eindhoven University of Technology m.g.h.co@tue.nl Bert de Vries Eindhoven University of Technology and GN Hearing bdevries@ieee.org

More information

Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,

More information

Analysis of Thompson Sampling for the multi-armed bandit problem

Analysis of Thompson Sampling for the multi-armed bandit problem Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com Navin Goyal Microsoft Research India navingo@microsoft.com Abstract The multi-armed

More information

Two optimization problems in a stochastic bandit model

Two optimization problems in a stochastic bandit model Two optimization problems in a stochastic bandit model Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishnan Journées MAS 204, Toulouse Outline From stochastic optimization

More information

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Noèlia Viles Cuadros BCAM- Basque Center of Applied Mathematics with Prof. Enrico

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis. State University of New York at Stony Brook. and.

COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis. State University of New York at Stony Brook. and. COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis State University of New York at Stony Brook and Cyrus Derman Columbia University The problem of assigning one of several

More information

Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models

Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models c Qing Zhao, UC Davis. Talk at Xidian Univ., September, 2011. 1 Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models Qing Zhao Department of Electrical and Computer Engineering University

More information

, and rewards and transition matrices as shown below:

, and rewards and transition matrices as shown below: CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount

More information

Stochastic Dominance-based Rough Set Model for Ordinal Classification

Stochastic Dominance-based Rough Set Model for Ordinal Classification Stochastic Dominance-based Rough Set Model for Ordinal Classification Wojciech Kotłowski wkotlowski@cs.put.poznan.pl Institute of Computing Science, Poznań University of Technology, Poznań, 60-965, Poland

More information

Truncation error of a superposed gamma process in a decreasing order representation

Truncation error of a superposed gamma process in a decreasing order representation Truncation error of a superposed gamma process in a decreasing order representation B julyan.arbel@inria.fr Í www.julyanarbel.com Inria, Mistis, Grenoble, France Joint work with Igor Pru nster (Bocconi

More information

Lecture Notes 2 Random Variables. Discrete Random Variables: Probability mass function (pmf)

Lecture Notes 2 Random Variables. Discrete Random Variables: Probability mass function (pmf) Lecture Notes 2 Random Variables Definition Discrete Random Variables: Probability mass function (pmf) Continuous Random Variables: Probability density function (pdf) Mean and Variance Cumulative Distribution

More information

0.1. Linear transformations

0.1. Linear transformations Suggestions for midterm review #3 The repetitoria are usually not complete; I am merely bringing up the points that many people didn t now on the recitations Linear transformations The following mostly

More information

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Random variables. DS GA 1002 Probability and Statistics for Data Science. Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities

More information

Horizontal asymptotes

Horizontal asymptotes Roberto s Notes on Differential Calculus Chapter 1: Limits and continuity Section 5 Limits at infinity and Horizontal asymptotes What you need to know already: The concept, notation and terminology of

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3].

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3]. Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation Remi Munos, Leemon C. Baird and Andrew W. Moore Robotics Institute and Computer Science Department, Carnegie

More information

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Sofía S. Villar Postdoctoral Fellow Basque Center for Applied Mathemathics (BCAM) Lancaster University November 5th,

More information

arxiv: v1 [cs.lg] 15 Oct 2014

arxiv: v1 [cs.lg] 15 Oct 2014 THOMPSON SAMPLING WITH THE ONLINE BOOTSTRAP By Dean Eckles and Maurits Kaptein Facebook, Inc., and Radboud University, Nijmegen arxiv:141.49v1 [cs.lg] 15 Oct 214 Thompson sampling provides a solution to

More information

On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation

On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation Antonio LIJOI, Igor PRÜNSTER, andstepheng.walker The past decade has seen a remarkable development in the area of Bayesian

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,

More information

Models of collective inference

Models of collective inference Models of collective inference Laurent Massoulié (Microsoft Research-Inria Joint Centre) Mesrob I. Ohannessian (University of California, San Diego) Alexandre Proutière (KTH Royal Institute of Technology)

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago January 2016 Consider a situation where one person, call him Sender,

More information

Procedia Computer Science 00 (2011) 000 6

Procedia Computer Science 00 (2011) 000 6 Procedia Computer Science (211) 6 Procedia Computer Science Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 211-

More information

EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS

EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS Statistica Sinica 6006, 409-4 EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS Qiying Wang and Neville C Weber The University of Sydney Abstract: The leading term in

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

Quantum Dynamics. March 10, 2017

Quantum Dynamics. March 10, 2017 Quantum Dynamics March 0, 07 As in classical mechanics, time is a parameter in quantum mechanics. It is distinct from space in the sense that, while we have Hermitian operators, X, for position and therefore

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information