Beta-Stacy Bandit Problems with Censored Data

Stefano Peluso (corresponding author, Università Cattolica del Sacro Cuore, Department of Statistical Sciences, stefano.peluso@unicatt.it, Largo Gemelli 1, 20136 Milan, Italy), Antonietta Mira (Università della Svizzera Italiana, InterDisciplinary Institute of Data Science; Università degli Studi dell'Insubria, Department of Sciences and High Technology), Pietro Muliere (Università Commerciale Luigi Bocconi, Department of Decision Sciences; Bocconi Institute for Data Science and Analytics, BIDSA)

December 4, 2016

Abstract

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial. We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined by Walker and Muliere (1997). We prove the existence of optimal stay-with-a-winner and switch-on-a-loser strategies, by imposing non-restrictive conditions on the parameters of the beta-Stacy processes, including the special cases of the homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state space settings are presented to illustrate the performance and flexibility of our method.

1 Introduction

In a discrete-time two-armed bandit problem, there are two stochastic processes (the two arms) and a sequential decision process (a strategy) that selects, at each time, which one of the two processes to observe. This selection is made on the basis of the previous observations, and it balances two conflicting benefits: the immediate payoff coming from the exploitation of an arm so far known to be better, and the information concerning future payoffs coming from the exploration of a less known arm. A strategy is said to be optimal if it yields the maximal expected payoff, and an arm is said to be optimal if it is selected at the beginning of an optimal strategy. A strategy can be seen as a function that assigns, to each partial history of observations, the integer 1 or 2 indicating the arm to be observed at the next stage (Berry and Fristedt 1985). With the exception of the simplest cases, explicit specifications of optimal strategies are hindered by computational issues. As a consequence, following Chattopadhyay (1994), optimal strategies can only be partially characterized in terms of break-even observations. We will consider two kinds of optimal strategies: stay-with-a-winner and switch-on-a-loser strategies. Assuming, without loss of generality, that a higher realized value of a random variable gives a higher payoff in the bandit problem, the break-even observation in a stay-with-a-winner strategy is that value at which the expected advantage of choosing arm 1 over arm 2 is null.

Then the expected advantage is positive (negative) for values observed from arm 1 greater (lower) than the break-even, and at these values arm 1 (arm 2) is chosen at the next stage as optimal. On the other hand, the break-even observation in a switch-on-a-loser strategy is that realized value at which the expected advantage remains unaltered at the next stage. For values observed from arm 1 higher (lower) than the break-even, the expected advantage increases (decreases) relative to its initial value, and arm 1 (arm 2) is optimally chosen at the next stage.

More formally, let $X_j$ and $Y_j$ be random variables generated from, respectively, arms 1 and 2 at stage $j$. For any positive integer $k$, $X_1, X_2, \dots, X_k$ given $F_1$ are i.i.d. with probability measure $F_1$, and $Y_1, Y_2, \dots, Y_k$ given $F_2$ are i.i.d. with probability measure $F_2$. The (possibly infinite) bandit horizon is $n$, and $A_n = (a_1, a_2, \dots, a_n)$ is a nonincreasing sequence of discount factors.

Early examples of bandit problems are treated in Robbins (1952), Bellman (1956) and Bradt et al. (1956). Among later works, Chernoff (1968) focuses on two Gaussian arms, $F_i = N(\mu_i, \sigma^2)$, $i = 1, 2$, with unknown drifts and known constant variance; Berry (1972) gives sufficient conditions for optimal selection in a Bernoulli two-armed bandit, $F_i = \mathrm{Bern}(p_i)$, $i = 1, 2$, proving a stay-with-a-winner strategy; Berry and Fristedt (1979) characterize optimal strategies for Bernoulli one-armed bandits ($F_1 = \mathrm{Bern}(p_1)$ and $F_2$ known) with regular discount sequences; Gittins (1979) introduces dynamic allocation indices for optimal strategies in multi-armed bandits. Clayton and Berry (1985) is the first paper that extends the bandit problem to a Bayesian nonparametric framework, considering a random $F_1 \sim DP(\alpha_1)$ and a known $F_2$: the probability measure associated to the random variables in one of the two arms is random and extracted from the Dirichlet process introduced in Ferguson (1973). Dirichlet bandits are generalized to two-armed problems, $F_i \sim DP(\alpha_i)$, $i = 1, 2$, in Chattopadhyay (1994), where the existence of stay-with-a-winner and switch-on-a-loser optimal strategies is proven. Some other properties of Dirichlet bandits are studied in Yu (2011).

In this paper we extend Bayesian nonparametric bandits to problems where each arm generates an infinite sequence of exchangeable random variables (de Finetti 1937) having, as de Finetti measure, the beta-Stacy process of Walker and Muliere (1997). In our framework the two arms are random, with $F_i \sim BS(\alpha_i, \beta_i)$, $i = 1, 2$, where $\alpha_i$ and $\beta_i$, discussed in Section 2, characterize the two beta-Stacy processes. The Dirichlet bandit of Clayton and Berry (1985) and Chattopadhyay (1994) is an important special case of our setting, as is the bandit with arms following the homogeneous process of Susarla and Van Ryzin (1976) and Ferguson and Phadia (1979). Our main result is that, under constraints on the parameters of the beta-Stacy processes (constraints that include the cases of the homogeneous process and the Dirichlet process), optimal stay-with-a-winner and switch-on-a-loser strategies exist and can be used for dealing with right-censored or exact observations. As specified in Phadia (2013), the beta-Stacy process belongs to the class of neutral to the right (NTR) processes introduced by Doksum (1974), and it generalizes the Dirichlet process in two respects: more flexible prior information may be represented and, unlike the Dirichlet process, it is conjugate to right-censored data.
Also, when the prior process is assumed to be Dirichlet, the posterior distribution given right-censored observations is a beta-Stacy process. Beta-Stacy bandit problems are motivated by the importance of dealing with censored observations in typical bandit applications: the two arms can be two treatments available for a certain disease (Berry and Fristedt 1985); patients arrive one at a time and a treatment is assigned. The patient returns information on the effectiveness of the treatment, but this response can be censored, e.g. if the patient interrupts the treatment.

Another classical example of a setting where censored observations may arise is managing a team of industrial scientists working on several research projects, and deciding which sequence of project executions would maximize the expected total value (Nash 1973): the observed project value is censored if the project is interrupted due, for instance, to reduced financial support. A final example of a bandit problem with censored data is that of an industrial processor choosing which jobs to process so as to minimize the processing time (Gittins et al. 2011 and references therein): jobs may return censored observations if the task is left unfinished due to system breakdowns. The arms introduced in the present paper have random distribution functions generated by beta-Stacy processes, permitting the analysis of bandit problems with censored observations while retaining mathematical tractability.

In Section 2 we introduce the beta-Stacy process and the bandit problem. Then we show the existence of stay-with-a-winner and switch-on-a-loser strategies, for $\alpha_1$ and $\alpha_2$ having discrete (Section 3) and continuous (Section 4) support. We apply our methods to simulation studies in Section 5, and conclude with examples of potential further applications and research directions in Section 6.

2 Preliminaries

2.1 Beta-Stacy process

Let $\mathcal{F}$ be the space of cumulative distribution functions (cdfs) on $[0, \infty)$. A probability distribution is placed on $\mathcal{F}$ through the definition of a stochastic process $F$ on $([0, \infty), \mathcal{A})$, where $\mathcal{A}$ is the Borel $\sigma$-field of subsets, such that, with probability 1, the sample paths of $F$ are cdfs. Let the right-continuous measure $\alpha$, with $\alpha\{0\} = 0$, and the positive function $\beta$ both be defined on $[0, \infty)$, and let $\{t_k\}$ be the countable set of discontinuity points of $\alpha$, such that $\alpha\{t_k\} = \alpha(t_k) - \alpha(t_k-) > 0$ for all $k$, and let $S_k$ be the jump of $Z$ at $t_k$. Let $\alpha_c(t) = \alpha(t) - \sum_{t_k \le t} \alpha\{t_k\}$, so that $\alpha_c$ is a continuous measure. Note that, in the rest of the paper, whenever clear, the explicit dependence on $\alpha$ and on $\beta$ of all quantities of interest is omitted for notational clarity.

Definition 2.1. $F$ is a beta-Stacy process on $([0, \infty), \mathcal{A})$ with parameters $\alpha$ and $\beta$, that is $F \sim BS(\alpha, \beta)$, if for all $t \ge 0$, $F(t) = 1 - \exp\{-Z(t)\}$, where $Z$ is a Lévy process with Lévy measure for $Z(t)$ given, for $v > 0$, by
$$dN_t(v) = \frac{dv}{1 - e^{-v}} \int_0^t e^{-v(\beta(s) + \alpha\{s\})} \, d\alpha_c(s)$$
and with moment generating function given by
$$\log E[e^{-\phi Z(t)}] = \sum_{t_k \le t} \log E[e^{-\phi S_k}] - \int_0^{+\infty} (1 - e^{-\phi v}) \, dN_t(v),$$
where $1 - e^{-S_k} \sim \mathrm{Beta}(\alpha\{t_k\}, \beta(t_k))$.

We now state the two theorems from Walker and Muliere (1997) that we will use in the sequel, on the conjugacy of the beta-Stacy process, distinguishing between discrete and continuous beta-Stacy parameters.

Theorem 2.2 (Walker and Muliere 1997). The posterior distribution of a beta-Stacy process with discrete parameters $\{\alpha_j, \beta_j\}$, $j \in \mathbb{N}$, is also a beta-Stacy process, with parameters $\alpha_j + n_j$ and $\beta_j + m_j$, where $n_j$ is the number of exact observations at $j$ and $m_j$ is the sum of the number of exact observations in $\{l : l > j\}$ and of censored observations in $\{l : l \ge j\}$.

The theorem above clarifies an important property of the beta-Stacy process: its conjugacy under sampling, possibly with right censoring. Furthermore, to have a.s. a cdf, the discrete parameters $\alpha$ and $\beta$ in Walker and Muliere (1997) are required to satisfy the condition
$$\prod_{j \in \mathbb{N}} \left(1 - \frac{\alpha_j}{\beta_j + \alpha_j}\right) = 0. \quad (1)$$

In Theorem 4 of Walker and Muliere (1997) the conjugacy under sampling with possible right censorship is formalized in the continuous case:

Theorem 2.3 (Walker and Muliere 1997). The posterior distribution of a beta-Stacy process with continuous parameters $\{\alpha, \beta\}$ is also a beta-Stacy process, with parameters $\alpha(t) + N(t)$ and $\beta(t) + M(t)$, where $N\{s\}$ is the counting process for uncensored observations at $s$, and $M(s)$ is the sum of the number of exact observations in $\{t : t > s\}$ and of censored observations in $\{t : t \ge s\}$.

A posteriori, the beta-Stacy process with continuous $\alpha$ and $\beta$ can be represented as $F(t) = 1 - e^{-Z_c(t) - Z_f(t)}$ (Ferguson 1974; Walker and Damien 1998), where the component $Z_f$ incorporates the fixed points of discontinuity at the locations of the exact observations, and $Z_c$ is the residual part of the Lévy process. In particular, the corresponding jump $S_i$ at the location $t_i$ where an exact observation occurred is such that $1 - e^{-S_i} \sim \mathrm{Beta}(N\{t_i\}, \beta(t_i) + M(t_i))$. Furthermore, from Walker and Muliere (1997), continuous $\alpha$ and $\beta$ satisfy the condition $\int_0^{+\infty} d\alpha(s)/\beta(s) = +\infty$. In the rest of the paper we only consider beta-Stacy processes, but it is reasonable to assume that the results can be generalized to the class of NTR processes. The NTR process of Doksum (1974) may be viewed in terms of a process with independent non-negative increments, via the parameterization $F(t) = 1 - e^{-Z(t)}$, $t \in \mathbb{R}^+$, where $Z$ is a process with independent non-negative increments. The beta-Stacy process is an NTR process where $Z$ is a log-beta process, which keeps the conjugacy property under sampling of exact or right-censored observations. When $\beta(t) = \alpha((t, \infty))$ for all $t$, we obtain the Dirichlet process of Ferguson (1973). Another important special case is the homogeneous process of Susarla and Van Ryzin (1976) and Ferguson and Phadia (1979), arising when $\beta(t) = \beta$ constant for all $t$.
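The conjugate update of Theorem 2.2 is mechanical enough to sketch in code. The following is a minimal illustration of the discrete update with plain-list parameters; the function name is ours, and the snippet is not taken from any implementation accompanying the paper.

```python
def posterior_params(alpha, beta, exact, censored):
    """Theorem 2.2: given discrete beta-Stacy parameters (alpha_j, beta_j),
    j = 1..J, return the posterior parameters alpha_j + n_j and beta_j + m_j,
    where n_j counts exact observations equal to j and m_j counts exact
    observations > j plus right-censored observations >= j."""
    alpha_post, beta_post = list(alpha), list(beta)
    for j in range(1, len(alpha) + 1):
        n_j = sum(1 for x in exact if x == j)
        m_j = sum(1 for x in exact if x > j) + sum(1 for c in censored if c >= j)
        alpha_post[j - 1] += n_j
        beta_post[j - 1] += m_j
    return alpha_post, beta_post

# Example on support {1,...,5}: Dirichlet special case beta_j = sum_{i>j} alpha_i,
# updated with one exact observation at 2 and one censoring time at 3.
alpha = [1.0] * 5
beta = [4.0, 3.0, 2.0, 1.0, 0.0]
print(posterior_params(alpha, beta, exact=[2], censored=[3]))
# ([1.0, 2.0, 1.0, 1.0, 1.0], [6.0, 4.0, 3.0, 1.0, 0.0])
```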

2.2 Bandit problem

In the proposed framework, $(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ denotes the two-armed bandit problem where arm $i$ has a beta-Stacy prior with parameters $(\alpha_i, \beta_i)$, for $i = 1, 2$, and $A_n = (a_1, a_2, \dots, a_n)$ is a nonincreasing discount sequence. Therefore $F_1 \sim BS(\alpha_1, \beta_1)$, and the observation at stage 1 from arm 1 is a realization of the random variable $X_1 \mid F_1 \sim F_1$. At the generic $k$-th stage, if the sample $x$ has been collected in the past stages from the observation of arm 1, the new observation will be the realized value of $X_k \mid F_1, x \sim F_1$, where $F_1$ is still a random distribution from a beta-Stacy process with updated parameters, thanks to the conjugacy theorems provided in the previous subsection. Note that the sample $x$ can be partitioned into censored and exact observations: $x = [x_{\text{exact}}, x_{\text{cens}}]$. Equivalent definitions hold for $Y_1, Y_2, \dots, Y_k$ and the observed sample $y$ from arm 2. By a censored observation we mean that the observed value at stage $k$ from arm 1 is the realized value of $\min\{X_k, C_k\}$, the minimum between the true exact value $X_k$ at stage $k$ and a censoring time $C_k$, and equivalently for $Y_k$ from arm 2. A special choice of $\beta_i$, detailed below, and the absence of censored observations reduce our setting to the Dirichlet bandit problem $(\{\alpha_1\}, \{\alpha_2\}; A_n)$ of Chattopadhyay (1994).

Similarly to Berry and Fristedt (1985), we let $W(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ be the expected payoff under an optimal strategy; $W^i(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ the expected payoff of a strategy starting from arm $i$ and proceeding optimally; $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ the expected advantage of initially choosing arm 1 over arm 2 (assuming optimal continuation) in both stay-with-a-winner and switch-on-a-loser strategies; and finally
$$\Delta^+(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \max(0, \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)),$$
$$\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \min(0, \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)).$$
More generally, we define the discount sequence $A_n^{(k)} = (a_{k+1}, a_{k+2}, \dots, a_n)$, and we now introduce the two strategies that we study: stay-with-a-winner and switch-on-a-loser strategies.

Definition 2.4. For $X_k \mid F_1 \sim F_1$, $Y_k \mid F_2 \sim F_2$, $F_1 \sim BS(\alpha_1, \beta_1)$ and $F_2 \sim BS(\alpha_2, \beta_2)$, at the generic stage $k$, the stay-with-a-winner strategy selects arm 1 at stage $k + 1$ if one of the following two conditions holds:

(i) at stage $k$, $X_k = x$ (exact or censored) from arm 1 is observed and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$;

(ii) at stage $k$, $Y_k = y$ (exact or censored) from arm 2 is observed and $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$;

and selects arm 2 otherwise. The switch-on-a-loser strategy selects arm 1 at stage $k + 1$ if one of the following two conditions holds:

(i) at stage $k$, $X_k = x$ (exact or censored) from arm 1 is observed and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$;

(ii) at stage $k$, $Y_k = y$ (exact or censored) from arm 2 is observed and $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_n) \ge 0$;

and selects arm 2 otherwise. Both strategies at stage $k = 1$ select arm 1 if $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) > 0$ and arm 2 otherwise. The break-even point of a strategy is that realized value of $X$ from the first arm (or $Y$ from the second arm) at which the strategy is indifferent in the choice between the two arms, and a strategy is said to exist if its break-even point exists.
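As a sketch of how Definition 2.4 operates, the two rules can be phrased as small functions taking an oracle `delta` for the expected advantage and an `update` routine implementing the conjugacy theorems (both assumed supplied by the user; the names are illustrative, not from the paper):

```python
def stay_with_a_winner(delta, update, arm1, arm2, discounts, obs, from_arm):
    """Return the arm (1 or 2) to select at the next stage."""
    current = delta(arm1, arm2, discounts)
    if from_arm == 1:
        updated = delta(update(arm1, obs), arm2, discounts)
    else:
        updated = delta(arm1, update(arm2, obs), discounts)
    return 1 if updated >= current else 2

def switch_on_a_loser(delta, update, arm1, arm2, discounts, obs, from_arm):
    """Select arm 1 whenever the updated expected advantage is nonnegative."""
    if from_arm == 1:
        updated = delta(update(arm1, obs), arm2, discounts)
    else:
        updated = delta(arm1, update(arm2, obs), discounts)
    return 1 if updated >= 0.0 else 2
```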

3 Break-Even Observations: Discrete Case

The optimal strategy is given in terms of a break-even observation: the first (second) arm is observed until it yields a value higher (lower) than the break-even one. In this section we prove that break-even observations for a stay-with-a-winner and a switch-on-a-loser optimal strategy exist for the discrete beta-Stacy two-armed bandit problem, under certain conditions on the parameters of the processes governing the arms. Let $F_i$ be the random distribution function corresponding to arm $i$, with $X_k$ and $Y_k$ having supports in $\mathbb{N}$ for all $k$. Let $\alpha^i = (\alpha^i_1, \alpha^i_2, \dots)$ and $\beta^i = (\beta^i_1, \beta^i_2, \dots)$ for $i = 1, 2$. From the construction of the discrete-time beta-Stacy process, Walker and Muliere (1997) show that, for $j \in \mathbb{N}$,
$$P(X_1 = j \mid \{\alpha_1, \beta_1\}) = \frac{\alpha^1_j}{\alpha^1_j + \beta^1_j} \prod_{i=1}^{j-1} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i}\right),$$
and similarly for $Y_1$. Then the prior means of the two arms are, respectively,
$$E[X_1 \mid \{\alpha_1, \beta_1\}] = \sum_{j=1}^{+\infty} P(X_1 \ge j \mid \{\alpha_1, \beta_1\}) = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i}\right) =: \mu_1$$
and
$$E[Y_1 \mid \{\alpha_2, \beta_2\}] = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^2_i}{\alpha^2_i + \beta^2_i}\right) =: \mu_2.$$
Given observations $x = (x_1, \dots, x_k)$ from arm 1 and $y = (y_1, \dots, y_k)$ from arm 2, the conditional expectation of any function $h(X)$ can be computed using Theorem 2.2, and it is denoted by $E_x[h(X)]$, where we recall that we omit in all quantities of interest the dependence on $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$. Then, for $j \in \mathbb{N}$,
$$P_x(X = j) = \frac{\alpha^{1,x}_j}{\alpha^{1,x}_j + \beta^{1,x}_j} \prod_{i=1}^{j-1} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right),$$
where $\alpha^{1,x}_j := \alpha^1_j + n_j$ and $\beta^{1,x}_j := \beta^1_j + m_j$, with $n_j$ the number of exact observations equal to $j$ and $m_j$ the number of exact observations in $\{l : l > j\}$ plus the number of censored observations in $\{l : l \ge j\}$, and with the dependence of $n_j$ and $m_j$ on $x$ neglected for ease of notation.
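These product formulas translate directly into code. The sketch below evaluates the predictive law and the mean of a single arm on a truncated support $\{1, \dots, J\}$, a truncation we introduce only for computability; the function names are ours.

```python
def predictive_pmf(alpha, beta):
    """P(X = j) = h_j * prod_{i<j} (1 - h_i), hazards h_j = alpha_j/(alpha_j + beta_j)."""
    pmf, surv = [], 1.0
    for a, b in zip(alpha, beta):
        h = a / (a + b) if a + b > 0 else 1.0
        pmf.append(surv * h)
        surv *= 1.0 - h
    return pmf

def mean(alpha, beta):
    """mu = sum_{j>=1} prod_{i<j} (1 - h_i), truncated at J = len(alpha)."""
    mu, surv = 0.0, 1.0
    for a, b in zip(alpha, beta):
        mu += surv                                     # P(X >= j)
        surv *= b / (a + b) if a + b > 0 else 0.0
    return mu
```

Applying `posterior_params` from the earlier sketch before calling these functions yields $P_x(X = j)$ and $\mu_{1,x}$.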

The posterior mean is therefore
$$E_x[X] = \sum_{j=1}^{+\infty} P_x(X \ge j) = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right) =: \mu_{1,x}.$$
Following the notation introduced above,
$$W(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = \max\{W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n),\, W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)\},$$
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) - W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n).$$
Therefore
$$W^1 = W + \Delta^-, \qquad W^2 = W - \Delta^+,$$
and
$$W^1(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_1 + E_X[W(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] = a_1 \mu_1 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})],$$
$$W^2(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_2 + E_Y[W(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] = a_1 \mu_2 + E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] - E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})],$$
so that
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = a_1 \mu_1 - a_1 \mu_2 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] - E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})] + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})].$$
Using arguments similar to those in Berry and Fristedt (1985) and Chattopadhyay (1994),
$$a_1 \mu_1 + E_X[W^2(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})]$$
is the expected payoff of first selecting arm 1, followed by arm 2, and then continuing optimally. Similarly,
$$a_1 \mu_2 + E_Y[W^1(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})]$$
is the expected payoff of selecting arm 2 first and arm 1 second and then continuing optimally.
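The recursions above can be run directly for small horizons. Below is a hedged dynamic-programming sketch, assuming exact observations only and a truncated support whose last $\beta_j$ is zero (so that the predictive law is proper); it reuses `predictive_pmf` and `mean` from the previous sketch and enumerates all observation paths, so it is only meant for tiny $n$ and $J$.

```python
from functools import lru_cache

def update_exact(arm, j):
    """Theorem 2.2 for a single exact observation at j (1-based)."""
    a, b = list(arm[0]), list(arm[1])
    a[j - 1] += 1.0               # n_j = 1
    for i in range(j - 1):        # m_i = 1 for i < j
        b[i] += 1.0
    return tuple(a), tuple(b)

@lru_cache(maxsize=None)
def W1(arm1, arm2, discounts):
    pmf = predictive_pmf(arm1[0], arm1[1])
    payoff = discounts[0] * mean(arm1[0], arm1[1])
    if len(discounts) > 1:
        payoff += sum(p * W(update_exact(arm1, j), arm2, discounts[1:])
                      for j, p in enumerate(pmf, start=1))
    return payoff

@lru_cache(maxsize=None)
def W2(arm1, arm2, discounts):
    pmf = predictive_pmf(arm2[0], arm2[1])
    payoff = discounts[0] * mean(arm2[0], arm2[1])
    if len(discounts) > 1:
        payoff += sum(p * W(arm1, update_exact(arm2, j), discounts[1:])
                      for j, p in enumerate(pmf, start=1))
    return payoff

def W(arm1, arm2, discounts):
    return max(W1(arm1, arm2, discounts), W2(arm1, arm2, discounts))

def delta(arm1, arm2, discounts):
    return W1(arm1, arm2, discounts) - W2(arm1, arm2, discounts)

# Tiny example: two arms on {1,2,3}, horizon 3, A_3 = (1, 0.9, 0.8).
arm1 = ((1.0, 1.0, 1.0), (2.0, 1.0, 0.0))
arm2 = ((0.5, 0.5, 0.5), (1.0, 0.5, 0.0))
print(delta(arm1, arm2, (1.0, 0.9, 0.8)))
```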

Subtracting the second payoff from the first one we obtain $(a_1 - a_2)(\mu_1 - \mu_2)$. From this fact,
$$\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) = (a_1 - a_2)(\mu_1 - \mu_2) + E_X[\Delta^+(\{\alpha_{1,X}, \beta_{1,X}\}, \{\alpha_2, \beta_2\}; A_n^{(1)})] + E_Y[\Delta^-(\{\alpha_1, \beta_1\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_n^{(1)})]. \quad (2)$$
In the next proposition we show that, given an exact or a right-censored observation $X = x$ from arm 1, the advantage of choosing arm 1 over arm 2 increases as $x$ increases.

Proposition 3.1. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ such that $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$, for all $j \in \mathbb{N}$, and for all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$.

Proof. By induction, for $n = 1$, we have
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1 \left(\sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^{1,x}_i}{\alpha^{1,x}_i + \beta^{1,x}_i}\right) - \mu_2\right) = a_1(\mu_{1,x} - \mu_2). \quad (3)$$
Fix $x' = x + 1$. We first prove that $\mu_{1,x'} \ge \mu_{1,x}$. For this purpose, we study separately the $j$-terms in the sums defining $\mu_{1,x}$ and $\mu_{1,x'}$ when $j \le x$, $j = x'$ and $j > x'$. When $x$ is an exact observation: the $j$-terms with $j \le x$ are the same in $\mu_{1,x}$ and $\mu_{1,x'}$. For $j = x'$, in $\mu_{1,x}$ we have
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x}{\alpha^1_x + \beta^1_x + 1},$$
whilst in $\mu_{1,x'}$,
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x + 1}{\alpha^1_x + \beta^1_x + 1},$$
and the $x'$-term of $\mu_{1,x'}$ is weakly higher. For $j > x'$, the $j$-term of $\mu_{1,x}$ is
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x}{\alpha^1_x + \beta^1_x + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'}} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i},$$
whilst the $j$-term of $\mu_{1,x'}$ is
$$\prod_{i<x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_x + 1}{\alpha^1_x + \beta^1_x + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'} + 1} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i};$$

the $j$-term of $\mu_{1,x'}$ is weakly higher if
$$\frac{\beta^1_x + 1}{\alpha^1_{x'} + \beta^1_{x'} + 1} \ge \frac{\beta^1_x}{\alpha^1_{x'} + \beta^1_{x'}},$$
equivalent to $\beta^1_x \le \alpha^1_{x'} + \beta^1_{x'}$, for all $x$ and for all $x' > x$. Similarly, the monotonicity of $\mu_{1,x}$ can be proved when $x$ is a right-censored observation: the $j$-terms with $j \le x'$ are the same in $\mu_{1,x}$ and $\mu_{1,x'}$, whilst for $j > x'$ the two terms in, respectively, $\mu_{1,x}$ and $\mu_{1,x'}$ are
$$\prod_{i \le x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_{x'}}{\alpha^1_{x'} + \beta^1_{x'}} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i} \quad\text{and}\quad \prod_{i \le x} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} \cdot \frac{\beta^1_{x'} + 1}{\alpha^1_{x'} + \beta^1_{x'} + 1} \prod_{x'<i<j} \frac{\beta^1_i}{\alpha^1_i + \beta^1_i},$$
where the term in $\mu_{1,x'}$ is weakly higher. Then, for $n = 1$ the statement is true, since $\mu_{1,x}$ is nondecreasing in $x$ and $a_1 \ge 0$. From the induction hypothesis, we assume the monotonicity property for $n = m - 1$. By (2),
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})]. \quad (4)$$
The first term on the right-hand side of (4) is nondecreasing in $x$, since $\mu_{1,x}$ is nondecreasing in $x$ and $a_1 \ge a_2$. The second and third terms are nondecreasing in $x$ from the induction hypothesis.

Remark 3.2. The constraints $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$ are needed for the monotonicity of $\mu_{1,x}$. The condition is not required if all observations are censored, but is necessary if some observations are exact. The constraint is naturally verified in the Dirichlet two-armed problem, obtained from the beta-Stacy one in the special case $\beta^1_j = \beta^1_{j+1} + \alpha^1_{j+1}$, $j \in \mathbb{N}$. Also, a bandit problem with simple homogeneous processes (Susarla and Van Ryzin 1976; Ferguson and Phadia 1979) for each arm, corresponding to the case $\beta^1_{j+1} = \beta^1_j$ for all $j \in \mathbb{N}$, satisfies the constraints.

The following propositions will be used in Theorems 3.5 and 3.6.

Proposition 3.3. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1 and for all nonincreasing discount sequences $A_n$, if the condition
$$\prod_{i<+\infty} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i + 1}\right) > 0 \quad (5)$$
is verified, then
$$\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = +\infty$$
and
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = \min_x \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n).$$

Proof. The result for $x \to 0$ is a direct consequence of the monotonicity property shown in Proposition 3.1. We are left to prove the limit to $+\infty$. Consider $x$ increasing to $\infty$. By induction, for $n = 1$, $\mu_{1,x}$ diverges to $+\infty$ as $x \to +\infty$ since
$$\lim_{x\to+\infty} \mu_{1,x} \ge \lim_{x\to+\infty} \sum_{j=1}^{x+1} \prod_{i<j} \frac{\beta^1_i + 1}{\alpha^1_i + \beta^1_i + 1} = \sum_{j=1}^{+\infty} \prod_{i<j} \left(1 - \frac{\alpha^1_i}{\alpha^1_i + \beta^1_i + 1}\right) = +\infty. \quad (6)$$
Then $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2)$ goes to $+\infty$, since $\mu_{1,x}$ is divergent and $a_1 > 0$. Assume now that the statement is true for $n = m - 1$. By (4),
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})].$$
For the first term $(a_1 - a_2)(\mu_{1,x} - \mu_2)$ on the right-hand side of the formula above there are two possible cases: $a_1 - a_2 > 0$ or $a_1 - a_2 = 0$. In the latter case the term is zero, while when $a_1 - a_2 > 0$ it diverges to $+\infty$. For the second term, note that $\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ is a nondecreasing sequence in $x$ by Proposition 3.1, bounded below by 0 by definition and divergent to $+\infty$ by the induction hypothesis. We can then apply the monotone convergence theorem and obtain
$$\lim_{x\to+\infty} E_X[\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] = E_X\left[\lim_{x\to+\infty} \Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})\right] = +\infty.$$
For the third term, notice that
$$\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_m^{(1)}) = -\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)}).$$
Furthermore, $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})$ converges to 0 as $x$ diverges, and it is bounded above by $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x=0}, \beta_{1,x=0}\}; A_m^{(1)})$. By the dominated convergence theorem we have
$$\lim_{x\to+\infty} E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})] = -\lim_{x\to+\infty} E_Y[\Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})] = -E_Y\left[\lim_{x\to+\infty} \Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})\right] = 0.$$

Remark 3.4. In Proposition 3.3 condition (5) is required, and the beta-Stacy process in the discrete case is defined such that condition (1) is verified. Both conditions are satisfied when their ratio diverges, that is when
$$\lim_{j\to+\infty} \prod_{i<j} \frac{(\beta^1_i + 1)(\alpha^1_i + \beta^1_i)}{\beta^1_i(\alpha^1_i + \beta^1_i + 1)} = +\infty.$$
This constraint does not pose restrictions, and it is satisfied, as expected, in the special cases of the homogeneous process and the Dirichlet process.
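A quick numerical sanity check of Proposition 3.1 (in its $n = 1$ form, where $\Delta$ reduces to $a_1(\mu_{1,x} - \mu_2)$) can be run in the Dirichlet boundary case $\beta_j = \beta_{j+1} + \alpha_{j+1}$, reusing `mean` and `update_exact` from the sketches above; this is an illustration, not a proof:

```python
J = 8
alpha = tuple(1.0 for _ in range(J))
beta = tuple(float(J - j - 1) for j in range(J))   # beta_j = sum_{i>j} alpha_i
post_means = [mean(*update_exact((alpha, beta), x)) for x in range(1, J + 1)]
assert all(m2 >= m1 - 1e-12 for m1, m2 in zip(post_means, post_means[1:]))
print(post_means)   # nondecreasing in the exact observation x
```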

We finally state the following theorems, showing that there exist break-even points determining, respectively, a stay-with-a-winner and a switch-on-a-loser optimal strategy. The theorems generalize Theorem 2.1 and Theorem 2.2 of Chattopadhyay (1994), proving the existence of the break-even observations in a context more general than the Dirichlet arms, at the cost of some restrictions on the choice of the parameters of the beta-Stacy process.

Theorem 3.5. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if the condition
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \quad (7)$$
holds, there exists a break-even point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \ge b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \le b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. From Proposition 3.1, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$, starting from a value lower than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ (condition (7)) and going to infinity (Proposition 3.3). This is enough to claim that there exists a break-even point $b$ which satisfies the properties in the theorem.

Theorem 3.6. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 3.1, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if the condition
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0 \quad (8)$$
holds, there exists a break-even point $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$ if $x \ge d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0$ if $x \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. As in the proof of Theorem 3.5, there exists a point $d$ satisfying the properties.

Remark 3.7. The domain of $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$, as a function of $x$, is $\mathbb{N}$, equipped with the discrete topology, for which every subset is open, and all functions from a discrete topological space to any topological space are continuous. In the two theorems above it is not possible to apply the intermediate value theorem for continuous functions, since $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$, seen as a function of $x$, has a domain that is not a connected topological space. As a consequence, it is not guaranteed that an observation exactly equal to the break-even can be observed, since it could be that $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \notin \mathbb{N}$ or $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n) \notin \mathbb{N}$. Still, the break-evens give two reference points for optimal choices between the arms.
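Since the support is discrete, the break-even points of Theorems 3.5 and 3.6 can be bracketed by a plain scan over $x$, using the `delta` and `update_exact` sketches from above (a simplified search that, for illustration, considers only exact updates):

```python
def break_even_points(arm1, arm2, discounts, J):
    """Return (b, d): the smallest x with Delta(updated) >= Delta(current),
    and the smallest x with Delta(updated) >= 0, scanning x = 1..J."""
    base = delta(arm1, arm2, discounts)
    b = d = None
    for x in range(1, J + 1):
        dx = delta(update_exact(arm1, x), arm2, discounts)
        if b is None and dx >= base:
            b = x
        if d is None and dx >= 0.0:
            d = x
    return b, d
```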

Remark 3.8. Conditions (7) and (8) are needed because the support of the base measure of the beta-Stacy process is limited to $[0, +\infty)$. Both Clayton and Berry (1985) and Chattopadhyay (1994) notice that, when the support is bounded, the existence of break-even observations requires additional conditions at the boundaries. In particular, both conditions intuitively mean that if a very bad observation from arm 1 is extracted (close to 0), the expected advantage of choosing that arm at the next time instant reduces, and the alternative arm is preferred under the current strategy.

We now study the two-armed bandit problem when the base measures of the beta-Stacy processes are continuous measures.

4 Break-Even Observations: Continuous Case

In the present section we treat the continuous beta-Stacy two-armed problem. $X_k$ and $Y_k$, respectively from arm 1 and arm 2 at stage $k$, can assume values in $\mathbb{R}^+ = (0, +\infty)$, and $\alpha_1$ and $\beta_1$ are, respectively, a continuous measure and a positive function, both defined on $\mathbb{R}^+$. $\alpha_1$ is assumed, without loss of generality, to have no discontinuity points. Equation (8) in Walker and Muliere (1997) says that, for $t \in \mathbb{R}^+$,
$$P(X_1 > t) = \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} =: \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s)}{\beta_1(s) + \alpha_1\{s\}}\right),$$
where $\prod_{[0,t]}$ denotes the product integral, an operator commonly used in the survival analysis literature. For any partition $a_1 = z_0 < z_1 < \dots < z_m = a_2$, if $l_m = \max_{i=1,\dots,m} (z_i - z_{i-1})$, the product integral is defined as
$$\prod_{(a_1,a_2]} \{1 + d\Gamma(z)\} := \lim_{l_m \to 0} \prod_j \{1 + [\Gamma(z_j) - \Gamma(z_{j-1})]\}.$$
See Gill and Johansen (1990) for a survey of applications of product integrals to survival analysis. Then we can compute, in analogy with the discrete case,
$$E[X_1] = \int_0^{+\infty} P(X_1 > t)\,dt = \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s)}{\beta_1(s)}\right) dt =: \mu_1$$
and, similarly,
$$E[Y_1] = \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_2(s)}{\beta_2(s)}\right) dt =: \mu_2,$$
assuming, without loss of generality, that $\mu_1 \le \mu_2$.
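Numerically, for continuous $\alpha_1$ the product integral collapses to the exponential above, so the survival function and the prior mean reduce to ordinary quadrature. A rough sketch, with illustrative step sizes and truncation (the exponential specification anticipates the one used in Section 5.3):

```python
import math

def survival(alpha_prime, beta, t, step=1e-3):
    """P(X > t) = exp(-int_0^t alpha'(s)/beta(s) ds), left-rule quadrature."""
    acc, s = 0.0, 0.0
    while s < t:
        acc += alpha_prime(s) / beta(s) * step
        s += step
    return math.exp(-acc)

def prior_mean(alpha_prime, beta, horizon=200.0, step=1e-2):
    """mu = int_0^inf P(X > t) dt, truncated at `horizon`."""
    mu, cum, t = 0.0, 0.0, 0.0
    while t < horizon:
        cum += alpha_prime(t) / beta(t) * step
        mu += math.exp(-cum) * step
        t += step
    return mu

b = lambda s: math.exp(-s / 10.0)
a = lambda s: math.exp(-s / 10.0) / 10.0   # hazard ds/10: exponential base law
print(survival(a, b, 10.0))   # about exp(-1)
print(prior_mean(a, b))       # about 10
```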

The conditional expectation of a function $h_1(X)$, given observations $x = (x_1, \dots, x_k)$ from the first arm, is denoted $E_x[h_1(X)]$, removing the explicit dependence on $\alpha_1$ and $\beta_1$. We further define
$$\alpha_{1,x}(s) = \alpha_1(s) + N(s) \qquad\text{and}\qquad \beta_{1,x}(s) = \beta_1(s) + M(s)$$
for all $s \in (0, \infty)$. Analogous notation is used for $h_2(Y)$, given $y = (y_1, \dots, y_k)$ from the second arm. Therefore, from Theorem 2.3,
$$P_x(X > t) = \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s) + N\{s\} + M(s)}\right\} \left(1 - \frac{N\{t\}}{\beta_1(t) + N\{t\} + M(t)}\right) =: \prod_{[0,t]} \left(1 - \frac{d\alpha_{1,x}(s)}{\beta_{1,x}(s) + \alpha_{1,x}\{s\}}\right)$$
and, partitioning $x = [x_{\text{exact}}, x_{\text{cens}}]$ into respectively exact and censored observations, the posterior mean is
$$E_x[X] = P_x(X \notin x_{\text{exact}}) \int_0^{+\infty} P_x(X > t \mid X \notin x_{\text{exact}})\,dt + P_x(X \in x_{\text{exact}}) \sum_{\tilde{x} \in x_{\text{exact}}} \tilde{x}\, P_x(X = \tilde{x} \mid X \in x_{\text{exact}})$$
$$= P_x(X \notin x_{\text{exact}}) \int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_{1,x_{\text{cens}}}(s)}{\beta_{1,x_{\text{cens}}}(s) + \alpha_{1,x_{\text{cens}}}\{s\}}\right) dt + P_x(X \in x_{\text{exact}}) \sum_{\tilde{x} \in x_{\text{exact}}} \tilde{x}\, P_x(X = \tilde{x} \mid X \in x_{\text{exact}}) =: \mu_{1,x}.$$
The function $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ can be expressed as in (2). In the following propositions we study the properties of $\Delta$, with the aim of proving the existence of break-even observations determining optimal stay-with-a-winner and switch-on-a-loser strategies.

Proposition 4.1. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ such that $\beta_1'(x) \ge -\alpha_1'(x)$, and for all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$, for all $x \in (0, \infty)$.

Proof. By induction, for $n = 1$, and $x$ censored to the right,
$$\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2) = a_1\left(\int_0^{+\infty} \prod_{[0,t]} \left(1 - \frac{d\alpha_1(s) + dN(s)}{\beta_1(s) + N\{s\} + M(s)}\right) dt - \mu_2\right),$$
where $N(s) = N((0, s])$. We first show that $\mu_{1,x}$ is nondecreasing in $x$, for $x$ censored to the right.

Notice that $\mu_{1,x}$ can be written as
$$\mu_{1,x} = \int_0^x \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \int_x^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt.$$
The integrand in $\mu_{1,x}$, as a function of $t$, has a discontinuity point when $t = x$, but its value at this point is ignored since it does not contribute to the evaluation of $\mu_{1,x}$. Take now any $x' > x$, and separate the cases $t \le x$, $x \le t \le x'$, and $t \ge x'$:

When $t \le x$, the integrands in $\mu_{1,x}$ and $\mu_{1,x'}$ are the same.

When $x \le t \le x'$, the integrand in $\mu_{1,x}$ is
$$\exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
whilst the integrand in $\mu_{1,x'}$ is
$$\exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\},$$
with the integrand in $\mu_{1,x'}$ always greater than or equal to the one in $\mu_{1,x}$.

Finally, when $t \ge x'$, the integrands in $\mu_{1,x}$ and $\mu_{1,x'}$ are, respectively,
$$\exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} \quad\text{and}\quad \exp\left\{-\int_0^{x'} \frac{d\alpha_1(s)}{\beta_1(s)+1} - \int_{x'}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
proving that $\mu_{1,x'} \ge \mu_{1,x}$ and that the statement is true for $n = 1$. On the other hand, when $x$ is not censored,
$$\mu_{1,x} = P_x(X \neq x)\,\tilde\mu + P_x(X = x)\,x = \left(1 - \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \frac{1}{\beta_1(x)+1}\right)\tilde\mu + x \exp\left\{-\int_0^x \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \frac{1}{\beta_1(x)+1},$$
and $\mu_{1,x'} \ge \mu_{1,x}$ for all $x' > x$ if and only if $P_x(X = x)$, the probability of $X$ from arm 1 being equal to the previous exact observation, is nondecreasing in $x$. This condition is equivalent to $\beta_1'(x) \ge -\alpha_1'(x)$, as required in the proposition. By induction, assuming the monotonicity property for $n = m - 1$, the proof is completed along the lines of Proposition 3.1.

Remark 4.2. As in the discrete case, monotonicity of the posterior mean is recovered under a condition on the parameters of the beta-Stacy process. The condition $\beta^1_j \le \beta^1_{j+1} + \alpha^1_{j+1}$, $j \in \mathbb{N}$, in Proposition 3.1 finds its continuous analogue $\beta_1'(x) \ge -\alpha_1'(x)$, $x \in \mathbb{R}^+$, in Proposition 4.1. It is important to note that the condition is not required if only censored observations are extracted from the arms, but it is necessary in case of exact observations. As in the discrete section, the special cases of the Dirichlet and homogeneous processes are included, and they correspond, respectively, to $\beta_1'(x) = -\alpha_1'(x)$ and to $\beta_1'(x) = 0$.

Proposition 4.3. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.1 and such that $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$, and for all nonincreasing discount sequences $A_n$,
$$\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = +\infty$$
and
$$\lim_{x\to 0} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) = \min_x \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n).$$

Proof. The case $x \to 0$ is an immediate consequence of Proposition 4.1. To study the case where $x$ diverges, we proceed by induction. Note that, for $n = 1$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x} - \mu_2)$ and
$$\lim_{x\to+\infty} \mu_{1,x} \ge \lim_{x\to+\infty} \int_0^x \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt \ge \int_0^{+\infty} \exp\left\{-\int_0^{+\infty} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt = +\infty,$$
where the last equality is true since $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$ is equivalent to
$$\exp\left\{-\int_0^{+\infty} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} > 0.$$
This proves that $\lim_{x\to+\infty} \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_1) = +\infty$. The rest of the proof follows the same lines as the proof of Proposition 3.3.

Remark 4.4. In the above proposition the additional condition $\int_0^{+\infty} d\alpha_1(s)/(\beta_1(s)+1) < +\infty$ is required. Note that the beta-Stacy process is defined such that $\int_0^{+\infty} d\alpha_1(s)/\beta_1(s) = +\infty$. These two improper integrals should have a different asymptotic behavior, a condition that is verified when, from the limit comparison test for integrals, the limit of the ratio of the two integrands is different from 1, that is when $\lim_{s\to+\infty} \beta_1(s) < +\infty$. For finite $\beta_1(s)$ this is satisfied and, as expected, it includes the special cases of the homogeneous process and the Dirichlet process. In short, the additional constraint rules out cases of exploding $\beta_1(s)$. Usually, for some base distribution $F$, $\beta_1(s) = M F((s, \infty))$, converging to 0 (see Walker and Muliere 1997).

Proposition 4.5. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ and all nonincreasing discount sequences $A_n$, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is a continuous function of $x$, for $x \in (0, \infty)$ censored to the right.

Proof. It is enough to show that, for any increasing or decreasing sequence $\{x_l\}$ converging to $x$, $\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_n) \to \Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$. We provide the proof only for an increasing sequence $\{x_l\}$, since the decreasing-sequence case is similar. By induction, first fix $n = 1$, so that $\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_1) = a_1(\mu_{1,x_l} - \mu_2)$. The continuity of $\Delta$ in $x$ is shown through the continuity of $\mu_{1,x}$. Taking any increasing sequence $\{x_l\}$ converging to $x$, then
$$\lim_{x_l\to x} \mu_{1,x_l} = \lim_{x_l\to x} \left[\int_0^{x_l} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^{x_l} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \int_{x_l}^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt\right]$$
$$= \int_0^{x} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt + \exp\left\{-\int_0^{x} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} \lim_{x_l\to x} \int_{x_l}^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt,$$
where the last equality is justified by the continuity in $x$ of
$$\int_0^{x} \exp\left\{-\int_0^t \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\} dt \quad\text{and}\quad \exp\left\{-\int_0^{x} \frac{d\alpha_1(s)}{\beta_1(s)+1}\right\}.$$
To finally see that $\mu_{1,x}$ is continuous, we need to prove the continuity in $x$ of the function
$$H(x) := \int_x^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt.$$
Note that the function $H$ is a parameterized Riemann integral, whose integration extremes are also dependent on the parameter. $H$ is given by the composition of two functions:
$$H_2(h, x) := \int_h^{+\infty} \exp\left\{-\int_x^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt$$
and $h(x) = x$. The latter is obviously continuous. For the continuity of $H_2$, note that
$$\exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} \le \exp\left\{-\int_{x}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
and we can apply the dominated convergence theorem to the sequence of functions in $x_l$,
$$\exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\},$$
for any given value of $h$. Then
$$\lim_{x_l\to x} H_2(h, x_l) = \lim_{x_l\to x} \int_h^{+\infty} \exp\left\{-\int_{x_l}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt = \int_h^{+\infty} \exp\left\{-\int_{x}^t \frac{d\alpha_1(s)}{\beta_1(s)}\right\} dt = H_2(h, x).$$

Assume now that the statement is true for $n = m - 1$. By (4),
$$\Delta(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_2, \beta_2\}; A_m) = (a_1 - a_2)(\mu_{1,x_l} - \mu_2) + E_X[\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] + E_Y[\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})].$$
The first term $(a_1 - a_2)(\mu_{1,x_l} - \mu_2)$ on the right-hand side of the formula is continuous in $x$: this follows from the continuity of $\mu_{1,x}$. For the second term, note that $\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ is a nondecreasing sequence in $x_l$ by Proposition 4.1, bounded below by 0 by its definition and convergent to $\Delta^+(\{\alpha_{1,x,X}, \beta_{1,x,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})$ by the induction hypothesis. We can then apply the monotone convergence theorem:
$$\lim_{x_l\to x} E_X[\Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})] = E_X\left[\lim_{x_l\to x} \Delta^+(\{\alpha_{1,x_l,X}, \beta_{1,x_l,X}\}, \{\alpha_2, \beta_2\}; A_m^{(1)})\right].$$
For the third term, notice that
$$\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,y}, \beta_{2,y}\}; A_m^{(1)}) = -\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)}).$$
Furthermore, $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})$ converges to $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x}, \beta_{1,x}\}; A_m^{(1)})$ as $x_l$ converges to $x$ by the induction hypothesis, and it is bounded above by $\Delta^+(\{\alpha_{2,y}, \beta_{2,y}\}, \{\alpha_{1,x=x_1}, \beta_{1,x=x_1}\}; A_m^{(1)})$. By the dominated convergence theorem,
$$\lim_{x_l\to x} E_Y[\Delta^-(\{\alpha_{1,x_l}, \beta_{1,x_l}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})] = -\lim_{x_l\to x} E_Y[\Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})]$$
$$= -E_Y\left[\lim_{x_l\to x} \Delta^+(\{\alpha_{2,Y}, \beta_{2,Y}\}, \{\alpha_{1,x_l}, \beta_{1,x_l}\}; A_m^{(1)})\right] = E_Y[\Delta^-(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_{2,Y}, \beta_{2,Y}\}; A_m^{(1)})],$$
proving continuity for the generic bandit horizon $n$.
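Given the monotonicity and continuity just established, the break-even observations guaranteed by the next two theorems can be located numerically by bisection. The sketch below assumes a user-supplied callable `delta_of_x` returning $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ for a right-censored $x$ (for instance, a Monte Carlo estimate):

```python
def bisect_break_even(delta_of_x, target, lo, hi, tol=1e-6):
    """Smallest x in [lo, hi] with delta_of_x(x) >= target, for nondecreasing delta_of_x."""
    if delta_of_x(hi) < target:
        return None                      # no crossing on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_of_x(mid) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# d: crossing of 0 (switch-on-a-loser); b: crossing of the pre-observation
# level Delta({alpha_1, beta_1}, {alpha_2, beta_2}; A_n) (stay-with-a-winner).
```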

Theorem 4.6. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.3, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if condition (7) holds, there exists a break-even point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \ge b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le \Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ if $x \le b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. From Propositions 4.1 and 4.5, $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n)$ is nondecreasing in $x$ and continuous (the latter only for censored $x$), starting from a value lower than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ and growing to infinity (Proposition 4.3). Then there exists a point $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ which satisfies the properties of the theorem.

Theorem 4.7. For all $\{\alpha_1, \beta_1\}$ and $\{\alpha_2, \beta_2\}$ as in Proposition 4.3, for all nonincreasing discount sequences $A_n$ and $n \ge 2$, if condition (8) holds, there exists a break-even point $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$ such that $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \ge 0$ if $x \ge d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$, and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_n) \le 0$ if $x \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_n)$.

Proof. As in Theorem 4.6, there exists a point $d$ satisfying the properties.

Remark 4.8. For Theorems 4.6 and 4.7, Remark 3.8 on the boundary conditions is still valid. Remark 3.7 can be applied if there are exact observations, whilst the intermediate value theorem for continuous functions guarantees that an observation exactly equal to the break-even can be observed in case of censored data.

5 Applications

5.1 Geometric beta-Stacy parameters

Fix $\beta^1_j = M_1\,0.9^j$ and $\alpha^1_j = 0.1\,M_1\,0.9^j$ for the first arm, and $\beta^2_j = M_2\,0.92^j$ and $\alpha^2_j = 0.08\,M_2\,0.92^j$ for the second arm, for $j = 1, 2, \dots$, so that $\mu_1 < \mu_2$ a priori. Let $M_i = \alpha^i(\mathbb{N})$, $i = 1, 2$, be the total masses of the measures $\alpha^1$ and $\alpha^2$. Note that different values of $M_1$ and $M_2$ do not affect the prior means $\mu_1$ and $\mu_2$, but only the posterior means. The observations are right-censored; we fix the bandit horizon to $n = 3$ and the discount sequence to $A_3 = (1, 0.9, 0.8)$. Choosing higher values for $n$ is feasible, at the cost of higher processing times. We generate $X_1^l$ and $Y_1^l$ from the two arms, for $l = 1, \dots, T$. Then, for each $X_1^l$, we generate $T$ copies of $X_2$ from the first arm and, for each $Y_1^l$, $T$ copies of $Y_2$ from the second arm. Sampling from prior and posterior beta-Stacy processes is done, respectively, with Algorithms A and B in Al Labadi and Zarepour (2013). See also De Blasi (2017) for an alternative way of simulating from the beta-Stacy process. For each scenario we evaluate $\mu_{1,x_1}$, $\mu_{1,x_1,x_2}$, $\mu_{2,y_1}$ and $\mu_{2,y_1,y_2}$; we then evaluate $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, reported in Table 1 for different values of $M_1$ and $M_2$.

Holding everything else constant, there is a tendency for $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ to increase in $M_2$ and decrease in $M_1$. This result is coherent with the exploitation-exploration trade-off mentioned in the Introduction, and suggests that the less is known about an arm, the more appealing it is to select that arm, since more information can be gained from its exploration. Furthermore, as both $M_1$ and $M_2$ increase, $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ approaches $\mu_1 - \mu_2 = -2.5$. When $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ is positive the optimal arm is the first one, and vice versa when it is negative. Most of the time, the difference in the prior means makes the second arm the optimal one, except in cases with $M_1 \ll M_2$: there are situations where the higher uncertainty (lower $M_1$) around the base distribution of the first arm, relative to the high confidence in the base distribution of the second arm (larger $M_2$), makes the first arm preferable to explore.

Table 1: Estimated $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, with $\beta^1_j = M_1\,0.9^j$ and $\alpha^1_j = 0.1\,M_1\,0.9^j$ for the first arm, and $\beta^2_j = M_2\,0.92^j$ and $\alpha^2_j = 0.08\,M_2\,0.92^j$ for the second arm, for $j = 1, 2, \dots$ and for different values of $M_i = \alpha^i(\mathbb{N})$, $i = 1, 2$. [Table entries not recoverable from the source.]
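For a concrete check of this specification, note that both arms have constant hazard (0.1/1.1 and 0.08/1.08, respectively), so $\mu_1 = 11$ and $\mu_2 = 13.5$ in closed form; the sketch below recovers these values on a truncated support with the `mean` function from the discrete sketch ($M_1$ and $M_2$ cancel from the prior means, as stated above):

```python
J, M1, M2 = 400, 1.0, 1.0
beta1 = [M1 * 0.9 ** j for j in range(1, J + 1)]
alpha1 = [0.1 * bj for bj in beta1]
beta2 = [M2 * 0.92 ** j for j in range(1, J + 1)]
alpha2 = [0.08 * bj for bj in beta2]
print(mean(alpha1, beta1), mean(alpha2, beta2))   # approx. 11.0 and 13.5
```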

5.2 Discrete uniform beta-Stacy parameters

Consider the beta-Stacy two-armed bandit problem with, for $k = 1, 2$,
$$\alpha^k_i = \frac{M_k}{2h_k+1}\,1_{\{i \in \{\mu_k - h_k, \dots, \mu_k + h_k\}\}}, \quad (9)$$
$$\beta^k_i = M_k\,\frac{h_k + \mu_k - i}{2h_k+1}\,1_{\{i \in \{\mu_k - h_k, \dots, \mu_k + h_k\}\}} + M_k\,1_{\{i < \mu_k - h_k\}}, \quad (10)$$
where $h_k \in \mathbb{N}$ and $h_k < \mu_k$, and with $M_k = \alpha^k(\mathbb{N})$. The parameter $h_k$ is positively related to the variability of the base distribution of the beta-Stacy process of arm $k$, whilst $M_k$ is negatively related to the variability around the base measure. We fix $\mu_1 = \mu_2 = 21$ and $M_1 = h_1 = 10$, to see how the expected advantage of one arm over the other is affected by a change in $h_2$ and in $M_2$, without being affected by different prior means. With $\mu_1$ and $\mu_2$ equal, the choice of the optimal arm is entirely driven by the chance of exploring less-known (more volatile) arms. The rest of the setting is fixed as in the previous example.

In Figure 1 we report the value of $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ for different $h_2$ and $M_2$. The green dotted line, corresponding to the case $h_1 = h_2 = 10$, shows how a lower variability (higher $M_2$) around the base measure of arm 2 makes this arm less interesting to explore, in favor of arm 1. The same effect is caused by a change in the variability of the base measure of arm 2: for $h_2 - h_1 < 0$ and for $M_1 = M_2 = 10$, arm 1 is preferred, up to a $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ of about 16 for $h_2 - h_1 = -10$. Vice versa, higher positive values of $h_2 - h_1$ correspond to a higher preference for arm 2. Furthermore, the effect of a change in $M_2$ seems to dominate: as we increase $M_2$, the distances among the scenarios with different $h_2$ decrease and concentrate on positive expected advantages of the first arm over the second one.

Figure 1: Estimated $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$, with parameters as specified in equations (9) and (10), for different variability around the base measure ($M_2$) and different base measure variability ($h_2$) of the beta-Stacy process of the second arm. The prior means are both equal to $\mu_1 = \mu_2 = 21$, and $M_1 = h_1 = 10$. [Axes: $\log M_2$ against the advantage $\Delta$ of arm 1; one line for each $h_2 - h_1 \in \{-10, -5, 0, 5, 10\}$.]
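A short sketch makes the role of (9) and (10) concrete: inside the window the implied hazard is $1/(h_k + \mu_k - i + 1)$, which is exactly the hazard of a uniform distribution on $\{\mu_k - h_k, \dots, \mu_k + h_k\}$, so the base measure is uniform on the window with mean $\mu_k$. This reuses `predictive_pmf` from the discrete sketch; the parameter values follow our reconstruction of the text.

```python
def uniform_params(mu, h, M, J):
    """Equations (9)-(10) on a truncated support {1,...,J}."""
    alpha, beta = [], []
    for i in range(1, J + 1):
        in_win = mu - h <= i <= mu + h
        alpha.append(M / (2 * h + 1) if in_win else 0.0)
        beta.append(M * (h + mu - i) / (2 * h + 1) if in_win
                    else (M if i < mu - h else 0.0))
    return alpha, beta

alpha, beta = uniform_params(mu=21, h=10, M=10.0, J=40)
pmf = predictive_pmf(alpha, beta)
print([round(p, 4) for p in pmf[10:31]])   # constant, about 1/21, on {11,...,31}
```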

5.3 Exponential beta-Stacy parameters

We extend to the two-armed bandit problem a numerical example in Ferguson and Phadia (1979) and Walker and Muliere (1997). For the first arm fix $\beta_1(s) = \exp(-s/10)$ and $d\alpha_1(s) = \exp(-s/10)/10\,ds$, whilst for the second arm $\beta_2(s) = \exp(-s/20)$ and $d\alpha_2(s) = \exp(-s/20)/20\,ds$, for $s \in \mathbb{R}^+$, so that $\mu_1 < \mu_2$ a priori. We fix the bandit horizon to $n = 3$ and the discount sequence $A_3 = (1, 0.9, 0.8)$. As in the discrete example, we generate $X_1^l$ and $Y_1^l$ from the two arms, for $l = 1, \dots, T = 50$. Then, for each $X_1^l$, we generate $T$ copies of $X_2$ from the first arm and, for each $Y_1^l$, $T$ copies of $Y_2$ from the second arm, also using Algorithms A and B in Al Labadi and Zarepour (2013). In the top-left plot of Figure 2, two randomly extracted prior distributions for the two arms are reported. For each scenario we evaluate $\mu_{1,x_1}$, $\mu_{1,x_1,x_2}$, $\mu_{2,y_1}$ and $\mu_{2,y_1,y_2}$; we then evaluate $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ and $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$, for $x \in (0, 30)$. In the top-right plot of Figure 2 these $\Delta$s are shown, when the observations in the first and second periods are respectively exact and censored. Monotonicity in $x$ of $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ is numerically verified. Note also that conditions (7) and (8) are satisfied, so that the two break-even points exist. Since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = -2.47 < 0$, $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) \le d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$. In particular, the break-even observation for the stay-with-a-winner strategy is $b(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = 5.42$, whilst the break-even $d(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$ for the switch-on-a-loser strategy is at least as large.

Optimal strategies can be completely determined. For instance, arm 2 is optimally selected at the beginning, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) < 0$. If an exact observation from arm 2 is extracted equal, say, to $y_1 = 4$, for the stay-with-a-winner strategy arm 1 is optimal at this stage, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)})$ is greater than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3) = -2.47$. At this stage arm 1 would not be optimal for the switch-on-a-loser strategy, since $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)}) < 0$. Suppose now a censored observation from arm 1 equal, say, to $x_2 = 5.5$ is observed, for which $\Delta(\{\alpha_{1,x_2=5.5}, \beta_{1,x_2=5.5}\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(2)})$ is greater than $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_{2,y_1=4}, \beta_{2,y_1=4}\}; A_3^{(1)})$. Therefore in the last stage arm 1 is again optimal for the stay-with-a-winner strategy.
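The gap between the censored and the (incorrectly) exact treatment of an observation can be illustrated directly with the posterior-mean formulas of Section 4: after one observation at $x$, the two cases differ only by the jump factor $\beta_1(x)/(\beta_1(x)+1)$ on the tail mass. A numerical sketch under the exponential specification above (our own illustration, not the paper's code):

```python
import math

def post_mean_one_obs(x, censored, step=1e-2, horizon=300.0):
    beta = lambda s: math.exp(-s / 10.0)
    aprime = lambda s: math.exp(-s / 10.0) / 10.0
    mu, cum, t = 0.0, 0.0, 0.0
    while t < x:                          # hazard a'/(b+1) before x
        cum += aprime(t) / (beta(t) + 1.0) * step
        mu += math.exp(-cum) * step
        t += step
    tail = math.exp(-cum)
    if not censored:                      # exact observation: jump 1/(beta(x)+1) at x
        tail *= beta(x) / (beta(x) + 1.0)
    while t < horizon:                    # hazard a'/b = 1/10 after x
        tail *= math.exp(-0.1 * step)
        mu += tail * step
        t += step
    return mu

print(post_mean_one_obs(5.5, censored=True))    # treats 5.5 as right-censored
print(post_mean_one_obs(5.5, censored=False))   # treats 5.5 as exact
```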

Figure 2: Top-left: Two distributions sampled from the beta-Stacy process of Walker and Muliere (1997), with Algorithm A in Al Labadi and Zarepour (2013). The black line is from arm 1, the red one from arm 2, with parameters $\beta_1(s) = \exp(-s/10)$, $d\alpha_1(s) = \exp(-s/10)/10\,ds$, $\beta_2(s) = \exp(-s/20)$ and $d\alpha_2(s) = \exp(-s/20)/20\,ds$, for $s \in (0, 30)$. Top-right: in blue $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$, in green $\Delta(\{\alpha_1, \beta_1\}, \{\alpha_2, \beta_2\}; A_3)$. The value 0 is also highlighted in red. The intersections determine the break-even observations of the two strategies outlined in the text. Bottom-left: in blue $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ for the beta-Stacy bandit problem. The green line represents the corresponding quantity for the Dirichlet bandit that ignores the censorship. Bottom-right: Error probability of the stay-with-a-winner strategy implied by the Dirichlet bandit problem, when censorship is neglected, as a function of the first observation $x$ from arm 1.

Furthermore, in the bottom-left plot we report $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ when $x \in (0, 30)$ is right-censored, together with the expected advantage of arm 1 if the data were incorrectly supposed to be exact. In other words, we compare the beta-Stacy bandit problem with the corresponding Dirichlet bandit problem that ignores the censorship, to quantify the difference between the two in the simulation setting and highlight the relevance of properly accounting for right-censored data. There is a range of $x$ values, from 7.97 to 16.95, at which the Dirichlet bandit problem would adopt the wrong strategy, since $\Delta(\{\alpha_{1,x}, \beta_{1,x}\}, \{\alpha_2, \beta_2\}; A_3)$ would be of opposite sign relative to the corresponding beta-Stacy quantity. The break-even for the Dirichlet bandit is too low, since it judges the observations to be exact and therefore does not account for the increased chance of observing higher values in the future. If we repeat the experiment 50 times for each value of $x$ from 0 to 30, we can compute the probability of the Dirichlet bandit being in error in the choice of the optimal arm, after the observation of $x$ from arm 1. The probability is reported in the bottom-right plot of Figure 2.

6 Conclusions and further directions

We have studied Bayesian nonparametric bandit problems with right-censored data, where two independent arms are generated by beta-Stacy processes. The problem extends the one-armed and two-armed Dirichlet bandit problems of Clayton and Berry (1985) and Chattopadhyay (1994), since the beta-Stacy process simplifies to the Dirichlet process for a special choice of the process parameters and in the absence of censored observations. We have shown some properties of the expected advantage of the first arm over the second arm, and the existence of stay-with-a-winner and switch-on-a-loser optimal strategies, under non-restrictive constraints on the process parameters.

Our framework can be further extended in several directions, to different bandit strategies and to multi-armed problems. First, the common formulation of the Bernoulli bandit can be replicated through the choice of Bernoulli base measures, centered on success probabilities that are learnt as observations are collected. Second, semi-uniform strategies with greedy behaviour can be addressed: epsilon-greedy and epsilon-first strategies (Watkins 1989; Sutton and Barto 1998), which dedicate a proportion of phases to, respectively, random and purely exploratory phases, can be derived by randomizing the reinforcement learning mechanism of the arms' parameters (Muliere et al. 2006); epsilon-decreasing and VDBE strategies (Cesa-Bianchi and Fischer 1998; Tokic 2010) would require a beta-Stacy parameter update mechanism dependent on the number of steps or on the values extracted from the arms. Third, probability matching strategies such as Thompson sampling (Thompson 1933) are easy to implement in our framework: in multi-armed bandit problems, beta-Stacy random distributions can be sampled from their posterior distributions at each stage and, conditional on the sample, some reward related to each single arm may be computed and all rewards compared. Fourth, the extension to multi-armed contextual bandits (Langford and Zhang 2008) can be implemented by introducing dependence of the arm parameters on external regressors, or by introducing dependence between Bayesian nonparametric arms through partial exchangeability (de Finetti 1938, 1959), for instance with the mixture of Dirichlet processes of Antoniak (1974), the bivariate Dirichlet process of Walker and Muliere (2003) or the bivariate beta-Stacy process of Muliere et al. (2017). In this direction, Battiston et al. (2016) adopt hierarchical Poisson-Dirichlet processes in multi-armed bandit problems. Fifth, the sequential nature of the Bayesian framework and the flexibility of nonparametric priors permit handling more general cases of nonstationary bandit problems (Garivier and Moulines 2008), where the underlying base distribution of the beta-Stacy processes can change during play: in this context all past observations simply affect the modified priors of the new models. Finally, it could be of interest to apply the proposed framework to study the results in Gittins (1979) and Gittins et al. (2011) in multi-armed Bayesian nonparametric problems: in this case, dynamic allocation indices may be computed from the simulations of the stochastic processes related to each arm.

Acknowledgements

We thank two anonymous referees, the Editor and the Associate Editor for their detailed and insightful comments, which contributed significantly to improving the paper.

References

Al Labadi, L. and Zarepour, M. (2013). A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. The Canadian Journal of Statistics, 41.

Antoniak, C. (1974). Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, 2:1152-1174.

Battiston, M., Favaro, S., and Teh, Y. (2016). Multi-armed bandit for species discovery: a Bayesian nonparametric approach. Journal of the American Statistical Association. Forthcoming.

Bellman, R. (1956). A Problem in the Sequential Design of Experiments. Sankhya, 16.

Berry, D. (1972). A Bernoulli Two-Armed Bandit. The Annals of Mathematical Statistics, 43.

Berry, D. and Fristedt, B. (1979). Bernoulli One-Armed Bandits - Arbitrary Discount Sequences. The Annals of Statistics, 7:1086-1105.

Berry, D. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, New York.

Bradt, R., Johnson, S., and Karlin, S. (1956). On Sequential Designs for Maximizing the Sum of n Observations. The Annals of Mathematical Statistics, 33.

Cesa-Bianchi, N. and Fischer, P. (1998). Finite-time regret bounds for the multiarmed bandit problem. In Proceedings of the 15th International Conference on Machine Learning, pages 100-108, Morgan Kaufmann, San Francisco, CA.

Chattopadhyay, M. (1994). Two-Armed Dirichlet Bandits With Discounting. The Annals of Statistics, 22.

Chernoff, H. (1968). Optimal Stochastic Control. Sankhya, 30.

Clayton, M. and Berry, D. (1985). Bayesian Nonparametric Bandits. The Annals of Statistics, 13.

De Blasi, P. (2017). Simulation of the Beta-Stacy Process with Application to Analysis of Censored Data. In Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. S. Kenett and F. Faltin (eds.), pages 84-89, John Wiley & Sons Ltd., Chichester, UK.

de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l'institut Henri Poincaré, 7:1-68.

de Finetti, B. (1938). Sur la condition d'équivalence partielle, VI Colloque Genève. Acta. Sci. Ind. Paris, 739:5-18.

de Finetti, B. (1959). La probabilità e la statistica nei rapporti con l'induzione, secondo i diversi punti di vista. Atti corso CIME su Induzione e Statistica, Varenna.

Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. Annals of Probability, 2:183-201.

Ferguson, T. (1973). A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1:209-230.

Ferguson, T. (1974). Prior Distributions on Spaces of Probability Measures. The Annals of Statistics, 2:615-629.


More information

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data Mai Zhou and Julia Luan Department of Statistics University of Kentucky Lexington, KY 40506-0027, U.S.A.

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract Power EP Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR-2004-149, October 4, 2004 Abstract This note describes power EP, an etension of Epectation Propagation (EP) that makes the computations

More information

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti 1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

The Multi-Armed Bandit Problem

The Multi-Armed Bandit Problem Università degli Studi di Milano The bandit problem [Robbins, 1952]... K slot machines Rewards X i,1, X i,2,... of machine i are i.i.d. [0, 1]-valued random variables An allocation policy prescribes which

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes?

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Song-Hee Kim and Ward Whitt Industrial Engineering and Operations Research Columbia University

More information

arxiv: v1 [math.st] 29 Nov 2018

arxiv: v1 [math.st] 29 Nov 2018 Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Andrea Arfé Stefano Peluso Pietro Muliere Accepted manuscript, Scandinavian Journal of Statistics, 2018 arxiv:1811.12304v1

More information

The Multi-Armed Bandit Problem

The Multi-Armed Bandit Problem The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist

More information

Bayesian optimization for automatic machine learning

Bayesian optimization for automatic machine learning Bayesian optimization for automatic machine learning Matthew W. Ho man based o work with J. M. Hernández-Lobato, M. Gelbart, B. Shahriari, and others! University of Cambridge July 11, 2015 Black-bo optimization

More information

OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING

OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING CHAPTER 18 OPTIMAL LEARNING AND APPROXIMATE DYNAMIC PROGRAMMING Warren B. Powell and Ilya O. Ryzhov Princeton University, University of Maryland 18.1 INTRODUCTION Approimate dynamic programming (ADP) has

More information

The information complexity of sequential resource allocation

The information complexity of sequential resource allocation The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation

More information

The Multi-Arm Bandit Framework

The Multi-Arm Bandit Framework The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94

More information

On Bayesian bandit algorithms

On Bayesian bandit algorithms On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms

More information

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16 EE539R: Problem Set 3 Assigned: 24/08/6, Due: 3/08/6. Cover and Thomas: Problem 2.30 (Maimum Entropy): Solution: We are required to maimize H(P X ) over all distributions P X on the non-negative integers

More information

SUPERPOSITION OF BETA PROCESSES

SUPERPOSITION OF BETA PROCESSES SUPERPOSITION OF BETA PROCESSES Pierpaolo De Blasi 1, Stefano Favaro 1 and Pietro Muliere 2 1 Dipartiento di Statistica e Mateatica Applicata, Università degli Studi di Torino Corso Unione Sovietica 218/bis,

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago December 2015 Abstract Rothschild and Stiglitz (1970) represent random

More information

Finite-time Analysis of the Multiarmed Bandit Problem*

Finite-time Analysis of the Multiarmed Bandit Problem* Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010

More information

Truncation error of a superposed gamma process in a decreasing order representation

Truncation error of a superposed gamma process in a decreasing order representation Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan

More information

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process

More information

Hierarchical Knowledge Gradient for Sequential Sampling

Hierarchical Knowledge Gradient for Sequential Sampling Journal of Machine Learning Research () Submitted ; Published Hierarchical Knowledge Gradient for Sequential Sampling Martijn R.K. Mes Department of Operational Methods for Production and Logistics University

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

State Space and Hidden Markov Models

State Space and Hidden Markov Models State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State

More information

Taylor Series and Series Convergence (Online)

Taylor Series and Series Convergence (Online) 7in 0in Felder c02_online.te V3 - February 9, 205 9:5 A.M. Page CHAPTER 2 Taylor Series and Series Convergence (Online) 2.8 Asymptotic Epansions In introductory calculus classes the statement this series

More information

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Guosheng Yin Department of Statistics and Actuarial Science The University of Hong Kong Joint work with J. Xu PSI and RSS Journal

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known

More information

9 - Markov processes and Burt & Allison 1963 AGEC

9 - Markov processes and Burt & Allison 1963 AGEC This document was generated at 8:37 PM on Saturday, March 17, 2018 Copyright 2018 Richard T. Woodward 9 - Markov processes and Burt & Allison 1963 AGEC 642-2018 I. What is a Markov Chain? A Markov chain

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

Lecture 4 January 23

Lecture 4 January 23 STAT 263/363: Experimental Design Winter 2016/17 Lecture 4 January 23 Lecturer: Art B. Owen Scribe: Zachary del Rosario 4.1 Bandits Bandits are a form of online (adaptive) experiments; i.e. samples are

More information

Statistical Geometry Processing Winter Semester 2011/2012

Statistical Geometry Processing Winter Semester 2011/2012 Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

SOME MEMORYLESS BANDIT POLICIES

SOME MEMORYLESS BANDIT POLICIES J. Appl. Prob. 40, 250 256 (2003) Printed in Israel Applied Probability Trust 2003 SOME MEMORYLESS BANDIT POLICIES EROL A. PEKÖZ, Boston University Abstract We consider a multiarmed bit problem, where

More information

Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis

Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Reinforced urns and the subdistribution beta-stacy process prior for competing risks analysis Andrea Arfè 1 Stefano Peluso 2 Pietro Muliere 1 1 Università Commerciale Luigi Bocconi 2 Università Cattolica

More information

Bandit Problems With Infinitely Many Arms

Bandit Problems With Infinitely Many Arms University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 997 Bandit Problems With Infinitely Many Arms Donald A. Berry Robert W. Chen Alan Zame David C. Heath Larry A. Shepp

More information

Two generic principles in modern bandits: the optimistic principle and Thompson sampling

Two generic principles in modern bandits: the optimistic principle and Thompson sampling Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle

More information

Competitive Analysis of M/GI/1 Queueing Policies

Competitive Analysis of M/GI/1 Queueing Policies Competitive Analysis of M/GI/1 Queueing Policies Nikhil Bansal Adam Wierman Abstract We propose a framework for comparing the performance of two queueing policies. Our framework is motivated by the notion

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet

More information

CSCI-6971 Lecture Notes: Probability theory

CSCI-6971 Lecture Notes: Probability theory CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

A parametric approach to Bayesian optimization with pairwise comparisons

A parametric approach to Bayesian optimization with pairwise comparisons A parametric approach to Bayesian optimization with pairwise comparisons Marco Co Eindhoven University of Technology m.g.h.co@tue.nl Bert de Vries Eindhoven University of Technology and GN Hearing bdevries@ieee.org

More information

Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,

More information

Analysis of Thompson Sampling for the multi-armed bandit problem

Analysis of Thompson Sampling for the multi-armed bandit problem Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com Navin Goyal Microsoft Research India navingo@microsoft.com Abstract The multi-armed

More information

Two optimization problems in a stochastic bandit model

Two optimization problems in a stochastic bandit model Two optimization problems in a stochastic bandit model Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishnan Journées MAS 204, Toulouse Outline From stochastic optimization

More information

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Noèlia Viles Cuadros BCAM- Basque Center of Applied Mathematics with Prof. Enrico

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis. State University of New York at Stony Brook. and.

COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis. State University of New York at Stony Brook. and. COMPUTING OPTIMAL SEQUENTIAL ALLOCATION RULES IN CLINICAL TRIALS* Michael N. Katehakis State University of New York at Stony Brook and Cyrus Derman Columbia University The problem of assigning one of several

More information

Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models

Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models c Qing Zhao, UC Davis. Talk at Xidian Univ., September, 2011. 1 Multi-Armed Bandit: Learning in Dynamic Systems with Unknown Models Qing Zhao Department of Electrical and Computer Engineering University

More information

, and rewards and transition matrices as shown below:

, and rewards and transition matrices as shown below: CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount

More information

Stochastic Dominance-based Rough Set Model for Ordinal Classification

Stochastic Dominance-based Rough Set Model for Ordinal Classification Stochastic Dominance-based Rough Set Model for Ordinal Classification Wojciech Kotłowski wkotlowski@cs.put.poznan.pl Institute of Computing Science, Poznań University of Technology, Poznań, 60-965, Poland

More information

Truncation error of a superposed gamma process in a decreasing order representation

Truncation error of a superposed gamma process in a decreasing order representation Truncation error of a superposed gamma process in a decreasing order representation B julyan.arbel@inria.fr Í www.julyanarbel.com Inria, Mistis, Grenoble, France Joint work with Igor Pru nster (Bocconi

More information

Lecture Notes 2 Random Variables. Discrete Random Variables: Probability mass function (pmf)

Lecture Notes 2 Random Variables. Discrete Random Variables: Probability mass function (pmf) Lecture Notes 2 Random Variables Definition Discrete Random Variables: Probability mass function (pmf) Continuous Random Variables: Probability density function (pdf) Mean and Variance Cumulative Distribution

More information

0.1. Linear transformations

0.1. Linear transformations Suggestions for midterm review #3 The repetitoria are usually not complete; I am merely bringing up the points that many people didn t now on the recitations Linear transformations The following mostly

More information

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Random variables. DS GA 1002 Probability and Statistics for Data Science. Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities

More information

Horizontal asymptotes

Horizontal asymptotes Roberto s Notes on Differential Calculus Chapter 1: Limits and continuity Section 5 Limits at infinity and Horizontal asymptotes What you need to know already: The concept, notation and terminology of

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3].

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3]. Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation Remi Munos, Leemon C. Baird and Andrew W. Moore Robotics Institute and Computer Science Department, Carnegie

More information

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems

Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Sofía S. Villar Postdoctoral Fellow Basque Center for Applied Mathemathics (BCAM) Lancaster University November 5th,

More information

arxiv: v1 [cs.lg] 15 Oct 2014

arxiv: v1 [cs.lg] 15 Oct 2014 THOMPSON SAMPLING WITH THE ONLINE BOOTSTRAP By Dean Eckles and Maurits Kaptein Facebook, Inc., and Radboud University, Nijmegen arxiv:141.49v1 [cs.lg] 15 Oct 214 Thompson sampling provides a solution to

More information

On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation

On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation Antonio LIJOI, Igor PRÜNSTER, andstepheng.walker The past decade has seen a remarkable development in the area of Bayesian

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,

More information

Models of collective inference

Models of collective inference Models of collective inference Laurent Massoulié (Microsoft Research-Inria Joint Centre) Mesrob I. Ohannessian (University of California, San Diego) Alexandre Proutière (KTH Royal Institute of Technology)

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago January 2016 Consider a situation where one person, call him Sender,

More information

Procedia Computer Science 00 (2011) 000 6

Procedia Computer Science 00 (2011) 000 6 Procedia Computer Science (211) 6 Procedia Computer Science Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 211-

More information

EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS

EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS Statistica Sinica 6006, 409-4 EXACT CONVERGENCE RATE AND LEADING TERM IN THE CENTRAL LIMIT THEOREM FOR U-STATISTICS Qiying Wang and Neville C Weber The University of Sydney Abstract: The leading term in

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

Quantum Dynamics. March 10, 2017

Quantum Dynamics. March 10, 2017 Quantum Dynamics March 0, 07 As in classical mechanics, time is a parameter in quantum mechanics. It is distinct from space in the sense that, while we have Hermitian operators, X, for position and therefore

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information