Solving Continuous-State POMDPs via Density Projection

Enlu Zhou, Member, IEEE, Michael C. Fu, Fellow, IEEE, and Steven I. Marcus, Fellow, IEEE

Abstract: Research on numerical solution methods for partially observable Markov decision processes (POMDPs) has primarily focused on finite-state models, and these algorithms do not generally extend to continuous-state POMDPs, due to the infinite dimensionality of the belief space. In this paper, we develop a computationally viable and theoretically sound method for solving continuous-state POMDPs by effectively reducing the dimensionality of the belief space via density projection. The density projection technique is also incorporated into particle filtering to provide a filtering scheme for online decision making. We provide an error bound between the value function induced by the policy obtained by our method and the true value function of the POMDP, and also an error bound between projection particle filtering and exact filtering. Finally, we illustrate the effectiveness of our method through an inventory control problem.

Index Terms: Partially observable Markov decision processes, particle filtering, decision making, density projection, belief state, value function.

I. INTRODUCTION

Partially observable Markov decision processes (POMDPs) model sequential decision making under uncertainty with partially observed state information. At each stage or period, an action is taken based on a partial observation of the current state along with the history of observations and actions, and the state transitions probabilistically. The objective is to minimize (or maximize) a cost (or reward) function, where costs (or rewards) are accrued in each stage. Clearly, POMDPs suffer from the same curse of dimensionality as fully observable MDPs, so efficient numerical solution of problems with large state spaces is a challenging research area.
A POMDP can be converted to a continuous-state Markov decision process (MDP) by introducing the notion of the belief state [6], which is the conditional distribution of the current state given the history of observations and actions. For a finite-state POMDP, the belief space is finite dimensional (i.e., a probability simplex), whereas for a continuous-state POMDP, the belief space is an infinite-dimensional space of continuous probability distributions. This difference suggests that simple generalizations of many of the finite-state algorithms to continuous-state models are not appropriate or applicable. For example, discretization of the continuous state space may result in a finite-state POMDP of dimension either too large to solve computationally or not sufficiently precise. As another example, many algorithms for solving finite-state POMDPs (see [7] for a survey) are based on discretization of the finite-dimensional probability simplex; however, it is usually not feasible to discretize an infinite-dimensional probability distribution space. Throughout the paper, when we use the word dimension or dimensional, we refer to the dimension of the belief space/state.

E. Zhou is with the Department of Industrial & Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, IL, 61801 USA (enluzhou@illinois.edu). S. I. Marcus is with the Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, MD, USA (marcus@umd.edu). M. C. Fu is with the Robert H. Smith School of Business, and Institute for Systems Research, University of Maryland, College Park, MD, USA (mfu@umd.edu). This work has been supported in part by the National Science Foundation under Grants DMI and DMI, and by the Air Force Office of Scientific Research under Grant FA.
Despite the abundance of algorithms for finite-state POMDPs, the aforementioned difficulty has motivated some researchers to look for efficient algorithms for continuous-state POMDPs [24] [25] [3] [28] [8] [9] [10]. Assuming discrete observation and action spaces, Porta et al. [24] showed that the optimal finite-horizon value function is defined by a finite set of α-functions, and modeled all functions of interest by Gaussian mixtures. In later work [25], they extended their result and method to continuous observation and action spaces using sampling strategies. However, the number of Gaussian mixtures in representing belief states and α-functions grows exponentially in value iteration as the number of iterations increases. Thrun [3] addressed continuous-state POMDPs using particle filtering to simulate the propagation of belief states and represent the belief states by a finite number of samples. The number of samples determines the dimension of the belief space, and the dimension could be very high in order to approximate the belief states closely. Brunskill et al. [10] used weighted sums of Gaussians to approximate the belief states and value functions in a class of switching state models. Roy [28] and Brooks et al. [8] used sufficient statistics to reduce the dimension of the belief space, which is often referred to as belief compression in the Artificial Intelligence literature. Roy [28] proposed an augmented MDP (AMDP), characterizing belief states using the maximum likelihood state and entropy, which are usually not sufficient statistics except for a linear Gaussian model. As shown by Roy himself, the algorithm fails in a simple robot navigation problem, since the two statistics are not sufficient for distinguishing between a unimodal distribution and a bimodal distribution. Brooks et al.
[8] proposed a parametric POMDP, representing the belief state as a Gaussian distribution so as to convert the POMDP to a problem of computing the value function over a two-dimensional continuous space, and using the extended Kalman filter to estimate the transition of the approximated belief state.

The restriction to the Gaussian representation has the same problem as the AMDP. The algorithm recently proposed in Brooks and Williams [9] is similar to ours, in that they also approximate the belief state by a parameterized density and solve the approximate belief MDP on the parameter space using Monte Carlo simulation-based methods. However, they did not specify how to compute the parameters except for Gaussian densities, whereas we explicitly provide an analytical way to calculate the parameters for exponential families of densities. Moreover, we develop rigorous theoretical error bounds for our algorithm. There are some other belief compression algorithms designed for finite-state POMDPs, such as value-directed compression [26] and the exponential family principal components analysis (E-PCA) belief compression [29], but they cannot be directly generalized to continuous-state models, since they are based on a fixed set of support points. Motivated by the work of [3] [28] and [8], we develop a computationally tractable algorithm that effectively reduces the dimension of the belief state and has the flexibility to represent arbitrary belief states, such as multimodal or heavy-tailed distributions. The idea is to project the original high/infinite-dimensional belief space onto a low-dimensional family of parameterized distributions by minimizing the Kullback-Leibler (KL) divergence between the belief state and that family of distributions. For an exponential family, the minimization of KL divergence can be carried out in analytical form, making the method easy to implement. The projected belief MDP can then be solved on the parameter space by using simulation-based algorithms, or can be further approximated by a finite-state MDP via a suitable discretization of the parameter space and thus solved by using standard solution techniques such as value iteration and policy iteration.
Our method can be viewed as a generalization of the AMDP in [28] and the parametric POMDP in [8], which consider only the family of Gaussian distributions. In addition, we provide theoretical results on the error bounds of the value function and the performance of the policy generated by our method with respect to the optimal ones. We also develop a projection particle filter for online filtering and decision making, by incorporating the density projection technique into particle filtering. The projection particle filter we propose here is a modification of the projection particle filter in [2]. Unlike in [2], where the predicted conditional density is projected, we project the updated conditional density, so as to ensure the projected belief state remains in the given family of densities. Although seemingly a small modification of the algorithm, we prove under much less restrictive assumptions a similar bound on the error between our projection particle filter and the exact filter. The rest of the paper is organized as follows. Section II describes the formulation of a continuous-state POMDP and its transformation to a belief MDP. Section III describes the density projection technique, and uses it to develop the projected belief MDP. Section IV develops the projection particle filter. Section V computes error bounds for the value function approximation and the projection particle filter. Section VI discusses scalability and computational issues of the method, and applies the method to a simulation example of an inventory control problem. Section VII concludes the paper. Proofs of all results are contained in the Appendix.

II.
CONTINUOUS-STATE POMDP

Consider a discrete-time continuous-state POMDP:

x_k = f(x_{k-1}, a_{k-1}, u_{k-1}), k = 1, 2, ...,   (1)
y_k = h(x_k, a_{k-1}, v_k), k = 1, 2, ...,   (2)
y_0 = h_0(x_0, v_0),

where for all k, the state x_k is in a continuous state space S ⊆ R^{n_x}, the action a_k is in a finite action space A ⊂ R^{n_a}, the observation y_k is in a continuous observation space O ⊆ R^{n_y}, and the random disturbances u_k ∈ R^{n_u} and v_k ∈ R^{n_v} are sequences of i.i.d. continuous random vectors with known distributions. Assume that {u_k} and {v_k} are independent of each other, and are independent of x_0, which follows a distribution p_0. Also assume that f(x, a, u) is continuous in x for every a ∈ A and u ∈ R^{n_u}, h(x, a, v) is continuous in x for every a ∈ A and v ∈ R^{n_v}, and h_0(x, v) is continuous in x for every v ∈ R^{n_v}. Eqn. (1) is often referred to as the state equation, and (2) as the observation equation. All the information available to the decision maker at time k can be summarized by means of an information vector I_k, which is defined as

I_k = (y_0, y_1, ..., y_k, a_0, a_1, ..., a_{k-1}), k = 1, 2, ..., I_0 = y_0.

The objective is to find a policy π consisting of a sequence of functions π = {μ_0, μ_1, ...}, where each function μ_k maps the information vector I_k onto the action space A, to minimize the value function

J_π = lim_{H→∞} E_{x_0, {u_k}_{k=0}^H, {v_k}_{k=0}^H} [ Σ_{k=0}^H γ^k g(x_k, μ_k(I_k)) ],

where g : S × A → R is the one-step cost function, γ ∈ (0, 1) is the discount factor, and E_{x_0, {u_k}_{k=0}^H, {v_k}_{k=0}^H} denotes the expectation with respect to the joint distribution of x_0, u_0, ..., u_H, v_0, ..., v_H. For simplicity, we assume that the above limit exists. The optimal value function is defined by J* = min_{π ∈ Π} J_π, where Π is the set of all admissible policies. An optimal policy, denoted by π*, is an admissible policy that achieves J*. A stationary policy is an admissible policy of the form π = {μ, μ, ...}, referred to as the stationary policy μ for brevity, and its corresponding value function is denoted by J_μ.
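As a concrete illustration of the model (1)-(2), the following minimal sketch simulates a hypothetical scalar instance. The specific choices f(x, a, u) = x + a + u, h(x, a, v) = x + v, the Gaussian noise distributions, and the fixed action are illustrative assumptions, not the paper's example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, a, u):
    """State equation (1): x_k = f(x_{k-1}, a_{k-1}, u_{k-1})."""
    return x + a + u

def h(x, a, v):
    """Observation equation (2): y_k = h(x_k, a_{k-1}, v_k)."""
    return x + v

x = rng.normal()                 # x_0 drawn from an assumed initial distribution
trajectory = []
for k in range(1, 6):
    a = 0.1                      # a fixed action, standing in for mu_k(I_k)
    x = f(x, a, rng.normal(0.0, 1.0))      # process noise u_{k-1}
    y = h(x, a, rng.normal(0.0, 0.5))      # observation noise v_k
    trajectory.append((x, y))    # the decision maker sees only the y's
```

Note that only the observations y_k (and the chosen actions) enter the information vector I_k; the states x_k remain hidden.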
The information vector I_k grows as the history expands. The standard approach to encode historical information is the use of the belief state, which is the conditional probability density of the current state x_k given the past history, i.e., b_k(x) ≜ p(x_k = x | I_k).

Given our assumptions on (1) and (2), b_k exists, and can be computed recursively via Bayes rule:

b_k(x) = p(x_k = x | I_{k-1}, a_{k-1}, y_k)
       = p(y_k | x_k = x, I_{k-1}, a_{k-1}) p(x_k = x | I_{k-1}, a_{k-1}) / p(y_k | I_{k-1}, a_{k-1})
       ∝ p(y_k | x_k = x, a_{k-1}) ∫_S p(x_k = x | a_{k-1}, x_{k-1}) p(x_{k-1} | I_{k-1}) dx_{k-1}
       ∝ p(y_k | x_k = x, a_{k-1}) ∫_S p(x_k = x | a_{k-1}, x_{k-1}) b_{k-1}(x_{k-1}) dx_{k-1}.   (3)

The third line follows from the Markovian property of y_k induced by (2), and the fact that the denominator p(y_k | I_{k-1}, a_{k-1}) does not explicitly depend on x_k; the fourth line follows from the Markovian property of x_k induced by (1), and the fact that a_{k-1} is a function of I_{k-1}. The right-hand side of (3) can be expressed in terms of b_{k-1}, a_{k-1} and y_k. Hence,

b_k = ψ(b_{k-1}, a_{k-1}, y_k),   (4)

where y_k is characterized by the time-homogeneous conditional distribution P_Y(y_k | b_{k-1}) that is induced by (1) and (2), and does not depend on {y_0, ..., y_{k-1}}. A POMDP can be converted to an MDP by conditioning on the information vectors ([6], Chapter 5), and the converted MDP is called the belief MDP. The states of the belief MDP are the belief states, which follow the system dynamics (4), where y_k can be viewed as the system noise with the distribution P_Y. The state space of the belief MDP is the belief space, denoted by B, which is the set of all belief states, i.e., a set of probability densities. A policy π is a sequence of functions π = {μ_0, μ_1, ...}, where each function μ_k maps the belief state b_k onto the action space A. Noticing that E_{x_0, {u_i}_{i=0}^k, {v_i}_{i=0}^k} [g(x_k, a_k)] = E{ E_{x_k} [g(x_k, a_k) | I_k] }, the one-step cost function can be written in terms of the belief state as the belief one-step cost function

ḡ(b_k, a_k) ≜ E_{x_k}[g(x_k, a_k) | I_k] = ∫_{x ∈ S} g(x, a_k) b_k(x) dx ≜ ⟨g(·, a_k), b_k⟩.
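The prediction-then-update structure of (3)-(4) can be sketched numerically on a fixed grid. The scalar model below (transition density N(x_{k-1} + a, 1) and likelihood N(y; x, 0.25)) is an illustrative assumption chosen only so the integral in (3) is easy to discretize.

```python
import numpy as np

grid = np.linspace(-6.0, 6.0, 241)
dx = grid[1] - grid[0]

def normal_pdf(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def psi(b, a, y):
    """Belief update b_k = psi(b_{k-1}, a_{k-1}, y_k) of (4), on the grid."""
    # Prediction: integrate the transition density against b_{k-1}, as in (3).
    pred = np.array([(normal_pdf(x, grid + a, 1.0) * b).sum() * dx for x in grid])
    # Updating: reweight by the likelihood p(y_k | x_k), then normalize.
    post = normal_pdf(y, grid, 0.25) * pred
    return post / (post.sum() * dx)

b0 = normal_pdf(grid, 0.0, 1.0)        # prior belief b_0
b1 = psi(b0, a=0.5, y=1.0)             # posterior belief after observing y_1
```

The normalization in the last line of `psi` plays the role of the denominator p(y_k | I_{k-1}, a_{k-1}) dropped in (3).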
Assuming there exists a stationary optimal policy, the optimal value function is given by J*(b) = lim_{k→∞} T^k J(b), ∀ b ∈ B, where T is the dynamic programming (DP) mapping that operates on any bounded function J : B → R according to

T J(b) = min_{a ∈ A} [ ⟨g(·, a), b⟩ + γ E_Y { J(ψ(b, a, Y)) } ],   (5)

where E_Y denotes the expectation with respect to the distribution P_Y. For finite-state POMDPs, the belief state b is a vector with each entry being the probability of being at one of the states. Hence, the belief space B is a finite-dimensional probability simplex, and the value function is a piecewise linear convex function after a finite number of iterations, provided that the one-step cost function is piecewise linear and convex [30]. This feature has been exploited in various exact and approximate value iteration algorithms such as those found in [7], [22], and [30]. For continuous-state POMDPs, the belief state b is a continuous density, and thus the belief space B is an infinite-dimensional space that contains all sorts of continuous densities. For continuous-state POMDPs, the value function preserves convexity [32], but value iteration algorithms are not computationally feasible because the belief space is infinite dimensional. The infinite dimensionality of the belief space also creates difficulties in applying the approximate algorithms that were developed for finite-state POMDPs. For example, one straightforward and commonly used approach is to approximate a continuous-state POMDP by a finite-state one via discretization of the state space. In practice, this could lead to computational difficulties, either resulting in a belief space that is of huge dimension or in a solution that is not accurate enough. In addition, note that even for a relatively nice prior distribution b_k (e.g., a Gaussian distribution), the exact evaluation of the posterior distribution b_{k+1} is computationally intractable; moreover, the update b_{k+1} may not have any structure, and therefore can be very difficult to handle.
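To make the mapping (5) concrete in the tractable finite-state case discussed above, the sketch below runs value iteration J ← T J on a discretized two-state belief simplex. All model numbers (transition matrices, observation probabilities, costs) are invented for illustration; the expectation over Y becomes a finite sum, and off-grid beliefs are handled by interpolation.

```python
import numpy as np

P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),    # transition matrices per action
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
Q = np.array([[0.8, 0.2], [0.3, 0.7]])         # Q[s, y] = p(y | state s)
g = {0: np.array([1.0, 0.0]), 1: np.array([0.4, 0.4])}  # per-state one-step costs
gamma = 0.9
ps = np.linspace(0.0, 1.0, 101)                # grid over beliefs b = (p, 1-p)

def bayes(b, a, y):
    """Belief update psi(b, a, y) for the finite-state model."""
    post = (b @ P[a]) * Q[:, y]
    return post / post.sum()

def T(J):
    """One application of the DP mapping (5) on the belief grid."""
    out = np.empty_like(J)
    for i, p in enumerate(ps):
        b = np.array([p, 1.0 - p])
        vals = []
        for a in (0, 1):
            ey = 0.0
            for y in (0, 1):
                p_y = (b @ P[a]) @ Q[:, y]     # P_Y(y | b, a)
                ey += p_y * np.interp(bayes(b, a, y)[0], ps, J)
            vals.append(b @ g[a] + gamma * ey)
        out[i] = min(vals)                     # minimize over actions
    return out

J = np.zeros(len(ps))
for _ in range(60):                            # T is a gamma-contraction
    J = T(J)
```

Because T is a γ-contraction, the iterates converge geometrically; it is exactly this grid over the simplex that has no infinite-dimensional analogue in the continuous-state case.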
Therefore, for practical reasons, we often wish to have a low-dimensional belief space and to have a posterior distribution b_{k+1} that stays in the same distribution family as the prior b_k. To address the aforementioned difficulties, we apply the density projection technique to project the infinite-dimensional belief space onto a finite/low-dimensional parameterized family of densities, so as to derive a so-called projected belief MDP, which is an MDP with a finite/low-dimensional state space and therefore can be solved by many existing methods. In the next section, we describe density projection in detail and develop the formulation of a projected belief MDP.

III. PROJECTED BELIEF MDP

A projection mapping from the belief space B to a family of parameterized densities Ω, denoted as Proj_Ω : B → Ω, is defined by

Proj_Ω(b) ≜ arg min_{f ∈ Ω} D_KL(b ‖ f), b ∈ B,   (6)

where D_KL(b ‖ f) denotes the Kullback-Leibler (KL) divergence (or relative entropy) between b and f, which is

D_KL(b ‖ f) ≜ ∫ b(x) log (b(x) / f(x)) dx.   (7)

Hence, the projection of b on Ω has the minimum KL divergence from b among all the densities in Ω. When Ω is an exponential family of densities, the minimization (6) has an analytical solution and can be carried out easily. The exponential families include many common families of densities, such as Gaussian, binomial, Poisson, Gamma, etc. An exponential family of densities is defined as follows [3]:

Definition 1: Let {c_1(·), ..., c_m(·)} be affinely independent scalar functions defined on R^n, i.e., for distinct points

x_1, ..., x_{m+1}, Σ_{i=1}^{m+1} λ_i c(x_i) = 0 and Σ_{i=1}^{m+1} λ_i = 0 implies λ_1 = ... = λ_{m+1} = 0, where c(x) = [c_1(x), ..., c_m(x)]^T. Assuming that Θ_0 = {θ ∈ R^m : φ(θ) = log ∫ exp(θ^T c(x)) dx < ∞} is a convex set with a nonempty interior, then Ω defined by

Ω = {f(·, θ), θ ∈ Θ}, f(x, θ) = exp[θ^T c(x) − φ(θ)],

where Θ ⊆ Θ_0 is open, is called an exponential family of probability densities, with θ its parameter and c(x) its sufficient statistic. Substituting f(x) = f(x, θ) into (7) and expressing it further as

D_KL(b ‖ f(·, θ)) = ∫ b(x) log b(x) dx − ∫ b(x) log f(x, θ) dx,

we can see that the first term does not depend on f(·, θ); hence min_θ D_KL(b ‖ f(·, θ)) is equivalent to max_θ ∫ b(x) log f(x, θ) dx, which by Definition 1 is the same as

max_θ ∫ (θ^T c(x) − φ(θ)) b(x) dx.   (8)

Recall the fact that the log-likelihood l(θ) = θ^T c(x) − φ(θ) is strictly concave in θ [2]; therefore, ∫ (θ^T c(x) − φ(θ)) b(x) dx is also strictly concave in θ. Hence, (8) has a unique maximum, and the maximum is achieved when the first-order optimality condition is satisfied, i.e.,

∫ ( c_j(x) − ∫ c_j(x) exp(θ^T c(x)) dx / ∫ exp(θ^T c(x)) dx ) b(x) dx = 0.

With a little rearranging of the terms and the expression of f(x, θ), the above equation can be rewritten as

E_b[c_j(X)] = E_θ[c_j(X)], j = 1, ..., m,   (9)

where E_b and E_θ denote the expectations with respect to b and f(·, θ), respectively. Density projection is a useful idea to approximate an arbitrary (most likely, infinite-dimensional) density as accurately as possible by a density in a chosen family that is characterized by only a few parameters. Using this idea, we can transform the belief MDP to another MDP confined to a low-dimensional belief space, and then solve this MDP problem. We call such an MDP the projected belief MDP.
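For the Gaussian family the sufficient statistics are c(x) = (x, x²), so the moment-matching condition (9) says the KL-minimizing Gaussian simply matches the mean and variance of b. The sketch below applies this to a bimodal mixture, used here as an illustrative stand-in for a belief state.

```python
import numpy as np

rng = np.random.default_rng(1)

def project_gaussian(samples):
    """Projection (6) onto the Gaussian family: by (9), the minimizer of
    KL(b || N(mu, var)) matches E_b[X] and E_b[X^2]."""
    return samples.mean(), samples.var()

# "Belief state" b: equal mixture of N(-2, 0.49) and N(2, 0.49), represented
# here by samples for convenience.
b_samples = np.concatenate([rng.normal(-2.0, 0.7, 5000),
                            rng.normal(2.0, 0.7, 5000)])
mu, var = project_gaussian(b_samples)
# Moment matching gives a mean near 0 and a variance near 2^2 + 0.49 = 4.49;
# the projection keeps the spread of b but, by design, not its bimodality.
```

Richer exponential families (more sufficient statistics, e.g. higher moments or mixtures handled componentwise) reduce exactly the kind of information loss visible here, at the cost of a higher-dimensional parameter space.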
Its state is the projected belief state b̄_k ∈ Ω, which satisfies the system dynamics

b̄_0 = Proj_Ω(b_0), b̄_k = ψ̄(b̄_{k-1}, a_{k-1}, y_k), k = 1, 2, ...,

where ψ̄(b̄_{k-1}, a_{k-1}, y_k) = Proj_Ω(ψ(b̄_{k-1}, a_{k-1}, y_k)), and the dynamic programming mapping on the projected belief MDP is

T̄ J(b̄) = min_{a ∈ A} [ ⟨g(·, a), b̄⟩ + γ E_Y { J(ψ̄(b̄, a, Y)) } ].   (10)

For the projected belief MDP, a policy is denoted as π̄ = {μ̄_0, μ̄_1, ...}, where each function μ̄_k maps the projected belief state b̄_k onto the action space A. Similarly, a stationary policy is denoted as μ̄; an optimal stationary policy is denoted as μ̄*; and the optimal value function is denoted as J̄*(b̄). The projected belief MDP is in fact a low-dimensional continuous-state MDP, and can be solved in numerous ways. One common approach is to use value iteration or policy iteration by converting the projected belief MDP to a discrete-state MDP problem via a suitable discretization of the projected belief space (i.e., the parameter space) and then estimating the one-step cost function and transition probabilities on the discretized mesh. The effect of the discretization procedure on dynamic programming has been studied in [5]. We describe this approach in detail below. Discretization of the projected belief space Ω is equivalent to discretization of the parameter space Θ, which yields a set of grid points, denoted by G = {θ_i, i = 1, ..., N}. Let ḡ(θ_i, a) denote the one-step cost function associated with taking action a at the projected belief state b̄ = f(·, θ_i). Let P(θ_i, a)(θ_j) denote the transition probability from the current projected belief state b̄_k = f(·, θ_i) to the next projected belief state b̄_{k+1} = f(·, θ_j) by taking action a. Estimation of P(θ_i, a)(θ_j) is done using a variation of the projection particle filtering algorithm, to be described in the next section. ḡ(θ_i, a) can be estimated by its sample mean:

ḡ(θ_i, a) = (1/N) Σ_{j=1}^N g(x_j, a),   (11)

where x_1, ..., x_N are sampled i.i.d. from f(·, θ_i).
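The sample-mean estimate (11) is a one-line Monte Carlo computation. In the sketch below, f(·, θ_i) is taken to be N(1.0, 0.25) and g(x, a) = (x − a)², both hypothetical choices rather than the paper's inventory model.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

def g(x, a):
    """Illustrative one-step cost, a quadratic penalty around the action."""
    return (x - a) ** 2

x = rng.normal(1.0, 0.5, size=N)        # x_1, ..., x_N i.i.d. from f(., theta_i)
g_bar = g(x, 0.0).mean()                # estimate (11) of <g(., a), f(., theta_i)>
# For this choice the exact value is E[X^2] = 1.0^2 + 0.25 = 1.25, so the
# estimate should be close, with O(1/sqrt(N)) Monte Carlo error.
```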
Remark 1: The approach for solving the projected belief MDP described here is probably the most intuitive, but not necessarily the most computationally efficient. Other more efficient techniques for solving continuous-state MDPs can be used to solve the projected belief MDP, such as the linear programming approach [5], neuro-dynamic programming methods [7], and simulation-based methods [2].

IV. PROJECTION PARTICLE FILTERING

Solving the projected belief MDP gives us a policy, which tells us what action to take at each projected belief state. In an online implementation, at each time k, the decision maker receives a new observation y_k, estimates the belief state b_k, and then chooses his action a_k according to b_k and that policy. Hence, implementing our approach requires addressing the problem of estimating the belief state. Estimation of b_k, or simply filtering, does not have an analytical solution in most cases except for linear Gaussian systems, but it can be solved using many approximation methods, such as the extended Kalman filter and particle filtering. Here we focus on particle filtering, because 1) it outperforms the extended Kalman filter in many nonlinear/non-Gaussian systems [], and 2) we will develop a projection particle filter to be used in conjunction with the projected belief MDP.

A. Particle Filtering

Particle filtering is a Monte Carlo simulation-based method that approximates the belief state by a finite number of particles/samples and mimics the propagation of the belief state [] [4]. As we have already shown in (3), the belief state evolves recursively as

b_k(x_k) ∝ p(y_k | x_k, a_{k-1}) ∫_S p(x_k | a_{k-1}, x_{k-1}) b_{k-1}(x_{k-1}) dx_{k-1}.   (12)

The integration in (12) can be approximated using Monte Carlo simulation, which is the essence of particle filtering. Specifically, suppose {x_{k-1}^i}_{i=1}^N are drawn i.i.d. from b_{k-1}, and x_{k|k-1}^i is drawn from p(x_k | a_{k-1}, x_{k-1}^i) for each i; then b_k(x_k) can be approximated by the probability mass function

b̂_k(x_k) = Σ_{i=1}^N w_k^i δ(x_k − x_{k|k-1}^i),   (13)

where

w_k^i ∝ p(y_k | x_{k|k-1}^i, a_{k-1}),   (14)

δ denotes the Kronecker delta function, {x_{k|k-1}^i}_{i=1}^N are the random support points, and {w_k^i}_{i=1}^N are the associated probabilities/weights, which sum up to 1. To avoid sample degeneracy, new samples {x_k^i}_{i=1}^N are sampled i.i.d. from the approximate belief state b̂_k. At the next time k+1, the above steps are repeated to yield {x_{k+1|k}^i}_{i=1}^N and corresponding weights {w_{k+1}^i}_{i=1}^N, which are used to approximate b_{k+1}. This is the basic form of particle filtering, which is also called the bootstrap filter [8]. (Please see [] for a rigorous and thorough derivation of a more general form of particle filtering.) The algorithm is as follows:

Algorithm 1: (Particle Filtering (Bootstrap Filter))
Input: a (stationary) policy μ on the belief MDP; a sequence of observations y_1, y_2, ... arriving sequentially at time k = 1, 2, ....
Output: a sequence of approximate belief states b̂_1, b̂_2, ....
Step 1. Initialization: Sample x_0^1, ..., x_0^N i.i.d. from the approximate initial belief state b̂_0. Set k = 1.
Step 2. Prediction: Compute x_{k|k-1}^1, ..., x_{k|k-1}^N by propagating x_{k-1}^1, ..., x_{k-1}^N according to the system dynamics (1) using the action a_{k-1} = μ(b̂_{k-1}) and randomly generated noise {u_{k-1}^i}_{i=1}^N, i.e., sample x_{k|k-1}^i from p(· | x_{k-1}^i, a_{k-1}), i = 1, ..., N. The empirical predicted belief state is

b̂_{k|k-1}(x) = (1/N) Σ_{i=1}^N δ(x − x_{k|k-1}^i).

Step 3.
Bayes updating: Receive a new observation y_k. The empirical updated belief state is

b̂_k(x) = Σ_{i=1}^N w_k^i δ(x − x_{k|k-1}^i), where w_k^i = p(y_k | x_{k|k-1}^i, a_{k-1}) / Σ_{i=1}^N p(y_k | x_{k|k-1}^i, a_{k-1}), i = 1, ..., N.

Step 4. Resampling: Sample x_k^1, ..., x_k^N i.i.d. from b̂_k.
Step 5. k ← k + 1 and go to Step 2.

It has been proved that the approximate belief state b̂_k converges to the true belief state b_k as the sample number N increases to infinity [3] [20]. However, uniform convergence in time has only been proved for the special case where the system dynamics has a mixing kernel, which ensures that any error is forgotten (exponentially) in time. Usually, as time k increases, an increasing number of samples is required to ensure a given precision of the approximation b̂_k for all k.

B. Projection Particle Filtering

To obtain a reasonable approximation of the belief state, particle filtering needs a large number of samples/particles. Since the number of samples/particles is the dimension of the approximate belief state b̂, particle filtering is not very helpful in reducing the dimensionality of the belief space. Moreover, particle filtering does not give us an approximate belief state in the projected belief space Ω, hence the policy we obtained by solving the projected belief MDP is not immediately applicable. We incorporate the idea of density projection into particle filtering, so as to approximate the belief state by a density in Ω. The projection particle filter we propose here is a modification of the one in [2]. Their projection particle filter projects the empirical predicted belief state, not the empirical updated belief state, onto a parametric family of densities, so after Bayes updating, the approximate belief state might not be in that family. We will project the empirical updated belief state onto a parametric family by minimizing the KL divergence between the empirical density and the projected one. In addition, we will need much less restrictive assumptions than [2] to obtain similar error bounds.
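Combining the bootstrap filter above with a projection of the updated empirical belief onto the Gaussian family gives the following sketch. The scalar model (random-walk transition, N(y; x, 0.25) likelihood, fixed action standing in for the policy) is an illustrative assumption; the projection step is weighted moment matching, the sample analogue of (9).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
theta = (0.0, 1.0)                  # (mean, var) of the initial Gaussian belief

for y in [0.4, 0.9, 1.3]:           # observations arriving online
    mu, var = theta
    # Resampling: draw particles from the projected belief f(., theta).
    particles = rng.normal(mu, np.sqrt(var), N)
    # Prediction: propagate through assumed dynamics x + a + u, action a = 0.1.
    predicted = particles + 0.1 + rng.normal(0.0, 1.0, N)
    # Bayes updating: weight by the assumed likelihood N(y; x, 0.25).
    w = np.exp(-(y - predicted) ** 2 / (2 * 0.25))
    w /= w.sum()
    # Projection: match weighted sufficient statistics x and x^2, so the
    # updated empirical belief is replaced by the closest Gaussian in KL.
    m1 = np.sum(w * predicted)
    m2 = np.sum(w * predicted ** 2)
    theta = (m1, m2 - m1 ** 2)

posterior_mean, posterior_var = theta
```

Because each resampling step draws from a continuous density rather than the discrete empirical measure, repeated support points cannot accumulate, which is the sample-impoverishment remedy noted below.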
Since resampling is from a continuous distribution instead of an empirical (discrete) one, the proposed projection particle filter also overcomes the difficulty of sample impoverishment [] that occurs in the bootstrap filter. Applying the density projection technique we described in the last section, projecting the empirical belief state b̂_k onto an exponential family Ω involves finding an f(·, θ) with the parameter θ satisfying (9). Hence, plugging (13) into (9) yields

Σ_{i=1}^N w_k^i c_j(x_{k|k-1}^i) = E_θ[c_j], j = 1, ..., m,

which constitutes the projection step in projection particle filtering.

Algorithm 2: (Projection particle filtering for an exponential family of densities (PPF))
Input: a (stationary) policy μ̄ on the projected belief MDP; a family of exponential densities Ω = {f(·, θ), θ ∈ Θ}; a sequence of observations y_1, y_2, ... arriving sequentially at time k = 1, 2, ....
Output: a sequence of approximate belief states f(·, θ̂_1), f(·, θ̂_2), ....
Step 1. Initialization: Sample x_0^1, ..., x_0^N i.i.d. from the approximate initial belief state f(·, θ̂_0). Set k = 1.

Step 2. Prediction: Compute x_{k|k-1}^1, ..., x_{k|k-1}^N by propagating x_{k-1}^1, ..., x_{k-1}^N according to the system dynamics (1) using the action a_{k-1} = μ̄(f(·, θ̂_{k-1})) and randomly generated noise {u_{k-1}^i}_{i=1}^N, i.e., sample x_{k|k-1}^i from p(· | x_{k-1}^i, a_{k-1}), i = 1, ..., N.
Step 3. Bayes updating: Receive a new observation y_k. Compute weights according to

w_k^i = p(y_k | x_{k|k-1}^i, a_{k-1}) / Σ_{i=1}^N p(y_k | x_{k|k-1}^i, a_{k-1}), i = 1, ..., N.

Step 4. Projection: The approximate belief state is f(·, θ̂_k), where θ̂_k satisfies the equations

Σ_{i=1}^N w_k^i c_j(x_{k|k-1}^i) = E_{θ̂_k}[c_j], j = 1, ..., m.

Step 5. Resampling: Sample x_k^1, ..., x_k^N from f(·, θ̂_k).
Step 6. k ← k + 1 and go to Step 2.

In an online implementation, at each time k, the PPF approximates b_k by f(·, θ̂_k), and then decides an action a_k according to a_k = μ̄(f(·, θ̂_k)), where μ̄ is the policy solved for the projected belief MDP. As mentioned in the last section, the PPF can be varied slightly for estimating the transition probabilities of the discretized projected belief MDP, as follows:

Algorithm 3: (Estimation of the transition probabilities)
Input: θ_i, a, N. Output: P(θ_i, a)(θ_j), j = 1, ..., N.
Step 1. Sampling: Sample x^1, ..., x^N from f(·, θ_i).
Step 2. Prediction: Compute x̄^1, ..., x̄^N by propagating x^1, ..., x^N according to the system dynamics (1) using the action a and randomly generated noise {u^i}_{i=1}^N.
Step 3. Sampling observation: Compute y^1, ..., y^N from x̄^1, ..., x̄^N according to the observation equation (2) using randomly generated noise {v^i}_{i=1}^N.
Step 4. Bayes updating: For each y^k, k = 1, ..., N, the updated belief state is

b^k(x) = Σ_{i=1}^N w^k_i δ(x − x̄^i), where w^k_i = p(y^k | x̄^i, a) / Σ_{i=1}^N p(y^k | x̄^i, a), i = 1, ..., N.

Step 5. Projection: For k = 1, ..., N, project each b^k onto the exponential family, i.e., find θ^k that satisfies (9).
Step 6. Estimation: For k = 1, ..., N, find the nearest-neighbor grid point of θ^k in G. For each θ_j ∈ G, count the frequency P(θ_i, a)(θ_j) = (number of θ_j) / N.

V. ANALYSIS OF ERROR BOUNDS

A.
Value Function Approximation

Our method solves the projected belief MDP instead of the original belief MDP, and that raises two questions: How well does the optimal value function of the projected belief MDP approximate the optimal value function of the original belief MDP? How well does the optimal policy obtained by solving the projected belief MDP perform on the original belief MDP? To answer these questions, we first need to rephrase them mathematically. Here we assume perfect computation of the belief states and the projected belief states, and the following:

Assumption 1: There exist a stationary optimal policy for the belief MDP, denoted by μ*, and a stationary optimal policy for the projected belief MDP, denoted by μ̄*.

Assumption 1 holds under some mild conditions [6], [9]. Using the stationarity, and the dynamic programming mappings on the belief MDP and the projected belief MDP given by (5) and (10), the optimal value function J*(b) for the belief MDP can be obtained by J*(b) ≜ J_{μ*}(b) = lim_{k→∞} T^k J_0(b), and the optimal value function for the projected belief MDP obtained by J̄*(b̄) ≜ J̄_{μ̄*}(b̄) = lim_{k→∞} (T̄)^k J_0(b̄). Therefore, the questions posed at the beginning of this section can be formulated mathematically as:

1. How well the optimal value function of the projected belief MDP approximates the true optimal value function can be measured by |J*(b) − J̄*(b̄)|.
2. How well the optimal policy μ̄* for the projected belief MDP performs on the original belief space can be measured by |J*(b) − J_{μ̄*}(b)|, where μ̄*(b) ≜ μ̄* ∘ Proj_Ω(b) = μ̄*(b̄).

The next assumption bounds the difference between the belief state b and its projection b̄, and also the difference between their one-step evolutions ψ(b, a, y) and ψ̄(b̄, a, y). It is an assumption on the projection error.

Assumption 2: There exist ε_1 > 0 and δ_1 > 0 such that for all a ∈ A, y ∈ O and b ∈ B,

|⟨g(·, a), b − b̄⟩| ≤ ε_1, |⟨g(·, a), ψ(b, a, y) − ψ̄(b̄, a, y)⟩| ≤ δ_1.

The following assumption can be seen as a continuity property of the value function.
Assumption 3: For any δ > 0 that satisfies |⟨g(·, a), b − b′⟩| ≤ δ, ∀ b, b′ ∈ B, there exists ε > 0 such that |J_k(b) − J_k(b′)| ≤ ε, ∀ b, b′ ∈ B, ∀ k, and there exists ε′ > 0 such that |J_μ(b) − J_μ(b′)| ≤ ε′, ∀ b, b′ ∈ B, ∀ μ ∈ Π.

Now we present our main result.

Theorem 1: Under Assumptions 1, 2 and 3,

|J*(b) − J̄*(b̄)| ≤ (ε_1 + γ ε_2) / (1 − γ), ∀ b ∈ B,   (15)

|J*(b) − J_{μ̄*}(b)| ≤ (2 ε_1 + γ (ε_2 + ε_3)) / (1 − γ), ∀ b ∈ B,   (16)

where ε_1 is the constant in Assumption 2, and ε_2, ε_3 are the constants ε and ε′, respectively, in Assumption 3 corresponding to δ = δ_1, where δ_1 is the constant in Assumption 2.

Remark 2: In (15) and (16), ε_1 is a projection error, and ε_2 and ε_3 decrease as the projection error δ_1 decreases. Therefore, as the projection error decreases, the optimal value function of the projected belief MDP J̄*(b̄) converges to the true optimal value function J*(b), and the corresponding policy μ̄* converges to the true optimal policy μ*. Roughly speaking, the projection error decreases as the number of sufficient statistics in the chosen exponential family increases (for details, please see Section V-C: Validation of the Assumptions).

B. Projection Particle Filtering

In the above analysis, we assumed perfect computation of the belief states and the projected belief states. In this section, we consider the filtering error, and compute an error bound on the approximate belief state generated by the projection particle filter (PPF).

1) Notations: Let C_b(R^n) be the set of all continuous bounded functions on R^n. Let B(R^n) be the set of all bounded measurable functions on R^n. Let ‖·‖ denote the supremum norm on B(R^n), i.e., ‖φ‖ ≜ sup_{x ∈ R^n} |φ(x)|, φ ∈ B(R^n). Let M^+(R^n) and P(R^n) be the sets of nonnegative measures and probability measures on R^n, respectively. If η ∈ M^+(R^n) and φ : R^n → R is an integrable function with respect to η, then ⟨η, φ⟩ ≜ ∫ φ dη. Moreover, if η ∈ P(R^n), E_η[φ] = ⟨η, φ⟩, Var_η(φ) = ⟨η, φ^2⟩ − ⟨η, φ⟩^2. We will use the representations on the two sides of the above equalities interchangeably in the sequel. The belief state and the projected belief state are probability densities; however, we will prove our results in terms of their corresponding probability measures, which we refer to as conditional distributions (belief states are conditional densities). The two representations are essentially the same once we assume the probability measures admit probability densities.
Therefore, the notations used for probability densities before are used to denote their corresponding probability measures from now on. Namely, we use $b$ to denote a probability measure on $\mathbb{R}^{n_x}$ and assume it admits a probability density with respect to Lebesgue measure, which is the belief state. Similarly, we use $f(\cdot,\theta)$ to denote a probability measure on $\mathbb{R}^{n_x}$ and assume it admits a probability density, with respect to Lebesgue measure, in the chosen exponential family with parameter $\theta$. A probability transition kernel $K: P(\mathbb{R}^{n_x}) \times \mathbb{R}^{n_x} \to \mathbb{R}$ is defined by
$$K\eta(F) \triangleq \int_{\mathbb{R}^{n_x}} \eta(dx) K(F, x),$$
where $F$ is a set in the Borel $\sigma$-algebra on $\mathbb{R}^{n_x}$. For $\phi: \mathbb{R}^{n_x} \to \mathbb{R}$, an integrable function with respect to $K(\cdot, x)$,
$$K\phi(x) \triangleq \int_{\mathbb{R}^{n_x}} \phi(x') K(dx', x).$$

TABLE I: NOTATIONS OF DIFFERENT CONDITIONAL DISTRIBUTIONS
$b_k$: exact conditional distribution
$\hat b_k$: PPF conditional distribution before projection
$f(\cdot,\hat\theta_k)$: PPF projected conditional distribution
$b'_k$: CF conditional distribution before projection
$f(\cdot,\theta'_k)$: CF projected conditional distribution

Let $K_k(dx_k, x_{k-1})$ denote the probability transition kernel of the system (1) at time $k$, which satisfies
$$b_{k|k-1}(dx_k) = K_k b_{k-1}(dx_k) = \int_{\mathbb{R}^{n_x}} b_{k-1}(dx_{k-1}) K_k(dx_k, x_{k-1}).$$
We let $\Psi_k$ denote the likelihood function associated with the observation equation (2) at time $k$, and assume that $\Psi_k \in C_b(\mathbb{R}^{n_x})$. Hence,
$$b_k = \frac{\Psi_k b_{k|k-1}}{\langle b_{k|k-1}, \Psi_k \rangle}.$$

2) Main Idea: The exact filter (EF) at time $k$ can be described as
$$b_{k-1} \xrightarrow{\text{prediction}} b_{k|k-1} = K_k b_{k-1} \xrightarrow{\text{updating}} b_k = \frac{\Psi_k b_{k|k-1}}{\langle b_{k|k-1}, \Psi_k \rangle}.$$
The PPF at time $k$ can be described as
$$\hat f(\cdot,\hat\theta_{k-1}) \xrightarrow{\text{prediction}} \hat b_{k|k-1} = K_k \hat f(\cdot,\hat\theta_{k-1}) \xrightarrow{\text{updating}} \hat b_k = \frac{\Psi_k \hat b_{k|k-1}}{\langle \hat b_{k|k-1}, \Psi_k \rangle} \xrightarrow{\text{projection}} f(\cdot,\hat\theta_k) \xrightarrow{\text{resampling}} \hat f(\cdot,\hat\theta_k).$$
To facilitate our analysis, we introduce a conceptual filter (CF), which at each time $k$ is reinitialized by $f(\cdot,\hat\theta_{k-1})$, performs exact prediction and updating to yield $b'_{k|k-1}$ and $b'_k$, respectively, and does projection to obtain $f(\cdot,\theta'_k)$. It can be described as
$$f(\cdot,\hat\theta_{k-1}) \xrightarrow{\text{prediction}} b'_{k|k-1} = K_k f(\cdot,\hat\theta_{k-1}) \xrightarrow{\text{updating}} b'_k = \frac{\Psi_k b'_{k|k-1}}{\langle b'_{k|k-1}, \Psi_k \rangle} \xrightarrow{\text{projection}} f(\cdot,\theta'_k).$$
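The PPF cycle described above can be sketched concretely. The block below is a minimal illustration rather than the paper's implementation: it assumes a Gaussian exponential family (so the projection step reduces to moment matching), a scalar random-walk state equation, and noise levels chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppf_step(theta, y, n_particles=200):
    """One PPF cycle with a Gaussian family f(., theta), theta = (mean, std):
    resample from the projected density, predict through the state equation,
    update by the likelihood, then project back onto the family."""
    mean, std = theta
    # resampling: draw N i.i.d. particles from f(., theta_{k-1})
    x = rng.normal(mean, std, n_particles)
    # prediction: propagate through illustrative dynamics x_k = x_{k-1} + u_k
    x = x + rng.normal(0.0, 1.0, n_particles)
    # updating: weight each particle by the likelihood of y_k = x_k + v_k,
    # with assumed observation noise v_k ~ N(0, 0.5^2)
    w = np.exp(-0.5 * ((y - x) / 0.5) ** 2)
    w /= w.sum()
    # projection: for the Gaussian family, minimizing KL divergence from the
    # weighted empirical distribution amounts to matching mean and variance
    m = float(np.sum(w * x))
    s = float(np.sqrt(np.sum(w * (x - m) ** 2)))
    return m, s

theta = (0.0, 1.0)
for y_k in [0.3, 0.8, 1.2]:
    theta = ppf_step(theta, y_k)
```

The two-parameter tuple returned each cycle is exactly the low-dimensional statistic that replaces the full particle cloud between stages.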
The CF serves as a bridge connecting the EF and the PPF, as we describe below. We are interested in the difference between the true conditional distribution $b_k$ and the PPF-generated projected conditional distribution $f(\cdot,\hat\theta_k)$ for each time $k$. The total error between the two can be decomposed as follows:
$$b_k - f(\cdot,\hat\theta_k) = (b_k - b'_k) + (b'_k - f(\cdot,\theta'_k)) + (f(\cdot,\theta'_k) - f(\cdot,\hat\theta_k)). \qquad (17)$$
The first term $(b_k - b'_k)$ is the error due to the inexact initial condition of the CF compared to the EF, i.e., $(b_{k-1} - f(\cdot,\hat\theta_{k-1}))$, which is also the total error at time $k-1$. The second

term $(b'_k - f(\cdot,\theta'_k))$ evaluates the minimum deviation from the exponential family generated by one step of exact filtering, since $f(\cdot,\theta'_k)$ is the projection of $b'_k$. The third term $(f(\cdot,\theta'_k) - f(\cdot,\hat\theta_k))$ is purely due to Monte Carlo simulation, since $f(\cdot,\theta'_k)$ and $f(\cdot,\hat\theta_k)$ are obtained using the same steps from $f(\cdot,\hat\theta_{k-1})$ and its empirical version $\hat f(\cdot,\hat\theta_{k-1})$, respectively. We will find error bounds on each of the three terms, and finally find the total error at time $k$ by induction.

3) Error Bound: We shall look at the case in which the observation process has an arbitrary but fixed value $y_{0:k} = \{y_0, \ldots, y_k\}$. Hence, all the expectations in this section are with respect to the sampling in the algorithm only. We consider a test function $\phi \in B(\mathbb{R}^{n_x})$. It is easy to see that $K\phi \in B(\mathbb{R}^{n_x})$ and $\|K\phi\| \le \|\phi\|$, since
$$|K\phi(x)| = \left|\int_{\mathbb{R}^{n_x}} \phi(x') K(dx', x)\right| \le \int_{\mathbb{R}^{n_x}} |\phi(x')| K(dx', x) \le \|\phi\| \int_{\mathbb{R}^{n_x}} K(dx', x) = \|\phi\|.$$
Since $\Psi \in C_b(\mathbb{R}^{n_x})$, we know that $\Psi \in B(\mathbb{R}^{n_x})$ and $\Psi\phi \in B(\mathbb{R}^{n_x})$. We also need the following assumptions.

Assumption 4: All the projected distributions are in a compact subset of the given exponential family. In other words, there exists a compact set $\bar\Theta \subset \Theta$ such that $\hat\theta_k \in \bar\Theta$ and $\theta'_k \in \bar\Theta$, $\forall k$.

Assumption 5: For all $k$, $\langle b_{k|k-1}, \Psi_k \rangle > 0$, $\langle b'_{k|k-1}, \Psi_k \rangle > 0$ w.p.1, and $\langle \hat b_{k|k-1}, \Psi_k \rangle > 0$ w.p.1.

Assumption 5 guarantees that the normalizing constant in the Bayes updating is nonzero, so that the conditional distribution is well defined. Under Assumption 4, the second inequality in Assumption 5 can be strengthened using the compactness of $\bar\Theta$. Since $f(x, a_k, u_k)$ in (1) is continuous in $x$, $K_k$ is weakly continuous (cf. [19]). Hence, $\langle b'_{k|k-1}, \Psi_k \rangle = \langle K_k f(\cdot,\hat\theta_{k-1}), \Psi_k \rangle = \langle f(\cdot,\hat\theta_{k-1}), K_k \Psi_k \rangle$ is continuous in $\hat\theta_{k-1}$, where $\hat\theta_{k-1} \in \bar\Theta$. Since $\bar\Theta$ is compact, there exists a constant $\delta > 0$ such that for each $k$,
$$\langle b'_{k|k-1}, \Psi_k \rangle \ge \frac{1}{\delta}, \quad \text{w.p.1.} \qquad (18)$$
The assumption below guarantees that the conditional distribution stays close to the given exponential family after one step of exact filtering if the initial distribution is in the exponential family.
Recall that starting with initial distribution $f(\cdot,\hat\theta_{k-1})$, one step of exact filtering yields $b'_k$, which is then projected to yield $f(\cdot,\theta'_k)$, where $\hat\theta_{k-1} \in \bar\Theta$, $\theta'_k \in \bar\Theta$.

Assumption 6: There exists a constant $\varepsilon > 0$ such that
$$E\left[\left|\langle b'_k, \phi \rangle - \langle f(\cdot,\theta'_k), \phi \rangle\right|\right] \le \varepsilon \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}),\ \forall k.$$

Remark 3: Assumption 6 is our main assumption, which essentially assumes an error bound on the projection error. Our assumptions are much less restrictive than those in [2], while our conclusion is similar to that in [2], as will be seen later. Although Assumption 6 appears similar to Assumption 3 in [2], it is essentially different. Assumption 3 in [2] says that the optimal conditional density stays close to the given exponential family for all time, whereas Assumption 6 only assumes that if the exact filter starts in the given exponential family, after one step the conditional distribution stays close to the family. Moreover, we do not need any assumption like the restrictive Assumption 4 in [2].

Lemma 1 considers the bound on the first term in (17).

Lemma 1: For each $k$, if $e_{k-1}$ is a positive constant such that $E[|\langle b_{k-1} - f(\cdot,\hat\theta_{k-1}), \phi \rangle|] \le e_{k-1} \|\phi\|$, $\forall \phi \in B(\mathbb{R}^{n_x})$, then under Assumptions 4 and 5, for each $k$ there exists a constant $\tau_k > 0$ such that
$$E[|\langle b_k - b'_k, \phi \rangle|] \le \tau_k e_{k-1} \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}). \qquad (19)$$

Lemma 2 considers the bound on the third term in (17) before projection.

Lemma 2: Under Assumptions 4 and 5, for each $k$,
$$E\left[\left|\langle \hat b_k - b'_k, \phi \rangle\right|\right] \le \frac{\tau_k}{\sqrt N} \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}),$$
where $\tau_k$ is the same constant as in Lemma 1.

Lemma 3 considers the bound on the third term in (17) based on the result of Lemma 2.

Lemma 3: Let $c_j$, $j = 1, \ldots, m$, be the sufficient statistics of the exponential family as defined in Definition 1, and assume $c_j \in B(\mathbb{R}^{n_x})$, $j = 1, \ldots, m$. Under Assumptions 4 and 5, there exists a constant $d > 0$ such that for each $k$,
$$E\left[\left|\langle f(\cdot,\hat\theta_k) - f(\cdot,\theta'_k), \phi \rangle\right|\right] \le \frac{d \tau_k}{\sqrt N} \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}), \qquad (20)$$
where $\tau_k$ is the same constant as in Lemmas 1 and 2.

Now we present our main result on the error bound of the projection particle filter.
Theorem 2: Let $e_0$ be a nonnegative constant such that $E[|\langle b_0 - f(\cdot,\hat\theta_0), \phi \rangle|] \le e_0 \|\phi\|$, $\forall \phi \in B(\mathbb{R}^{n_x})$. Under Assumptions 4, 5 and 6, and assuming that $c_j \in B(\mathbb{R}^{n_x})$, $j = 1, \ldots, m$, for each $k$,
$$E\left[\left|\langle b_k - f(\cdot,\hat\theta_k), \phi \rangle\right|\right] \le e_k \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}),$$
where
$$e_k = \tau_1^k e_0 + \left(1 + \sum_{i=2}^{k} \tau_i^k\right)\varepsilon + \frac{d}{\sqrt N} \sum_{i=1}^{k} \tau_i^k, \qquad (21)$$
$\tau_i^k = \prod_{j=i}^{k} \tau_j$ for $k \ge i$, $\tau_i^k = 0$ for $k < i$, $\tau_j$ is the constant in Lemmas 1, 2, and 3, $d$ is the constant in Lemma 3, and $\varepsilon$ is the constant in Assumption 6.

Remark 4: As we mentioned in Remark 2, the projection errors $e_0$ and $\varepsilon$ decrease as the number of sufficient statistics in the chosen exponential family, $m$, increases. The error $e_k$ decreases at the rate of $1/\sqrt N$ as we increase the number of samples $N$ in the projection particle filter. However, notice that the coefficient in front of $1/\sqrt N$ grows with time, so we have to use an increasing number of samples as time goes on in order to ensure a uniform error bound with respect to time.
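The closed form (21) is obtained by unrolling the one-step recursion $e_k = \tau_k e_{k-1} + \varepsilon + d\tau_k/\sqrt N$ established in the proof of Theorem 2 from Lemma 1, Assumption 6, and Lemma 3. A quick numerical sanity check, with constants chosen arbitrarily by us, confirms that the two expressions agree:

```python
import math
import random

def e_closed(k, e0, eps, d, N, tau):
    """Closed-form bound (21): tau_1^k * e0 + (1 + sum_{i=2}^k tau_i^k) * eps
    + (d/sqrt(N)) * sum_{i=1}^k tau_i^k, with tau_i^k = prod_{j=i}^k tau_j."""
    def tprod(i):  # tau_i^k
        p = 1.0
        for j in range(i, k + 1):
            p *= tau[j]
        return p
    return (tprod(1) * e0
            + (1 + sum(tprod(i) for i in range(2, k + 1))) * eps
            + d / math.sqrt(N) * sum(tprod(i) for i in range(1, k + 1)))

def e_recursive(k, e0, eps, d, N, tau):
    """One-step recursion from the proof: e_k = tau_k*e_{k-1} + eps + d*tau_k/sqrt(N)."""
    e = e0
    for j in range(1, k + 1):
        e = tau[j] * e + eps + d * tau[j] / math.sqrt(N)
    return e

random.seed(0)
tau = [None] + [1.0 + random.random() for _ in range(10)]  # arbitrary tau_1..tau_10
bound_closed = e_closed(10, 0.05, 0.01, 2.0, 200, tau)
bound_rec = e_recursive(10, 0.05, 0.01, 2.0, 200, tau)
```

With $\tau_j > 1$, rerunning `e_closed` for growing $k$ also makes visible the growth of the coefficient in front of $1/\sqrt N$ noted in Remark 4.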

C. Validation of the Assumptions

Assumptions 2 and 6 are the main assumptions of our analysis. They are assumptions on the projection error, positing that density projection introduces a small error. We will show that in certain cases these assumptions hold, and the projection error converges to 0 as the number of sufficient statistics, $m$, goes to infinity. We will first state a convergence result from [4]. However, as this convergence result is in the sense of KL divergence, we will further show the convergence in the sense employed in our assumptions by using an intermediate result in [4].

Consider a probability density function $b$ defined on a bounded interval, and approximate it by $\bar b$, a density function in an exponential family whose sufficient statistics consist of polynomials, splines or trigonometric series. The following theorem is proved in [4].

Theorem 3: If $\log b$ has $r$ square-integrable derivatives, i.e., $\int |D^r \log b|^2 < \infty$, then $D_{KL}(b \,\|\, \bar b)$ converges to 0 at rate $m^{-2r}$ as $m \to \infty$.

Theorem 3 says the projected density $\bar b$ converges to $b$ in the sense of KL divergence as $m$ goes to infinity. An intermediate result (see (6.6) in [4]) is $\|\log(b/\bar b)\| \le \nu_m$, where $\nu_m$ is a constant that depends on $m$, and $\nu_m \to 0$ as $m \to \infty$. Since $b$ is bounded and $\log(\cdot)$ is a continuously differentiable function, there exists a constant $\xi$ such that $|b - \bar b| \le \xi |\log b - \log \bar b|$. Hence, with the intermediate result above,
$$|\langle \phi, b - \bar b \rangle| \le \|\phi\| \int |b - \bar b|\, dx \le \|\phi\|\, \xi \int \left|\log \frac{b}{\bar b}\right| dx \le \|\phi\|\, \xi\, l\, \nu_m,$$
where $l$ is the length of the bounded interval on which $b$ is defined. Since $\nu_m$ can be made arbitrarily small by taking $m$ large enough, it is easy to see that Assumptions 2 and 6 hold in the cases that we consider.

VI. NUMERICAL EXPERIMENTS

A. Scalability and Computational Issues

Estimation of the one-step cost function (11) and of the transition probabilities (Algorithm 3) is executed for every belief-action pair in the discretized mesh $G$ and the action space $A$.
Hence, the algorithms scale as $O(|G||A|N)$ and $O(|G||A|N^2)$, respectively, where $|G|$ is the number of grid points, $|A|$ is the number of actions, and $N$ is the number of samples specified in the algorithms. In implementation, we found that most of the computation time is spent on executing Algorithm 3 over all belief-action pairs. However, the estimation of cost functions and transition probabilities can be pre-computed and stored, and hence only needs to be done once. The advantage of the algorithms is that the scalability is independent of the size of the actual state space, since $G$ is a grid mesh on the parameter space of the projected belief space. That is exactly what is desired by employing density projection. However, to get a better approximation, more parameters in the exponential family should be used, and that leads to a higher-dimensional parameter space to discretize. Increasing the number of parameters in the exponential family also makes sampling more difficult. Sampling from a general exponential family is usually not easy, and may require advanced techniques, such as adaptive rejection sampling (ARS) [16], and hence more computation time. This difficulty can be avoided by resampling from the discrete particles instead of the projected density, which is equivalent to using the plain particle filter and then doing projection outside the filter. However, this may lead to sample degeneracy. The trade-off between a better approximation and less computation time is complicated and requires more research. We plan to study how to appropriately choose the exponential family and how to improve the simulation efficiency in the future.

B. Simulation Results

Since most of the benchmark POMDP problems in the literature assume a discrete state space, it is difficult to compare against the state of the art. Here we consider an inventory control problem, formed by adding a partial observation equation to a fully observable inventory control problem.
The fully observable problem has an optimal threshold policy [27], which allows us to verify our method in the limiting case by setting the observation noise very close to 0. In our inventory control problem, the inventory level is reviewed at discrete times, and the observations are noisy because of, e.g., inventory spoilage, misplacement, or distributed storage. At each period, the inventory is either replenished by an order of a fixed amount or not replenished. Customer demands arrive randomly with a known distribution. The demand is filled if there is enough inventory remaining; otherwise, in the case of a shortage, excess demand is not satisfied and a penalty is issued on the lost sales amount. We assume that the demand and the observation noise are both continuous random variables; hence the state (i.e., the inventory level) and the observation are continuous random variables. Let $x_k$ denote the inventory level at period $k$, $u_k$ the i.i.d. random demand at period $k$, $a_k$ the replenish decision at period $k$ (i.e., $a_k = 0$ or $1$), $Q$ the fixed order amount, $y_k$ the observation of the inventory level $x_k$, $v_k$ the i.i.d. observation noise, $h$ the per-period per-unit inventory holding cost, and $s$ the per-period per-unit inventory shortage penalty cost. The system equations are as follows:
$$x_{k+1} = \max(x_k + a_k Q - u_k,\, 0), \quad k = 0, 1, \ldots,$$
$$y_k = x_k + v_k, \quad k = 0, 1, \ldots.$$
The cost incurred in period $k$ is
$$g_k(x_k, a_k, u_k) = h \max(x_k + a_k Q - u_k,\, 0) + s \max(u_k - x_k - a_k Q,\, 0).$$
We consider two objective functions, the average cost per period and the discounted total cost, given by
$$\limsup_{H \to \infty} \frac{1}{H}\, E\left[\sum_{k=0}^{H-1} g_k\right], \qquad \lim_{H \to \infty} E\left[\sum_{k=0}^{H-1} \gamma^k g_k\right],$$
where $\gamma \in (0, 1)$ is the discount factor.
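A minimal simulation of this model makes the cost structure concrete. The sketch below uses the parameter values from our experimental setup ($Q = 10$, $h = 1$, $s = 10$), reads the exponential demand as having mean 5 (an assumption on our part), and is an illustration rather than the simulator used for the reported results:

```python
import random

def simulate_cost(policy, Q=10.0, h=1.0, s=10.0, x0=5.0, sigma=1.0,
                  horizon=20000, seed=1):
    """Average per-period cost of the inventory system
    x_{k+1} = max(x_k + a_k*Q - u_k, 0), y_k = x_k + v_k, with per-period cost
    g_k = h*max(x_k + a_k*Q - u_k, 0) + s*max(u_k - x_k - a_k*Q, 0).
    policy maps the noisy observation y_k to an order decision a_k in {0, 1}."""
    rnd = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(horizon):
        y = x + rnd.gauss(0.0, sigma)    # noisy observation of the inventory
        a = policy(y)
        u = rnd.expovariate(1.0 / 5.0)   # demand, assumed exponential with mean 5
        post = x + a * Q - u             # inventory after ordering and demand
        total += h * max(post, 0.0) + s * max(-post, 0.0)
        x = max(post, 0.0)
    return total / horizon

# A certainty-equivalence use of the threshold structure of the fully observed
# problem: order iff the observed level is below 7.7 (the threshold reported
# in our experiments). Compare it with never ordering under the same demands.
threshold_cost = simulate_cost(lambda y: 1 if y < 7.7 else 0)
never_order_cost = simulate_cost(lambda y: 0)
```

Because both runs share the same seed and random call pattern, they face identical demand streams, which mirrors the common-random-variables practice used in the experiments.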

In the simulation, we first choose an exponential family and specify a grid mesh on its parameter space, then implement (11) and Algorithm 3 on the grid mesh, and use value iteration to solve for a policy. These steps are done offline. In an online run, Algorithm 2 (PPF) is used for filtering and making decisions with the policy obtained offline. We also consider a small variation of this method: instead of using PPF, we use Algorithm 1 (PF) and do density projection outside the filter each time. We compare our two methods (called Ours 1 and Ours 2, respectively) to four other algorithms: (1) certainty equivalence using the mean estimate (CE); (2) certainty equivalence using the maximum likelihood estimate (CE-MLE); (3) the EKF-based parametric POMDP (EKF-PPOMDP) in [8]; (4) the greedy policy. CE treats the state estimate as the true state in the solution to the full observation problem. We use the bootstrap filter to obtain the mean estimate and the MLE of the states for CE. EKF-PPOMDP approximates the belief state by a Gaussian distribution, and uses the extended Kalman filter to estimate the transition of the belief state. Similar to our method, it also solves a discretized MDP defined on the parameter space of the Gaussian density. The greedy policy chooses an action $a_k$ that attains the minimum in the expression $\min_{a_k \in A} E_{x_k, u_k}[g_k(x_k, a_k, u_k) \mid I_k]$.

Numerical experiments are carried out in the following settings.

Problem parameters: initial inventory level $x_0 = 5$, holding cost $h = 1$, shortage penalty cost $s = 10$, fixed order amount $Q = 10$, random demand $u_k \sim \exp(5)$, discount factor $\gamma = 0.9$, inventory observation noise $v_k \sim N(0, \sigma^2)$ with $\sigma$ ranging from 0.1 to 3.3 in steps of 0.2.
Algorithm parameters: the number of particles in both the usual particle filter and the projection particle filter is $N = 200$; the exponential family in the projection particle filter is chosen as the Gaussian family; the set of grid points on the projected belief space is $G = \{\text{mean} = [0 : 0.5 : 5],\ \text{standard deviation} = [0 : 0.2 : 5]\}$ for both our methods and EKF-PPOMDP; one run of horizon length $H = 10^5$ for each average cost criterion case, and 1000 independent runs of horizon length $H = 40$ for each discounted total cost criterion case; nearest neighbor is used as the value function approximator in both our methods and EKF-PPOMDP.

Simulation issues: we use common random variables among different policies and different $\sigma$'s. In order to implement CE, we use Monte Carlo simulation to find the optimal threshold policy for the fully observed problem (i.e., $y_k = x_k$): if the inventory level is below the threshold $L$, the store/warehouse should order to replenish its inventory; otherwise, if the inventory level is above $L$, it should not order. That is,
$$a_k = \begin{cases} 0, & \text{if } x_k > L; \\ 1, & \text{if } x_k < L, \end{cases} \qquad (22)$$
The simulation results indicate that both the average and discounted cost functions are convex in the threshold, and that the minimum is achieved at $L = 7.7$ for both.

Table II and Table III list the simulated average costs and discounted total costs using different policies under different observation noises, respectively. Our methods generally outperform all the other algorithms under all observation noise levels. CE also performs very well, and slightly outperforms CE-MLE. EKF-PPOMDP performs better in the average cost case than in the discounted cost case. The greedy policy is much worse than all the other algorithms. While our methods and the EKF-PPOMDP involve offline computation, the more critical online computation time of all the simulated methods is approximately the same. For all the algorithms, the average cost/discounted cost increases as the observation noise increases.
That is consistent with the intuition that we cannot perform better with less information. Fig. 1 shows 1000 actions taken by our method versus the true inventory levels in the average cost case (the discounted total cost case is similar and is omitted here). The dotted vertical line is the optimal threshold under full observation, $L$. Our algorithm yields a policy that picks actions very close to those of the optimal threshold policy when the observation noise is small (cf. Fig. 1(a)), indicating that our algorithm is indeed finding the optimal policy. As the observation noise increases, more actions picked by our policy violate the optimal threshold, which again shows the value of information in determining the actions. The performances of our two methods are very close, with one slightly better than the other. Solely for the purpose of filtering, doing projection outside the filter is easier to implement if we want to use a general exponential family, and it also gives a better estimate of the belief state, since the projection error does not accumulate. However, for solving POMDPs, we conjecture that PPF works better in conjunction with the policy solved from the projected belief MDP, since the projected belief MDP assumes that the transition of the belief state is also projected; we do not know which variant is better in general. Our method outperforms the EKF-PPOMDP, mainly because the projection particle filter used in our method is better than the extended Kalman filter used in the EKF-PPOMDP for estimating the belief transition probabilities. This agrees with the results in [9], which also observed that Monte Carlo simulation of the belief transitions is better than the EKF estimate. Although the performance of CE is comparable to that of our methods for this particular example, CE is generally a suboptimal policy except in some special cases (cf. Section 6.1 in [6]), and it does not have a theoretical error bound.
Moreover, using CE requires solving the full observation problem, which is itself very difficult in many cases. In contrast, our method has a proven error bound on its performance, and works with the POMDP directly without having to solve the MDP problem under full observation.

VII. CONCLUSION

In this paper, we developed a method that effectively reduces the dimension of the belief space via the orthogonal projection of the belief states onto a parameterized family of

probability densities. For an exponential family, the density projection has an analytical form and can be carried out efficiently. The exponential family is fully represented by a finite (small) number of parameters; hence the belief space is mapped to a low-dimensional parameter space, and the resultant belief MDP is called the projected belief MDP. The projected belief MDP can then be solved in numerous ways, such as standard value iteration or policy iteration, to generate a policy. This policy is used in conjunction with the projection particle filter for online decision making. We analyzed the performance of the policy generated by solving the projected belief MDP in terms of the difference between the value function associated with this policy and the optimal value function of the POMDP. We also provided a bound on the error between our projection particle filter and exact filtering. We applied our method to an inventory control problem, and it generally outperformed the other methods. When the observation noise is small, our algorithm yields a policy that picks actions very close to those of the optimal threshold policy for the fully observed problem.

Fig. 1. Our method (Ours 1): actions taken for different inventory levels under different observation noise variances: (a) $\sigma = 0.1$, (b) $\sigma = 1.1$, (c) $\sigma = 3.1$. Each panel plots the action $a$ (replenish or not replenish) against the true inventory level $x$, with the optimal threshold under full observation shown as a dotted vertical line.

TABLE II: OPTIMAL AVERAGE COST ESTIMATES FOR THE INVENTORY CONTROL PROBLEM USING DIFFERENT METHODS. EACH ENTRY REPRESENTS THE AVERAGE COST OF A RUN OF HORIZON $10^5$. (Columns: $\sigma$, Ours 1, Ours 2, CE, CE-MLE, EKF-PPOMDP, Greedy Policy.)
Although we only proved theoretical results for discounted cost problems, the simulation results indicate that our method also works well on average cost problems. We should point out that our method is also applicable to finite-horizon problems, and is suitable for large-state POMDPs in addition to continuous-state POMDPs.

APPENDIX

Proof of Theorem 1: Denote $J_k(b) \triangleq T^k J_0(b)$ and $\bar J_k(\bar b) \triangleq \bar T^k \bar J_0(\bar b)$, $k = 0, 1, \ldots$, and define
$$Q_k(b, a) = \langle g(\cdot,a), b \rangle + \gamma E_Y\{J_{k-1}(\psi(b, a, Y))\}, \quad \mu_k(b) = \arg\min_{a \in A} Q_k(b, a),$$
$$\bar Q_k(\bar b, a) = \langle g(\cdot,a), \bar b \rangle + \gamma E_Y\{\bar J_{k-1}(\bar\psi(b, a, Y))\}, \quad \bar\mu_k(\bar b) = \arg\min_{a \in A} \bar Q_k(\bar b, a),$$
where $\bar\psi(b, a, y)$ denotes the projection of $\psi(b, a, y)$. Hence,
$$J_k(b) = \min_{a \in A} Q_k(b, a) = Q_k(b, \mu_k(b)), \quad \bar J_k(\bar b) = \min_{a \in A} \bar Q_k(\bar b, a) = \bar Q_k(\bar b, \bar\mu_k(\bar b)).$$
Denote $\mathrm{err}_k \triangleq \max_{b \in B} |J_k(b) - \bar J_k(\bar b)|$, $k = 1, 2, \ldots$. We consider the first iteration. Initialize with $J_0(b) = \bar J_0(\bar b) = 0$. By Assumption 2, $\forall a \in A$,
$$|Q_1(b, a) - \bar Q_1(\bar b, a)| = |\langle g(\cdot,a), b - \bar b \rangle| \le \varepsilon_1, \quad \forall b \in B. \qquad (23)$$

TABLE III: OPTIMAL DISCOUNTED COST ESTIMATES FOR THE INVENTORY CONTROL PROBLEM USING DIFFERENT METHODS. EACH ENTRY REPRESENTS THE DISCOUNTED COST (MEAN, WITH STANDARD ERROR IN PARENTHESES) BASED ON 1000 INDEPENDENT RUNS OF HORIZON 40. (Columns: $\sigma$, Ours 1, Ours 2, CE, CE-MLE, EKF-PPOMDP, Greedy Policy.)

Hence, with $a = \bar\mu_1(\bar b)$, inequality (23) yields $Q_1(b, \bar\mu_1(\bar b)) \le \bar J_1(\bar b) + \varepsilon_1$. Using $J_1(b) \le Q_1(b, \bar\mu_1(\bar b))$, we get
$$J_1(b) \le \bar J_1(\bar b) + \varepsilon_1, \quad \forall b \in B. \qquad (24)$$
With $a = \mu_1(b)$, (23) yields $\bar Q_1(\bar b, \mu_1(b)) - \varepsilon_1 \le J_1(b)$. Using $\bar J_1(\bar b) \le \bar Q_1(\bar b, \mu_1(b))$, we get
$$\bar J_1(\bar b) - \varepsilon_1 \le J_1(b), \quad \forall b \in B. \qquad (25)$$
From (24) and (25), we conclude $|J_1(b) - \bar J_1(\bar b)| \le \varepsilon_1$, $\forall b \in B$. Taking the maximum over $b$ on both sides of the above inequality yields
$$\mathrm{err}_1 \le \varepsilon_1. \qquad (26)$$
Now we consider the $(k+1)$th iteration. For a fixed $Y = y$, by Assumption 2, $|\langle g(\cdot,a), \psi(b, a, y) - \bar\psi(b, a, y) \rangle| \le \delta_1$. Let $\delta_1$ be the $\delta$ in Assumption 3 and denote the corresponding $\varepsilon$ by $\varepsilon_2$. Then
$$|J_k(\psi(b, a, y)) - J_k(\bar\psi(b, a, y))| \le \varepsilon_2, \quad \forall b \in B. \qquad (27)$$
Therefore, $\forall a \in A$,
$$|Q_{k+1}(b, a) - \bar Q_{k+1}(\bar b, a)| \le |\langle g(\cdot,a), b - \bar b \rangle| + \gamma \left|E_Y\{J_k(\psi(b, a, Y)) - \bar J_k(\bar\psi(b, a, Y))\}\right|$$
$$\le \varepsilon_1 + \gamma E_Y\{|J_k(\psi(b, a, Y)) - J_k(\bar\psi(b, a, Y))| + |J_k(\bar\psi(b, a, Y)) - \bar J_k(\bar\psi(b, a, Y))|\} \le \varepsilon_1 + \gamma(\varepsilon_2 + \mathrm{err}_k), \quad \forall b \in B.$$
The third inequality follows from (27) and the definition of $\mathrm{err}_k$. Using an argument similar to that used to prove (26) from (23), we conclude that
$$\mathrm{err}_{k+1} \le \varepsilon_1 + \gamma(\varepsilon_2 + \mathrm{err}_k). \qquad (28)$$
Using induction on (28) with initial condition (26) and taking $k \to \infty$, we obtain
$$|J^*(b) - \bar J^*(\bar b)| \le \sum_{k=0}^{\infty} \gamma^k \varepsilon_1 + \sum_{k=1}^{\infty} \gamma^k \varepsilon_2 = \frac{\varepsilon_1 + \gamma \varepsilon_2}{1-\gamma}. \qquad (29)$$
Therefore, (15) is proved.

Fixing a policy $\mu$ on the original belief MDP, define the mappings under policy $\mu$ on the belief MDP and the projected belief MDP as
$$T_\mu J(b) = \langle g(\cdot,\mu(b)), b \rangle + \gamma E_Y\{J(\psi(b, \mu(b), Y))\}, \qquad (30)$$
$$\bar T_\mu J(\bar b) = \langle g(\cdot,\mu(b)), \bar b \rangle + \gamma E_Y\{J(\bar\psi(b, \mu(b), Y))\}, \qquad (31)$$
respectively. Since $\bar\mu^*$ is a stationary policy for the projected belief MDP, $\bar\mu = \bar\mu^* \circ \mathrm{Proj}_\Omega$ is stationary for the original belief MDP. Hence,
$$\bar J^*(\bar b) = \bar T_{\bar\mu} \bar J^*(\bar b), \quad J_{\bar\mu}(b) = T_{\bar\mu} J_{\bar\mu}(b).$$
Subtracting the two equations, and substituting the definitions of $\bar T$ and $T$ (i.e., (31) and (30)) into the right-hand sides, respectively, we get
$$\bar J^*(\bar b) - J_{\bar\mu}(b) = \langle g(\cdot,\bar\mu(\bar b)), \bar b - b \rangle + \gamma E_Y\left\{\bar J^*(\bar\psi(b, \bar\mu(\bar b), Y)) - J_{\bar\mu}(\psi(b, \bar\mu(\bar b), Y))\right\}. \qquad (32)$$
For a fixed $Y = y$,
$$|\bar J^*(\bar\psi(b, \bar\mu(\bar b), y)) - J_{\bar\mu}(\psi(b, \bar\mu(\bar b), y))| \le |\bar J^*(\bar{\tilde b}) - J_{\bar\mu}(\tilde b)| + |J_{\bar\mu}(\bar\psi(b, \bar\mu(\bar b), y)) - J_{\bar\mu}(\psi(b, \bar\mu(\bar b), y))|,$$
where $\tilde b = \psi(b, \bar\mu(\bar b), y) \in B$. By Assumption 2, $|\langle g(\cdot,a), \psi(b, \bar\mu(\bar b), y) - \bar\psi(b, \bar\mu(\bar b), y) \rangle| \le \delta_1$; letting $\delta = \delta_1$ in Assumption 3 and denoting the corresponding $\varepsilon'$ by $\varepsilon_3$, we bound the second term as
$$|J_{\bar\mu}(\bar\psi(b, \bar\mu(\bar b), y)) - J_{\bar\mu}(\psi(b, \bar\mu(\bar b), y))| \le \varepsilon_3.$$
Denoting $\overline{\mathrm{err}} \triangleq \max_{b \in B} |\bar J^*(\bar b) - J_{\bar\mu}(b)|$, we obtain
$$|\bar J^*(\bar\psi(b, \bar\mu(\bar b), y)) - J_{\bar\mu}(\psi(b, \bar\mu(\bar b), y))| \le \overline{\mathrm{err}} + \varepsilon_3.$$

Therefore, (32) becomes
$$|\bar J^*(\bar b) - J_{\bar\mu}(b)| \le \varepsilon_1 + \gamma(\overline{\mathrm{err}} + \varepsilon_3).$$
Taking the maximum over $b$ on both sides of the above inequality yields $\overline{\mathrm{err}} \le \varepsilon_1 + \gamma(\overline{\mathrm{err}} + \varepsilon_3)$. Hence,
$$\overline{\mathrm{err}} \le \frac{\varepsilon_1 + \gamma \varepsilon_3}{1-\gamma}. \qquad (33)$$
With (29) and (33), we obtain
$$|J^*(b) - J_{\bar\mu}(b)| \le |J^*(b) - \bar J^*(\bar b)| + |\bar J^*(\bar b) - J_{\bar\mu}(b)| \le \frac{2\varepsilon_1 + \gamma(\varepsilon_2 + \varepsilon_3)}{1-\gamma}, \quad \forall b \in B.$$
Therefore, (16) is proved.

Proof of Lemma 1: $E[|\langle b_{k-1} - f(\cdot,\hat\theta_{k-1}), \phi \rangle|]$ is the error from time $k-1$, which is also the initial error for time $k$. Hence, the prediction step yields
$$E[|\langle b_{k|k-1} - b'_{k|k-1}, \phi \rangle|] = E[|\langle K_k(b_{k-1} - f(\cdot,\hat\theta_{k-1})), \phi \rangle|] = E[|\langle b_{k-1} - f(\cdot,\hat\theta_{k-1}), K_k \phi \rangle|] \le e_{k-1}\|K_k \phi\| \le e_{k-1}\|\phi\|. \qquad (34)$$
Under Assumptions 4 and 5, we have shown (18). Using (18) and (34), the Bayes updating step yields
$$E[|\langle b_k - b'_k, \phi \rangle|] = E\left[\left|\frac{\langle b_{k|k-1}, \Psi_k \phi \rangle}{\langle b_{k|k-1}, \Psi_k \rangle} - \frac{\langle b'_{k|k-1}, \Psi_k \phi \rangle}{\langle b'_{k|k-1}, \Psi_k \rangle}\right|\right]$$
$$\le E\left[\frac{|\langle b_{k|k-1}, \Psi_k \phi \rangle|}{\langle b_{k|k-1}, \Psi_k \rangle \langle b'_{k|k-1}, \Psi_k \rangle}\, |\langle b_{k|k-1} - b'_{k|k-1}, \Psi_k \rangle|\right] + E\left[\frac{|\langle b_{k|k-1} - b'_{k|k-1}, \Psi_k \phi \rangle|}{\langle b'_{k|k-1}, \Psi_k \rangle}\right]$$
$$\le \delta \|\phi\|\, E[|\langle b_{k|k-1} - b'_{k|k-1}, \Psi_k \rangle|] + \delta\, E[|\langle b_{k|k-1} - b'_{k|k-1}, \Psi_k \phi \rangle|] \le \delta e_{k-1} \|\phi\| \|\Psi_k\| + \delta e_{k-1} \|\Psi_k \phi\| \le 2\delta \|\Psi_k\|\, e_{k-1} \|\phi\| = \tau_k e_{k-1} \|\phi\|,$$
where $\tau_k = 2\delta \|\Psi_k\|$.

Proof of Lemma 2: This lemma uses essentially the same proof technique as Lemmas 3 and 4 in [13]; however, it is not quite obvious how those lemmas imply our lemma here, so we state the proof to make this paper more accessible. After the resampling step, $\hat f(\cdot,\hat\theta_{k-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x^i_{k-1})$, where $x^i_{k-1}$, $i = 1, \ldots, N$, are i.i.d. samples from $f(\cdot,\hat\theta_{k-1})$. Using the Cauchy-Schwarz inequality, we have
$$\left(E\left[\langle \hat f(\cdot,\hat\theta_{k-1}) - f(\cdot,\hat\theta_{k-1}), \phi \rangle^2\right]\right)^{1/2} = \left(E\left[\left(\frac{1}{N}\sum_{i=1}^{N}\left(\phi(x^i_{k-1}) - \langle f(\cdot,\hat\theta_{k-1}), \phi \rangle\right)\right)^2\right]\right)^{1/2}$$
$$= \frac{1}{\sqrt N}\left(\langle f(\cdot,\hat\theta_{k-1}), \phi^2 \rangle - \langle f(\cdot,\hat\theta_{k-1}), \phi \rangle^2\right)^{1/2} \le \frac{1}{\sqrt N}\langle f(\cdot,\hat\theta_{k-1}), \phi^2 \rangle^{1/2} \le \frac{\|\phi\|}{\sqrt N}. \qquad (35)$$
The Bayes updating step yields
$$E[|\langle \hat b_k - b'_k, \phi \rangle|] = E\left[\left|\frac{\langle \hat b_{k|k-1}, \Psi_k \phi \rangle}{\langle \hat b_{k|k-1}, \Psi_k \rangle} - \frac{\langle b'_{k|k-1}, \Psi_k \phi \rangle}{\langle b'_{k|k-1}, \Psi_k \rangle}\right|\right]$$
$$\le E\left[\frac{|\langle \hat b_{k|k-1}, \Psi_k \phi \rangle|}{\langle \hat b_{k|k-1}, \Psi_k \rangle \langle b'_{k|k-1}, \Psi_k \rangle}\, |\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \rangle|\right] + E\left[\frac{|\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \phi \rangle|}{\langle b'_{k|k-1}, \Psi_k \rangle}\right].$$
Under Assumptions 4 and 5, we have shown (18).
Using the Cauchy-Schwarz inequality, (18), and (35), the first term can be simplified as
$$E\left[\frac{|\langle \hat b_{k|k-1}, \Psi_k \phi \rangle|}{\langle \hat b_{k|k-1}, \Psi_k \rangle \langle b'_{k|k-1}, \Psi_k \rangle}\, |\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \rangle|\right] \le \delta \left(E\left[\frac{\langle \hat b_{k|k-1}, \Psi_k \phi \rangle^2}{\langle \hat b_{k|k-1}, \Psi_k \rangle^2}\right]\right)^{1/2} \left(E\left[\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \rangle^2\right]\right)^{1/2}$$
$$= \delta \left(E\left[\frac{\langle \hat b_{k|k-1}, \Psi_k \phi \rangle^2}{\langle \hat b_{k|k-1}, \Psi_k \rangle^2}\right]\right)^{1/2} \left(E\left[\langle \hat f(\cdot,\hat\theta_{k-1}) - f(\cdot,\hat\theta_{k-1}), K_k \Psi_k \rangle^2\right]\right)^{1/2} \le \frac{\delta \|\phi\| \|\Psi_k\|}{\sqrt N},$$

and the second term can be simplified as
$$E\left[\frac{|\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \phi \rangle|}{\langle b'_{k|k-1}, \Psi_k \rangle}\right] \le \delta \left(E\left[\langle \hat b_{k|k-1} - b'_{k|k-1}, \Psi_k \phi \rangle^2\right]\right)^{1/2} = \delta \left(E\left[\langle \hat f(\cdot,\hat\theta_{k-1}) - f(\cdot,\hat\theta_{k-1}), K_k(\Psi_k \phi) \rangle^2\right]\right)^{1/2} \le \frac{\delta \|\Psi_k \phi\|}{\sqrt N} \le \frac{\delta \|\Psi_k\| \|\phi\|}{\sqrt N}.$$
Therefore, adding these two terms yields
$$E[|\langle \hat b_k - b'_k, \phi \rangle|] \le \frac{2\delta \|\Psi_k\|}{\sqrt N} \|\phi\| = \frac{\tau_k}{\sqrt N} \|\phi\|,$$
where $\tau_k = 2\delta \|\Psi_k\|$, the same constant as in Lemma 1.

Proof of Lemma 3: The key idea of the proof of Lemma 4 in [2] is used here. From (9), we know that $E_{\hat\theta_k}[c_j(X)] = E_{\hat b_k}[c_j(X)]$ and $E_{\theta'_k}[c_j(X)] = E_{b'_k}[c_j(X)]$. Hence, we obtain
$$E\left[\left|E_{\hat\theta_k}[c_j(X)] - E_{\theta'_k}[c_j(X)]\right|\right] = E[|\langle \hat b_k - b'_k, c_j \rangle|], \quad j = 1, \ldots, m.$$
Summing over $j$, we obtain
$$\sum_{j=1}^{m} E\left[\left|E_{\hat\theta_k}[c_j(X)] - E_{\theta'_k}[c_j(X)]\right|\right] = \sum_{j=1}^{m} E[|\langle \hat b_k - b'_k, c_j \rangle|].$$
Since $c_j \in B(\mathbb{R}^{n_x})$, we apply Lemma 2 with $\phi = c_j$ and thus obtain
$$E[|\langle \hat b_k - b'_k, c_j \rangle|] \le \frac{\tau_k \|c_j\|}{\sqrt N}, \quad j = 1, \ldots, m.$$
Therefore,
$$E\left[\left\|E_{\hat\theta_k}[c(X)] - E_{\theta'_k}[c(X)]\right\|_1\right] \le \frac{\bar\tau_k}{\sqrt N},$$
where $\|\cdot\|_1$ denotes the $L_1$ norm on $\mathbb{R}^m$, $c = [c_1, \ldots, c_m]^T$, and $\bar\tau_k = \tau_k \sum_{j=1}^{m} \|c_j\|$. Since $\bar\Theta$ is compact and the Fisher information matrix $\left[E_\theta[c_i(X)c_j(X)] - E_\theta[c_i(X)] E_\theta[c_j(X)]\right]_{ij}$ is positive definite, we get (cf. Fact 2 in [2] for a detailed proof)
$$\|\hat\theta_k - \theta'_k\|_1 \le \alpha \left\|E_{\hat\theta_k}[c(X)] - E_{\theta'_k}[c(X)]\right\|_1.$$
Taking expectation on both sides yields
$$E[\|\hat\theta_k - \theta'_k\|_1] \le \alpha E\left[\left\|E_{\hat\theta_k}[c(X)] - E_{\theta'_k}[c(X)]\right\|_1\right] \le \frac{\alpha \bar\tau_k}{\sqrt N}.$$
On the other hand, taking the derivative of $E_\theta[\phi(X)]$ with respect to $\theta_i$ yields
$$\left|\frac{d E_\theta[\phi(X)]}{d\theta_i}\right| = \left|E_\theta[c_i(X)\phi(X)] - E_\theta[c_i(X)] E_\theta[\phi(X)]\right| \le \sqrt{\mathrm{Var}_\theta(c_i)\,\mathrm{Var}_\theta(\phi)} \le \sqrt{E_\theta[c_i^2]\, E_\theta[\phi^2]} \le \|c_i\| \|\phi\|.$$
Hence, $\left\|\frac{d}{d\theta} E_\theta[\phi(X)]\right\|_1 \le \left(\sum_{i=1}^{m} \|c_i\|\right)\|\phi\|$. Since $\bar\Theta$ is compact, there exists a constant $\beta > 0$ such that $E_\theta[\phi(X)]$ is Lipschitz over $\theta \in \bar\Theta$ with Lipschitz constant $\beta\|\phi\|$ (cf. the proof of Fact 2 in [2]), i.e.,
$$\left|E_{\hat\theta_k}[\phi] - E_{\theta'_k}[\phi]\right| \le \beta \|\phi\|\, \|\hat\theta_k - \theta'_k\|_1.$$
Taking expectation on both sides yields
$$E[|\langle f(\cdot,\hat\theta_k) - f(\cdot,\theta'_k), \phi \rangle|] \le \beta \|\phi\|\, E[\|\hat\theta_k - \theta'_k\|_1] \le \frac{\alpha \beta \bar\tau_k}{\sqrt N} \|\phi\| = \frac{d \tau_k}{\sqrt N} \|\phi\|,$$
where $d = \alpha \beta \sum_{j=1}^{m} \|c_j\|$.

Proof of Theorem 2: Applying Lemma 1, Assumption 6, and Lemma 3, we have that for each $k$,
$$E[|\langle b_k - f(\cdot,\hat\theta_k), \phi \rangle|] \le E[|\langle b_k - b'_k, \phi \rangle|] + E[|\langle b'_k - f(\cdot,\theta'_k), \phi \rangle|]$$
$$+\, E[|\langle f(\cdot,\theta'_k) - f(\cdot,\hat\theta_k), \phi \rangle|] \le \left(\tau_k e_{k-1} + \varepsilon + \frac{d \tau_k}{\sqrt N}\right)\|\phi\| = e_k \|\phi\|, \quad \forall \phi \in B(\mathbb{R}^{n_x}),$$
where $\tau_k$ is the constant in Lemmas 1 and 3, $d$ is the constant in Lemma 3, and $\varepsilon$ is the constant in Assumption 6. It is easy to deduce by induction that
$$e_k = \tau_1^k e_0 + \left(1 + \sum_{i=2}^{k} \tau_i^k\right)\varepsilon + \frac{d}{\sqrt N} \sum_{i=1}^{k} \tau_i^k,$$
where $\tau_i^k = \prod_{j=i}^{k} \tau_j$ for $k \ge i$, and $\tau_i^k = 0$ for $k < i$.

ACKNOWLEDGEMENT

We thank the associate editor and the anonymous reviewers for their careful reading of the paper and very constructive comments that led to a substantially improved paper. We thank one of the anonymous reviewers for bringing the work of Brooks and Williams [9] to our attention.

REFERENCES

[1] S. Arulampalam, S. Maskell, N. J. Gordon, and T. Clapp, "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, 2002.
[2] B. Azimi-Sadjadi and P. S. Krishnaprasad, "Approximate Nonlinear Filtering and its Application in Navigation," Automatica, vol. 41, no. 6, 2005.
[3] O. E. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory, Wiley, New York, 1978.
[4] A. R. Barron and C. Sheu, "Approximation of Density Functions by Sequences of Exponential Families," The Annals of Statistics, vol. 19, no. 3, 1991.
[5] D. P. Bertsekas, "Convergence of Discretization Procedures in Dynamic Programming," IEEE Transactions on Automatic Control, vol. 20, no. 3, 1975.
[6] D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA, 1995.
[7] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Optimization and Neural Computation Series, Athena Scientific, Belmont, MA, 1st edition, 1996.

[8] A. Brooks, A. Makarenko, S. Williams, and H. Durrant-Whyte, "Parametric POMDPs for Planning in Continuous State Spaces," Robotics and Autonomous Systems, vol. 54, no. 11, 2006.
[9] A. Brooks and S. Williams, "A Monte Carlo Update for Parametric POMDPs," International Symposium of Robotics Research, 2007.
[10] E. Brunskill, L. Kaelbling, T. Lozano-Perez, and N. Roy, "Continuous-State POMDPs with Hybrid Dynamics," in Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics, 2008.
[11] A. R. Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes, Ph.D. thesis, Brown University, 1998.
[12] H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus, Simulation-based Algorithms for Markov Decision Processes, Communications and Control Engineering Series, Springer, New York, 2007.
[13] D. Crisan and A. Doucet, "A Survey of Convergence Results on Particle Filtering Methods for Practitioners," IEEE Transactions on Signal Processing, vol. 50, no. 3, 2002.
[14] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, editors, Sequential Monte Carlo Methods in Practice, Springer, New York, 2001.
[15] D. de Farias and B. Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, vol. 51, no. 6, 2003.
[16] W. R. Gilks, "Derivative-Free Adaptive Rejection Sampling for Gibbs Sampling," in Bayesian Statistics 4, J. Bernardo, J. Berger, A. Dawid, and A. Smith, editors, Oxford University Press, Oxford, 1992.
[17] M. Hauskrecht, "Value-Function Approximations for Partially Observable Markov Decision Processes," Journal of Artificial Intelligence Research, vol. 13, 2000.
[18] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 2, pp. 107-113, 1993.
[19] O. Hernandez-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York, 1996.
[20] F. Le Gland and N.
Oudjane, Stability and Uniform Aroximation of onlinear Filters Using the Hilbert Metric and Alication to Particle Filter, The Annals of Alied Probability, vol. 4, no., , [2] E. L. Lemann, and G. Casella, Theory of Point Estimation, Sringer, ew York, 2nd edition, 998. [22] M. L. Littman, The Witness Algorithm: Solving Partially Observable Markov Decision Processes, TR CS-94-40, Deartment of Comuter Science, Brown University, Providence, RI, 994. [23] M. K. Murray, and J. W. Rice, Differential Geometry and Statistics, Chaman & Hall, ew York, 993. [24] J. M. Porta, M. T. J. Saan, and. Vlassis, Robot Planning in Partially Observable Continuous Domains, in Proc. Robotics: Science and Systems, [25] J. M. Porta,. Vlassis, M. T. J. Saan, and P. Pouart, Point-Based Value Iteration for Continuous POMDPs, Journal of Machine Learning Research, vol. 7, , [26] P. Pouart, and C. Boutilier, Value-Directed Comression of POMDPs, Advances in eural Information Processing Systems, vol. 5, , [27] M. Resh, and P. aor, An Inventory Problem with Discrete Time Review and Relenishment by Batches of Fixed Size, Management Science, vol. 0, no.,. 09-8, 963. [28]. Roy, Finding Aroximate POMDP Solutions through Belief Comression, Ph.D. thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, [29]. Roy, and G. Gordon, Exonential Family PCA for Belief Comression in POMDPs, Advances in eural Information Processing Systems, vol. 5, , [30] R. D. Smallwood, and E. J. Sondik, The Otimal Control of Partially Observable Markov Processes over a Finite Horizon, Oerations Research, vol. 2, no. 5, , 973. [3] S. Thrun, Monte Carlo POMDPs, Advances in eural Information Processing Systems, vol. 2, , [32] H. J. Yu, Aroximate Solution Methods for Partially Observable Markov and Semi-Markov Decision Processes, Ph.D. thesis, M.I.T., Cambridge, MA, Enlu Zhou (S 07-M 0) received the B.S. degree with highest honors in electrical engineering from Zhejiang University, China, in 2004, and received the Ph.D. 
degree in electrical engineering from the University of Maryland, College Park. She is currently an Assistant Professor in the Department of Industrial and Enterprise Systems Engineering at the University of Illinois at Urbana-Champaign. Her research interests include stochastic control, nonlinear filtering, and simulation optimization.

Michael C. Fu (S'89-M'89-SM'06-F'08) received a bachelor's degree in mathematics and bachelor's and master's degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, MA, in 1985, and a master's and Ph.D. degree in applied mathematics from Harvard University, Cambridge, MA, in 1986 and 1989, respectively. Since 1989, he has been with the University of Maryland, College Park, currently as Ralph J. Tyser Professor of Management Science in the Robert H. Smith School of Business, with a joint appointment in the Institute for Systems Research and an affiliate faculty appointment in the Department of Electrical and Computer Engineering. His research interests include simulation optimization and applied probability, with applications in manufacturing, supply chain management, and financial engineering. He served as the Simulation Area Editor of Operations Research and as the Stochastic Models and Simulation Department Editor of Management Science. He is co-author (with J. Q. Hu) of the book Conditional Monte Carlo: Gradient Estimation and Optimization Applications (Kluwer, 1997), which received the INFORMS Simulation Society's Outstanding Publication Award in 1998, and co-author (with H. S. Chang, J. Hu, and S. I. Marcus) of the book Simulation-Based Algorithms for Markov Decision Processes (Springer, 2007).

Steven I. Marcus (S'70-M'75-SM'83-F'86) received the B.A. degree in electrical engineering and mathematics from Rice University in 1971 and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology in 1972 and 1975, respectively. From 1975 to 1991, he was with the Department of Electrical and Computer Engineering at the University of Texas at Austin, where he was the L.B. (Preach) Meaders Professor in Engineering and also served as Associate Chairman of the Department. In 1991, he joined the University of Maryland, College Park, as Professor in the Electrical and Computer Engineering Department and the Institute for Systems Research. He was Director of the Institute for Systems Research from 1991 to 1996 and became Chair of the Electrical and Computer Engineering Department in 2000. Currently, his research is focused on stochastic control, estimation, and optimization.
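The induction in the Appendix simply unrolls the one-step bound $e_k = \tau_k e_{k-1} + \varepsilon + d\,\tau_k$. As a quick sanity check on the closed-form expression, the following Python sketch iterates the recursion and compares it against the closed form; the values chosen for $\tau_k$, $\varepsilon$, $d$, and $e_0$ are arbitrary illustrative numbers, not the constants from Lemmas 1 and 3 or Assumption 6, and the function names are ours.

```python
# Sanity check: iterating  e_k = tau_k * e_{k-1} + eps + d * tau_k
# must agree with the closed form obtained by induction,
#   e_k = tau_1^k e_0 + (sum_{i=2}^k tau_i^k + 1) eps + d sum_{i=1}^k tau_i^k,
# where tau_i^k = prod_{j=i}^k tau_j for k >= i and tau_i^k = 0 for k < i.

def iterate_bound(tau, e0, eps, d):
    """Unroll the one-step recursion over the horizon k = len(tau)."""
    e = e0
    for t in tau:
        e = t * e + eps + d * t
    return e

def closed_form(tau, e0, eps, d):
    """Evaluate the closed-form expression from the induction."""
    k = len(tau)

    def tau_ik(i):  # tau_i^k = prod_{j=i}^k tau_j, with 1-indexed i
        p = 1.0
        for j in range(i - 1, k):
            p *= tau[j]
        return p

    return (tau_ik(1) * e0
            + (sum(tau_ik(i) for i in range(2, k + 1)) + 1.0) * eps
            + d * sum(tau_ik(i) for i in range(1, k + 1)))

# Arbitrary illustrative constants.
tau = [1.3, 0.9, 1.7, 1.1, 0.8]
a = iterate_bound(tau, e0=0.3, eps=0.01, d=0.5)
b = closed_form(tau, e0=0.3, eps=0.01, d=0.5)
assert abs(a - b) < 1e-12  # the two agree to floating-point precision
```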


More information

Online Appendix to Accompany AComparisonof Traditional and Open-Access Appointment Scheduling Policies

Online Appendix to Accompany AComparisonof Traditional and Open-Access Appointment Scheduling Policies Online Aendix to Accomany AComarisonof Traditional and Oen-Access Aointment Scheduling Policies Lawrence W. Robinson Johnson Graduate School of Management Cornell University Ithaca, NY 14853-6201 lwr2@cornell.edu

More information

Solution sheet ξi ξ < ξ i+1 0 otherwise ξ ξ i N i,p 1 (ξ) + where 0 0

Solution sheet ξi ξ < ξ i+1 0 otherwise ξ ξ i N i,p 1 (ξ) + where 0 0 Advanced Finite Elements MA5337 - WS7/8 Solution sheet This exercise sheets deals with B-slines and NURBS, which are the basis of isogeometric analysis as they will later relace the olynomial ansatz-functions

More information

Generalized Coiflets: A New Family of Orthonormal Wavelets

Generalized Coiflets: A New Family of Orthonormal Wavelets Generalized Coiflets A New Family of Orthonormal Wavelets Dong Wei, Alan C Bovik, and Brian L Evans Laboratory for Image and Video Engineering Deartment of Electrical and Comuter Engineering The University

More information

1 Gambler s Ruin Problem

1 Gambler s Ruin Problem Coyright c 2017 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins

More information

The analysis and representation of random signals

The analysis and representation of random signals The analysis and reresentation of random signals Bruno TOÉSNI Bruno.Torresani@cmi.univ-mrs.fr B. Torrésani LTP Université de Provence.1/30 Outline 1. andom signals Introduction The Karhunen-Loève Basis

More information

Finite Mixture EFA in Mplus

Finite Mixture EFA in Mplus Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered

More information

Optimal Random Access and Random Spectrum Sensing for an Energy Harvesting Cognitive Radio with and without Primary Feedback Leveraging

Optimal Random Access and Random Spectrum Sensing for an Energy Harvesting Cognitive Radio with and without Primary Feedback Leveraging 1 Otimal Random Access and Random Sectrum Sensing for an Energy Harvesting Cognitive Radio with and without Primary Feedback Leveraging Ahmed El Shafie, Member, IEEE, arxiv:1401.0340v3 [cs.it] 27 Ar 2014

More information

On the Chvatál-Complexity of Knapsack Problems

On the Chvatál-Complexity of Knapsack Problems R u t c o r Research R e o r t On the Chvatál-Comlexity of Knasack Problems Gergely Kovács a Béla Vizvári b RRR 5-08, October 008 RUTCOR Rutgers Center for Oerations Research Rutgers University 640 Bartholomew

More information

Age of Information: Whittle Index for Scheduling Stochastic Arrivals

Age of Information: Whittle Index for Scheduling Stochastic Arrivals Age of Information: Whittle Index for Scheduling Stochastic Arrivals Yu-Pin Hsu Deartment of Communication Engineering National Taiei University yuinhsu@mail.ntu.edu.tw arxiv:80.03422v2 [math.oc] 7 Ar

More information

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material Inference for Emirical Wasserstein Distances on Finite Saces: Sulementary Material Max Sommerfeld Axel Munk Keywords: otimal transort, Wasserstein distance, central limit theorem, directional Hadamard

More information

Finding a sparse vector in a subspace: linear sparsity using alternating directions

Finding a sparse vector in a subspace: linear sparsity using alternating directions IEEE TRANSACTION ON INFORMATION THEORY VOL XX NO XX 06 Finding a sarse vector in a subsace: linear sarsity using alternating directions Qing Qu Student Member IEEE Ju Sun Student Member IEEE and John Wright

More information

AKRON: An Algorithm for Approximating Sparse Kernel Reconstruction

AKRON: An Algorithm for Approximating Sparse Kernel Reconstruction : An Algorithm for Aroximating Sarse Kernel Reconstruction Gregory Ditzler Det. of Electrical and Comuter Engineering The University of Arizona Tucson, AZ 8572 USA ditzler@email.arizona.edu Nidhal Carla

More information

Mollifiers and its applications in L p (Ω) space

Mollifiers and its applications in L p (Ω) space Mollifiers and its alications in L () sace MA Shiqi Deartment of Mathematics, Hong Kong Batist University November 19, 2016 Abstract This note gives definition of mollifier and mollification. We illustrate

More information

Bayesian Networks Practice

Bayesian Networks Practice Bayesian Networks Practice Part 2 2016-03-17 Byoung-Hee Kim, Seong-Ho Son Biointelligence Lab, CSE, Seoul National University Agenda Probabilistic Inference in Bayesian networks Probability basics D-searation

More information

1 Probability Spaces and Random Variables

1 Probability Spaces and Random Variables 1 Probability Saces and Random Variables 1.1 Probability saces Ω: samle sace consisting of elementary events (or samle oints). F : the set of events P: robability 1.2 Kolmogorov s axioms Definition 1.2.1

More information

Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach

Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach PROC. WIOP 204 Power Aware Wireless File Downloading: A Constrained Restless Bandit Aroach Xiaohan Wei and Michael J. Neely, Senior Member, IEEE Abstract his aer treats ower-aware throughut maximization

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Bayesian inference & Markov chain Monte Carlo Note 1: Many slides for this lecture were kindly rovided by Paul Lewis and Mark Holder Note 2: Paul Lewis has written nice software for demonstrating Markov

More information