Robust Solutions to Markov Decision Problems

Size: px
Start display at page:

Download "Robust Solutions to Markov Decision Problems"

Transcription

1 Robust Solutions to Markov Decision Problems Arnab Nilim and Laurent El Ghaoui Deartment of Electrical Engineering and Comuter Sciences University of California, Berkeley, CA Abstract Otimal solutions to Markov Decision Problems (MDPs) are very sensitive with resect to the state transition robabilities. In many ractical roblems, the estimation of those robabilities is far from accurate. Hence, estimation errors are limiting factors in alying MDPs to real-world roblems. We roose an algorithm for solving nite-state and nite-action MDPs, where the solution is guaranteed to be robust with resect to estimation errors on the state transition robabilities. Our algorithm involves a statistically accurate yet numerically ef cient reresentation of uncertainty via likelihood functions. The worst-case comlexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at ractically no extra comuting cost. Introduction We consider a nite-state and nite-action Markov decision roblem in which the transition robabilities themselves are uncertain, and seek a robust decision for it. Our work is motivated by the fact that in many ractical roblems, the transition matrices have to be estimated from data. This may be a dif cult task and the estimation errors may have a huge imact on the solution, which is often quite sensitive to changes in the transition robabilities (Feinberg & Shwartz 2002). A number of authors have addressed the issue of uncertainty in the transition matrices of an MDP. A Bayesian aroach such as described by (Shairo & Kleywegt 2002) requires a erfect knowledge of the whole rior distribution on the transition matrix, making it dif cult to aly in ractice. Other authors have considered the transition matrix to lie in a given set, most tyically a olytoe: see (Satia & Lave 1973; White & Eldeib 1994; Givan, Leach, & Dean 1997). Although our aroach allows to describe the uncertainty on the transition matrix by a olytoe, we may argue against choosing such a model for the uncertainty. First, a general olytoe is often not a tractable way to address the robustness roblem, as it incurs a signi cant additional comutational effort to handle uncertainty. Perhas more imortantly, olytoic models, esecially interval matrices, may be very oor reresentations of statistical uncertainty and Coyright c 2004, American Association for Arti cial Intelligence ( All rights reserved. lead to very conservative robust olicies. In (Bagnell, Ng, & Schneider 2001), the authors consider a roblem dual to ours, and rovide a general statement according to which the cost of solving their roblem is olynomial in roblem size, rovided the uncertainty on the transition matrices is described by convex sets, without roosing any seci c algorithm. This aer is a short version of a longer reort (Nilim & El-Ghaoui 2004), which contains all the roofs of the results summarized here. Notation. P > 0 or P 0 refers to the strict or nonstrict comonentwise inequality for matrices or vectors. For a vector > 0, log refers to the comonentwise oeration. The notation 1 refers to the vector of ones, with size determined from context. The robability simlex in R n is denoted n = { R n + : T 1 = 1}, while Θ n is the set of n n transition matrices (comonentwise non-negative matrices with rows summing to one). We use σ P to denote the suort function of a set P R n, with for v R n, σ P (v) := su{ T v : P}. The roblem descrition We consider a nite horizon Markov decision rocess with nite decision horizon T = {0, 1, 2,..., N 1}. At each stage, the system occuies a state i X, where n = X is nite, and a decision maker is allowed to choose an action a deterministically from a nite set of allowable actions A = {a 1,..., a m } (for notational simlicity we assume that A is not state-deendent). The system starts in a given initial state i 0. The states make Markov transitions according to a collection of (ossibly time-deendent) transition matrices τ := (Pt a ),t T, where for every a A, t T, the n n transition matrix Pt a contains the robabilities of transition under action a at stage t. We denote by π = (a 0,..., a N 1 ) a generic controller olicy, where a t (i) denotes the controller action when the system is in state i X at time t T. Let Π = A nn be the corresonding strategy sace. De ne by c t (i, a) the cost corresonding to state i X and action a A at time t T, and by c N the cost function at the terminal stage. We assume that c t (i, a) is non-negative and nite for every i X and a A. For a given set of transition matrices τ, we de ne the

2 nite-horizon nominal roblem by φ N (Π, τ) := min C N (π, τ), (1) π Π where C N (π, τ) denotes the exected total cost under controller olicy π and transitions τ: ( N 1 ) C N (π, τ) := E c t (i t, a t (i)) + c N (i N ). (2) t=0 A secial case of interest is when the exected total cost function bears the form (2), where the terminal cost is zero, and c t (i, a) = ν t c(i, a), with c(i, a) now a constant cost function, which we assume non-negative and nite everywhere, and ν (0, 1) is a discount factor. We refer to this cost function as the discounted cost function, and denote by C (π, τ) the limit of the discounted cost (2) as N. When the transition matrices are exactly known, the corresonding nominal roblem can be solved via a dynamic rogramming algorithm, which has total comlexity of mn 2 N os in the nite-horizon case. In the in nitehorizon case with a discounted cost function, the cost of comuting an ɛ-subotimal olicy via the Bellman recursion is O(mn 2 log(1/ɛ)); see (Puterman 1994) for more details. The robust control roblems At rst we assume that when for each action a and time t, the corresonding transition matrix Pt a is only known to lie in some given subset P a. Two models for transition matrix uncertainty are ossible, leading to two ossible forms of nite-horizon robust control roblems. In a rst model, referred to as the stationary uncertainty model, the transition matrices are chosen by nature deending on the controller olicy once and for all, and remain xed thereafter. In a second model, which we refer to as the time-varying uncertainty model, the transition matrices can vary arbitrarily with time, within their rescribed bounds. Each roblem leads to a game between the controller and nature, where the controller seeks to minimize the maximum exected cost, with nature being the maximizing layer. Let us de ne our two roblems more formally. A olicy of nature refers to a seci c collection of time-deendent transition matrices τ = (Pt a ),t T chosen by nature, and the set of admissible olicies of nature is T := ( P a ) N. Denote by T s the set of stationary admissible olicies of nature: T s = {τ = (Pt a ),t T T : Pt a = Ps a for every t, s T, a A}. The stationary uncertainty model leads to the roblem φ N (Π, T s ) := min max C N (π, τ). (3) π Π τ T s In contrast, the time-varying uncertainty model leads to a relaxed version of the above: φ N (Π, T s ) φ N (Π, T ) := min max C N (π, τ). (4) π Π τ T The rst model is attractive for statistical reasons, as it is much easier to develo statistically accurate sets of con dence when the underlying rocess is time-invariant. Unfortunately, the resulting game (3) seems to be hard to solve because rincile of otimality may not hold in these roblems due to the deendence of otimal actions of nature among different stages. The second model is attractive as one can solve the corresonding game (4) using a variant of the dynamic rogramming algorithm seen later, but we are left with a dif cult task, that of estimating a meaningful set of con dence for the time-varying matrices P a t. In this aer we will use the rst model of uncertainty in order to derive statistically meaningful sets of con dence for the transition matrices, based on likelihood or entroy bounds. Then, instead of solving the corresonding dif cult control roblem (3), we use an aroximation that is common in robust control, and solve the time-varying uer bound (4), using the uncertainty sets P a derived from a stationarity assumtion about the transition matrices. We will also consider a variant of the nite-horizon time-varying roblem (4), where controller and nature lay alternatively, leading to a reeated game (Π, Q) := min max min max... min max C N (π, τ), a 0 τ 0 Q a 1 τ 1 Q a N 1 τ N 1 Q (5) where the notation τ t = (Pt a ) denotes the collection of transition matrices at a given time t T, and Q := P a is the corresonding set of con dence. Finally, we will consider an in nite-horizon robust control roblem, with the discounted cost function referred to above, and where we restrict control and nature olicies to be stationary: φ re N φ (Π s, T s ) := min max C (π, τ), (6) π Π s τ T s where Π s denotes the sace of stationary control olicies. We de ne φ (Π, T ), φ (Π, T s ) and φ (Π s, T ) accordingly. In the sequel, for a given control olicy π Π and subset S T, the notation φ N (π, S) := max τ S C N (π, τ) denotes the worst-case exected total cost for the nite-horizon roblem, and φ (π, S) is de ned likewise. Main results Our main contributions are as follows. First we rovide a recursion, the robust dynamic rogramming algorithm, which solves the nite-horizon robust control roblem (4). We rovide a simle roof in (Nilim & El-Ghaoui 2004) of the otimality of the recursion, where the main ingredient is to show that erfect duality holds in the game (4). As a corollary of this result, we obtain that the reeated game (5) is equivalent to its non-reeated counterart (4). Second, we rovide similar results for the in nite-horizon roblem with discounted cost function, (6). Moreover, we obtain that if we consider a nite-horizon roblem with a discounted cost function, then the ga between the otimal value of the stationary uncertainty roblem (3) and that of its time-varying counterart (4) goes to zero as the horizon length goes to in- nity, at a rate determined by the discount factor. Finally, we identify several classes of uncertainty models, which result in an algorithm that is both statistically accurate and numerically tractable. We rovide recise comlexity results that imly that, with the roosed aroach, robustness can be handled at ractically no extra comuting cost.

3 Finite-Horizon robust MDP We consider the nite-horizon robust control roblem de- ned in section. For a given state i X, action a A, and P a P a, we denote by a i the next-state distribution drawn from P a corresonding to state i X ; thus a i is the i-th row of matrix P a. We de ne Pi a as the roection of the set P a onto the set of a i -variables. By assumtion, these sets are included in the robability simlex of R n, n ; no other roerty is assumed. The following theorem is roved in (Nilim & El-Ghaoui 2004). Theorem 1 (robust dynamic rogramming) For the robust control roblem (4), erfect duality holds: φ N (Π, T ) = min max C N (π, τ) = max min C N (π, τ) π Π τ T τ T π Π := ψ N (Π, T ). The roblem can be solved via the recursion v t (i) = min ( ct (i, a) + σ P a i (v t+1) ), i X, t T, (7) where σ P (v) := su{ T v : P} denotes the suort function of a set P, v t (i) is the worst-case otimal value function in state i at stage t. A corresonding otimal control olicy π = (a 0,..., a N 1 ) is obtained by setting a { t (i) arg min ct (i, a) + σ P a (v i t+1) }, i X. (8) The effect of uncertainty on a given strategy π = (a 0,..., a N ) can be evaluated by the following recursion vt π (i) = c t (i, a t (i)) + σ a P t (i)(vt+1), π i X, (9) i which rovides the worst-case value function v π for the strategy π. The above result has a nice consequence for the reeated game (5): Corollary 2 The reeated game (5) is equivalent to the game (4): φ re N (Π, Q) = φ N (Π, T ), and the otimal strategies for φ N (Π, T ) given in theorem 1 are otimal for φ re N (Π, Q) as well. The interretation of the erfect duality result given in theorem 1, and its consequence given in corollary 2, is that it does not matter wether the controller or nature lay rst, or if they alternatively; all these games are equivalent. Now consider the following algorithm, where the uncertainty is described in terms of one of the models described in section Kullback-Liebler Divergence Uncertainty Models : Robust Finite Horizon Dynamic Programming Algorithm 1. Set ɛ > 0. Initialize the value function to its terminal value ˆv N = c N. 2. Reeat until t = 0: (a) For every state i X and action a A, comute, using the bisection algorithm given in (Nilim & El- Ghaoui 2004), a value ˆσ i a such that ˆσ a i ɛ/n σ P a i (ˆv t ) ˆσ a i. (b) Udate the value function by ˆv t 1 (i) = min (c t 1 (i, a) + ˆσ a i ), i X. (c) Relace t by t 1 and go to For every i X and t T, set π ɛ = (a ɛ 0,..., a ɛ N 1 ), where a ɛ t(i) = arg max {c t 1(i, a) + ˆσ a i }, i X, a A. As shown in (Nilim & El-Ghaoui 2004), the above algorithm rovides an subotimal olicy π ɛ that achieves the exact otimum with rescribed accuracy ɛ, with a required number of os bounded above by O(mn 2 N log(n/ɛ)). This means that robustness is obtained at a relative increase of comutational cost of only log(n/ɛ) with resect to the classical dynamic rogramming algorithm, which is small for moderate values of N. If N is very large, we can turn instead to the in nite-horizon roblem examined in the following section, and similar comlexity results hold. In nite-horizon MDP In this section, we address the in nite-horizon robust control roblem, with a discounted cost function of the form (2), where the terminal cost is zero, and c t (i, a) = ν t c(i, a), where c(i, a) is now a constant cost function, which we assume non-negative and nite everywhere, and ν (0, 1) is a discount factor. We begin with the in nite-horizon roblem involving stationary control and nature olicies de ned in (6). The following theorem is roved in (Nilim & El-Ghaoui 2004). Theorem 3 (Robust Bellman recursion) For the in nitehorizon robust control roblem (6) with stationary uncertainty on the transition matrices, stationary control olicies, and a discounted cost function with discount factor ν [0, 1), erfect duality holds: φ (Π s, T s ) = max min C (π, τ) := ψ (Π s, T s ). (10) τ T s π Π s The otimal value is given by φ (Π s, T s ) = v(i 0 ), where i 0 is the initial state, and where the value function v satis es the otimality conditions v(i) = min ( c(i, a) + νσp a i (v)), i X. (11) The value function is the unique limit value of the convergent vector sequence de ned by ( v k+1 (i) = min c(i, a) + νσp a (v i k) ), i X, k = 1, 2,... (12) A stationary, otimal control olicy π = (a, a,...) is obtained as a (i) arg min { c(i, a) + νσp a i (v)}, i X. (13)

4 Note that the roblem of comuting the dual quantity ψ (Π s, T s ) given in (10), has been addressed in (Bagnell, Ng, & Schneider 2001), where the authors rovide the recursion (12) without roof. Theorem (3) leads to the following corollary, also roved in (Nilim & El-Ghaoui 2004). Corollary 4 In the in nite-horizon roblem, we can without loss of generality assume that the control and nature olicies are stationary, that is, φ (Π, T ) = φ (Π s, T s ) = φ (Π s, T ) = φ (Π, T s ). (14) Furthermore, in the nite-horizon case, with a discounted cost function, the ga between the otimal values of the nite-horizon roblems under stationary and time-varying uncertainty models, φ N (Π, T ) φ N (Π, T s ), goes to zero as the horizon length N goes to in nity, at a geometric rate ν. Now consider the following algorithm, where we describe the uncertainty using one of the models of section Kullback-Liebler Divergence Uncertainty Models. Robust In nite Horizon Dynamic Programming Algorithm 1. Set ɛ > 0, initialize the value function ˆv 1 > 0 and set k = 1. 2.(a) For all states i and controls a, comute, using the bisection algorithm given in (Nilim & El-Ghaoui 2004), a value ˆσ i a such that ˆσ a i δ σ P a i (ˆv k ) ˆσ a i, where δ = (1 ν)ɛ/2ν. (b) For all states i and controls a, comute ˆv k+1 (i) by, ˆv k+1 (i) = min (c(i, a) + νˆσa i ). 3. If (1 ν)ɛ ˆv k+1 ˆv k <, 2ν go to 4. Otherwise, relace k by k + 1 and go to For each i X, set an π ɛ = (a ɛ, a ɛ,...), where a ɛ (i) = arg max {c(i, a) + νˆσa i }, i X. In (Nilim & El-Ghaoui 2003; 2004), we establish that the above algorithm nds an ɛ-subotimal robust olicy in at most O(mn 2 log(1/ɛ) 2 ) os. Thus, the extra comutational cost incurred by robustness in the in nite-horizon case is only O(log(1/ɛ)). Solving the inner roblem Each ste of the robust dynamic rogramming algorithm involves the solution of an otimization roblem, referred to as the inner roblem, of the form σ P (v) = max P vt, (15) where the variable corresonds to a articular row of a seci c transition matrix, P = Pi a is the set that describes the uncertainty on this row, and v contains the elements of the value function at some given stage. The comlexity of the sets Pi a for each i X and a A is a key comonent in the comlexity of the robust dynamic rogramming algorithm. Note that we can safely relace P in (15) by its convex hull, so that convexity of the sets Pi a is not required; the algorithm only requires the knowledge of their convex hulls. Beyond numerical tractability, an additional criteria for the choice of a seci c uncertainty model is that the sets P a should reresent accurate (non-conservative) descritions of the statistical uncertainty on the transition matrices. Perhas surrisingly, there are statistical models of uncertainty described via Kullback-Liebler divergence that are good on both counts; seci c examles of such models are described in the following sections. Kullback-Liebler Divergence Uncertainty Models We now address the inner roblem (15) for a seci c action a A and state i X. Denote by D( q) denotes the Kullback-Leibler (KL) divergence (relative entroy) from the robability distribution q n to the robability distribution n : D( q) := () log () q(). The above function rovides a natural way to describe errors in (rows of) the transition matrices; examles of models based on this function are given below. Likelihood Models: Our rst uncertainty model is derived from a controlled exeriment starting from state i = 1, 2,..., n and the count of the number of transitions to different states. We denote by F a the matrix of emirical frequencies of transition with control a in the exeriment; denote by fi a its i th row. We have F a 0 and F a 1 = 1, where 1 denotes the vector of ones. The lug-in estimate ˆP a = F a is the solution to the maximum likelihood roblem max P F a (i, ) log P (i, ) : P 0, P 1 = 1. (16) i, The otimal log-likelihood is βmax a = i, F a (i, ) log F a (i, ). A classical descrition of uncertainty in a maximum-likelihood setting is via the likelihood region (Lehmann & Casella 1998) P a = {P R n n : P 0, P 1 = 1, i, F a (i, ) log P (i, ) β a }, where β a < βmax a is a re-seci ed number, which reresents the uncertainty level. In ractice, the designer seci es an uncertainty level β a based on re-samling methods, or on a large-samle Gaussian aroximation, so as to ensure that the set above achieves a desired level of con dence. With the above model, we note that the inner roblem (15) only involves the set Pi a := { a i Rn : a i 0, a i T 1 = 1, where β a i := βa k i F a (i, ) log a i () βa i F a (k, )log F a (k, ). The set },

5 Pi a is the roection of the set described above on a seci c axis of a i -variables. Noting further that the likelihood function can be exressed in terms of KL divergence, the corresonding uncertainty model on the row a i for given i X, a A, is given by a set of the form Pi a = { n : D(fi a ) γa i }, where γi a = F a (i, ) log F a (i, ) βi a is a function of the uncertainty level, and fi a is the ith row of the matrix F a. Maximum A-Posteriori (MAP) Models: a variation on Likelihood models involves Maximum A Posteriori (MAP) estimates. If there exist a rior information regarding the uncertainty on the i-th row of P a, which can be described via a Dirichlet distribution (Ferguson 1974) with arameter αi a, the resulting MAP estimation roblem takes the form max (fi a + αi a 1) T log : T 1 = 1, 0. Thus, the MAP uncertainty model is equivalent to a Likelihood model, with the samle distribution fi a relaced by fi a + αa i 1, where αa i is the rior corresonding to state i and action a. Relative Entroy Models: Likelihood or MAP models involve the KL divergence from the unknown distribution to a reference distribution. We can also choose to describe uncertainty by exchanging the order of the arguments of the KL divergence. This results in a so-called relative entroy model, where the uncertainty on the i-th row of the transition matrix P a described by a set of the form Pi a = { n : D( qi a) γa i }, where γa i > 0 is xed, q i a > 0 is a given reference distribution (for examle, the Maximum Likelihood distribution). Inner roblem with Likelihood Models The inner roblem (15) with the Likelihood uncertainty model is the following, σ := max T v : n, f() log () β, (17) where we have droed the subscrit i and suerscrit a in the emirical frequencies vector fi a and in the lower bound βi a. In this section β max denotes the maximal value of the likelihood function aearing in the above set, which is β max = f() log f(). We assume that β < β max, which, together with f > 0, ensures that the set above has non-emty interior. Without loss of generality, we can assume that v R n +. The dual roblem The Lagrangian L : R n R n R R R associated with the inner roblem can be written as L(v, ζ, µ, ) = T v +ζ T +µ(1 T 1)+(f T log β), where ζ, µ, and are the Lagrange multiliers. The Lagrange dual function d : R n R R R is the maximum value of the Lagrangian over, i.e., for ζ R n, µ R, and R, d(ζ, µ, ) = su L(v, ζ, µ, ) (18) = su( T v + ζ T + µ(1 T 1) + (f T log β)). The otimal = arg su L(v, ζ, µ, ) is readily be obtained by solving L = 0, which results in (i) = f(i) µ v(i) ζ(i). Plugging the value of in the equation for d(ν, µ, ) yields, with some simli cation, the following dual roblem: σ := min,µ,ζ µ (1 + β) + f() log f() µ v() ν(), such that 0, ζ 0, ζ + v µ1. Since the above roblem is convex, and has a feasible set with non-emty interior, there is no duality ga, that is, σ = σ. Moreover, by a monotonicity argument, we obtain that the otimal dual variable ζ is zero, which reduces the number of variables to two: σ = min,µ h(, µ) where µ (1 + β)+ h(, µ) := f() log f() if > 0, µ > v max, µ v() + otherwise. (19) (Note that, v max := max v()). For further reference, we note that h is twice differentiable on its domain, and that its gradient is given by f() f() log h(, µ) = µ v() β 1 f(). (20) µ v() A bisection algorithm From the exression of the gradient obtained above, we obtain that the otimal value of for a xed µ, (µ), is given analytically by (µ) = 1 f(), (21) µ v() which further reduces the roblem to a one-dimensional roblem: σ = min σ(µ), µ v max where v max = max v(), and σ(µ) = h((µ), µ). By construction, the function σ(µ) is convex in its (scalar) argument, since the function h de ned in (19) is ointly convex

6 in both its arguments (see (Boyd & Vandenberghe January 2004,.74)). Hence, we may use bisection to minimize σ. To initialize the bisection algorithm, we need uer and lower bounds µ and µ + on a minimizer of σ. When µ v max, σ(µ) v max and σ (µ) (see (Nilim & El- Ghaoui 2004)). Thus, we may set the lower bound to µ = v max. The uer bound µ + must be chosen such that σ (µ + ) > 0. We have σ (µ) = h h ((µ), µ) + ((µ), µ)d(µ) µ dµ. (22) The rst term is zero by construction, and d(µ)/dµ > 0 for µ > v max. Hence, we only need a value of µ for which h ((µ), µ) = f() log (µ)f() µ v() β > 0. (23) By convexity of the negative log function, and using the fact that f T 1 = 1, f 0, we obtain that h ((µ), µ) = β max β + f() log (µ) ( µ v() ) v() β max β log f()µ (µ) β max β + log (µ) µ v, where v = f T v denotes the average of v under f. The above, combined with the bound on (µ): (µ) µ v max, yields a suf cient condition for (23) to hold: µ > µ 0 + := v max e β βmax v 1 e β βmax. (24) By construction, the interval [v max, µ 0 +] is guaranteed to contain a global minimizer of σ over (v max, + ). The bisection algorithm goes as follows: 1. Set µ = v max and µ + = µ 0 + as in (24). Let δ > 0 be a small convergence arameter. 2. While µ + µ > δ(1 + µ + µ ), reeat (a) Set µ = (µ + + µ )/2. (b) Comute the gradient of σ at µ. (c) If σ (µ) > 0, set µ + = µ; otherwise, set µ = µ. (d) go to 2a. In ractice, the function to minimize may be very at near the minimum. This means that the above bisection algorithm may take a long time to converge to the global minimizer. Since we are only interested in the value of the minimum (and not of the minimizer), we may modify the stoing criterion to µ + µ δ(1 + µ + µ ) or σ (µ + ) σ (µ ) δ. The second condition in the criterion imlies that σ ((µ + + µ )/2) δ, which is an aroximate condition for global otimality. Inner roblem with Maximum A Posteriori Models The inner roblem (15) with MAP uncertainty models takes the form σ := max T v : 0, T 1 = 1, (f() + α() 1) log () κ, where κ deends on the normalizing constant K aearing in the rior density function and on the chosen lower bound on the MAP function, β. We observe that this roblem has exactly the same form as in the case of likelihood function, rovided we relace f by f + α 1. Therefore, the same results aly to the MAP case. Inner roblem with Entroy Models The inner roblem (15) with entroy uncertainty models takes the form σ := max T v : () log () q() β : T 1 = 1, 0. We note that the constraint set actually equals the whole robability simlex if β is too large, seci cally if β max i ( log q i ), since the latter quantity is the maximum of the relative entroy function over the simlex. Thus, if β max i ( log q i ), the worst-case value of T v for P is equal to v max := max v(). Dual roblem By standard duality arguments (set P being of non-emty interior), the inner roblem is equivalent to its dual: min µ + β + ( ) v() µ q() ex 1. >0,µ Setting the derivative with resect to µ to zero, we obtain the otimality condition ( ) v() µ q() ex 1 = 1, from which we derive µ = log The otimal distribution is = q() ex v(). v() q() ex v(i) i q(i) ex As before, we reduce the roblem to a one-dimensional roblem: min >0 σ().

7 where σ is the convex function: σ() = log q() ex v() + β. (25) Perhas not surrisingly, the above function is closely linked to the moment generating function of a random variable v having the discrete distribution with mass q i at v i. A bisection algorithm As roved in (Nilim & El-Ghaoui 2004), the convex function σ in (25) has the following roerties: and where 0, q T v + β σ() v max + β, (26) σ() = v max + (β + log Q(v)) + o(), (27) Q(v) := : v()=v max q() = Prob{v = v max }. Hence, σ(0) = v max and σ (0) = β + log Q(v). In addition, at in nity the exansion of σ is σ() = q T v + β + o(1). (28) The bisection algorithm can be started with the lower bound = 0. An uer bound can be comuted by nding a solution to the equations σ(0) = q T v + β, which yields the initial uer bound 0 + = (v max q T v)/β. By convexity, a minimizer exists in the interval [0 0 +]. Note that if σ (0) 0, then = 0 is otimal and the otimal value of σ is v max. This means that if β is too high, that is, if β > log Q(v), enforcing robustness amounts to disregard any rior information on the robability distribution. We have observed in (Nilim & El-Ghaoui 2004) a similar henomenon brought about by too large values of β, which resulted in a set P equal to the robability simlex. Here, the limiting value log Q(v) deends not only on q but also on v, since we are dealing with the otimization roblem (15) and not only with its feasible set P. Comutational comlexity of the inner roblem Equied with one of the above uncertainty models, we have shown in the revious section that the inner roblem can be converted by convex duality, to a roblem of minimizing a single-variable, convex function. In turn, this onedimensional convex otimization roblem can be solved via a bisection algorithm with a worst-case comlexity of O(n log(v max /δ)) (see (Nilim & El-Ghaoui 2004) for details), where δ > 0 seci es the accuracy at which the otimal value of the inner roblem (15) is comuted, and v max is a global uer bound on the value function. Remark: We can also use models where the uncertainty in the i-th row for the transition matrix P a is described by a nite set of vectors, Pi a = {a,1 i,..., a,k i }. In this case the comlexity of the corresonding robust dynamic rogramming algorithm is increased by a relative factor of K with resect to its classical counterart, which makes the aroach attractive when the number of scenarios K is moderate. Concluding remarks We roosed a robust dynamic rogramming algorithm for solving nite-state and nite-action MDPs whose solutions are guaranteed to tolerate arbitrary changes of the transition robability matrices within given sets. We roosed models based on KL divergence, which is a natural way to describe estimation errors. The resulting robust dynamic rogramming algorithm has almost the same comutational cost as the classical dynamic rogramming algorithm: the relative increase to comute an ɛ-subotimal olicy is O(log(N/ɛ)) in the N-horizon case, and O(log(1/ɛ)) for the in nite-horizon case. References Bagnell, J.; Ng, A.; and Schneider, J Solving uncertain Markov decision roblems. Technical Reort CMU- RI-TR-01-25, Robotics Institute, Carnegie Mellon University. Boyd, S., and Vandenberghe, L. January, Convex Otimization. Cambridge, U.K.: Cambridge University Press. Feinberg, E., and Shwartz, A Handbook of Markov Decision Processes, Methods and Alications. Boston: Kluwer s Academic Publishers. Ferguson, T Prior distributions on sace of robability measures. The Annal of Statistics 2(4): Givan, R.; Leach, S.; and Dean, T Bounded Parameter Markov Decision Processes. In fourth Euroean Conference on Planning, Lehmann, E., and Casella, G Theory of oint estimation. New York, USA: Sringer-Verlag. Nilim, A., and El-Ghaoui, L Robust solution to Markov decision roblems with uncertain transition matrices: roofs and comlexity analysis. In the roceeding of the Neural Information Processing Systems (NIPS, 2004). Nilim, A., and El-Ghaoui, L Curse of uncertainty in markaov decision rocesses: how tro x it. Technical Reort UCB/ERL M04/, Deartment of EECS, University of California, Berkeley. A related version has been submitted to Oerations Research in Dec Puterman, M Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley- Interscience. Satia, J. K., and Lave, R. L Markov decision rocesses with uncertain transition robabilities. Oerations Research 21(3): Shairo, A., and Kleywegt, A. J Minimax analysis of stochastic roblems. Otimization Methods and Software. to aear. White, C. C., and Eldeib, H. K Markov decision rocesses with imrecise transition robabilities. Oerations Research 42(4):

On a Markov Game with Incomplete Information

On a Markov Game with Incomplete Information On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

Optimism, Delay and (In)Efficiency in a Stochastic Model of Bargaining

Optimism, Delay and (In)Efficiency in a Stochastic Model of Bargaining Otimism, Delay and In)Efficiency in a Stochastic Model of Bargaining Juan Ortner Boston University Setember 10, 2012 Abstract I study a bilateral bargaining game in which the size of the surlus follows

More information

Estimating Time-Series Models

Estimating Time-Series Models Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary

More information

Some results of convex programming complexity

Some results of convex programming complexity 2012c12 $ Ê Æ Æ 116ò 14Ï Dec., 2012 Oerations Research Transactions Vol.16 No.4 Some results of convex rogramming comlexity LOU Ye 1,2 GAO Yuetian 1 Abstract Recently a number of aers were written that

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

Algorithms for Air Traffic Flow Management under Stochastic Environments

Algorithms for Air Traffic Flow Management under Stochastic Environments Algorithms for Air Traffic Flow Management under Stochastic Environments Arnab Nilim and Laurent El Ghaoui Abstract A major ortion of the delay in the Air Traffic Management Systems (ATMS) in US arises

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

On the Chvatál-Complexity of Knapsack Problems

On the Chvatál-Complexity of Knapsack Problems R u t c o r Research R e o r t On the Chvatál-Comlexity of Knasack Problems Gergely Kovács a Béla Vizvári b RRR 5-08, October 008 RUTCOR Rutgers Center for Oerations Research Rutgers University 640 Bartholomew

More information

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application

Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application BULGARIA ACADEMY OF SCIECES CYBEREICS AD IFORMAIO ECHOLOGIES Volume 9 o 3 Sofia 009 Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Alication Svetoslav Savov Institute of Information

More information

The non-stochastic multi-armed bandit problem

The non-stochastic multi-armed bandit problem Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at

More information

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and

More information

Notes on duality in second order and -order cone otimization E. D. Andersen Λ, C. Roos y, and T. Terlaky z Aril 6, 000 Abstract Recently, the so-calle

Notes on duality in second order and -order cone otimization E. D. Andersen Λ, C. Roos y, and T. Terlaky z Aril 6, 000 Abstract Recently, the so-calle McMaster University Advanced Otimization Laboratory Title: Notes on duality in second order and -order cone otimization Author: Erling D. Andersen, Cornelis Roos and Tamás Terlaky AdvOl-Reort No. 000/8

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Semiparametric Estimation of Markov Decision Processes with Continuous State Space

Semiparametric Estimation of Markov Decision Processes with Continuous State Space Semiarametric Estimation of Markov Decision Processes with Continuous State Sace Sorawoot Srisuma and Oliver Linton London School of Economics and Political Science he Suntory Centre Suntory and oyota

More information

E-companion to A risk- and ambiguity-averse extension of the max-min newsvendor order formula

E-companion to A risk- and ambiguity-averse extension of the max-min newsvendor order formula e-comanion to Han Du and Zuluaga: Etension of Scarf s ma-min order formula ec E-comanion to A risk- and ambiguity-averse etension of the ma-min newsvendor order formula Qiaoming Han School of Mathematics

More information

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar 15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider

More information

Chapter 3. GMM: Selected Topics

Chapter 3. GMM: Selected Topics Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2

More information

1 Gambler s Ruin Problem

1 Gambler s Ruin Problem Coyright c 2017 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins

More information

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 000-9939XX)0000-0 ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS WILLIAM D. BANKS AND ASMA HARCHARRAS

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

MATH 2710: NOTES FOR ANALYSIS

MATH 2710: NOTES FOR ANALYSIS MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite

More information

Approximate Dynamic Programming for Dynamic Capacity Allocation with Multiple Priority Levels

Approximate Dynamic Programming for Dynamic Capacity Allocation with Multiple Priority Levels Aroximate Dynamic Programming for Dynamic Caacity Allocation with Multile Priority Levels Alexander Erdelyi School of Oerations Research and Information Engineering, Cornell University, Ithaca, NY 14853,

More information

On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression

On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression On the asymtotic sizes of subset Anderson-Rubin and Lagrange multilier tests in linear instrumental variables regression Patrik Guggenberger Frank Kleibergeny Sohocles Mavroeidisz Linchun Chen\ June 22

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Convex Analysis and Economic Theory Winter 2018

Convex Analysis and Economic Theory Winter 2018 Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is

More information

Linear diophantine equations for discrete tomography

Linear diophantine equations for discrete tomography Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,

More information

Age of Information: Whittle Index for Scheduling Stochastic Arrivals

Age of Information: Whittle Index for Scheduling Stochastic Arrivals Age of Information: Whittle Index for Scheduling Stochastic Arrivals Yu-Pin Hsu Deartment of Communication Engineering National Taiei University yuinhsu@mail.ntu.edu.tw arxiv:80.03422v2 [math.oc] 7 Ar

More information

AM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 6 February 12th, 2014

AM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 6 February 12th, 2014 AM 221: Advanced Otimization Sring 2014 Prof. Yaron Singer Lecture 6 February 12th, 2014 1 Overview In our revious lecture we exlored the concet of duality which is the cornerstone of Otimization Theory.

More information

arxiv: v1 [cs.ro] 24 May 2017

arxiv: v1 [cs.ro] 24 May 2017 A Near-Otimal Searation Princile for Nonlinear Stochastic Systems Arising in Robotic Path Planning and Control Mohammadhussein Rafieisakhaei 1, Suman Chakravorty 2 and P. R. Kumar 1 arxiv:1705.08566v1

More information

1 Extremum Estimators

1 Extremum Estimators FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective

More information

Inequalities for the L 1 Deviation of the Empirical Distribution

Inequalities for the L 1 Deviation of the Empirical Distribution Inequalities for the L 1 Deviation of the Emirical Distribution Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, Marcelo J. Weinberger June 13, 2003 Abstract We derive bounds on the robability

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

On Wald-Type Optimal Stopping for Brownian Motion

On Wald-Type Optimal Stopping for Brownian Motion J Al Probab Vol 34, No 1, 1997, (66-73) Prerint Ser No 1, 1994, Math Inst Aarhus On Wald-Tye Otimal Stoing for Brownian Motion S RAVRSN and PSKIR The solution is resented to all otimal stoing roblems of

More information

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Improved Capacity Bounds for the Binary Energy Harvesting Channel Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

Difference of Convex Functions Programming for Reinforcement Learning (Supplementary File)

Difference of Convex Functions Programming for Reinforcement Learning (Supplementary File) Difference of Convex Functions Programming for Reinforcement Learning Sulementary File Bilal Piot 1,, Matthieu Geist 1, Olivier Pietquin,3 1 MaLIS research grou SUPELEC - UMI 958 GeorgiaTech-CRS, France

More information

A Social Welfare Optimal Sequential Allocation Procedure

A Social Welfare Optimal Sequential Allocation Procedure A Social Welfare Otimal Sequential Allocation Procedure Thomas Kalinowsi Universität Rostoc, Germany Nina Narodytsa and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simle sequential

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

A numerical implementation of a predictor-corrector algorithm for sufcient linear complementarity problem

A numerical implementation of a predictor-corrector algorithm for sufcient linear complementarity problem A numerical imlementation of a redictor-corrector algorithm for sufcient linear comlementarity roblem BENTERKI DJAMEL University Ferhat Abbas of Setif-1 Faculty of science Laboratory of fundamental and

More information

PArtially observable Markov decision processes

PArtially observable Markov decision processes Solving Continuous-State POMDPs via Density Projection Enlu Zhou, Member, IEEE, Michael C. Fu, Fellow, IEEE, and Steven I. Marcus, Fellow, IEEE Abstract Research on numerical solution methods for artially

More information

The following document is intended for online publication only (authors webpage).

The following document is intended for online publication only (authors webpage). The following document is intended for online ublication only (authors webage). Sulement to Identi cation and stimation of Distributional Imacts of Interventions Using Changes in Inequality Measures, Part

More information

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of..

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of.. IOSR Journal of Mathematics (IOSR-JM) e-issn: 78-578, -ISSN: 319-765X. Volume 1, Issue 4 Ver. III (Jul. - Aug.016), PP 53-60 www.iosrournals.org Asymtotic Proerties of the Markov Chain Model method of

More information

Sums of independent random variables

Sums of independent random variables 3 Sums of indeendent random variables This lecture collects a number of estimates for sums of indeendent random variables with values in a Banach sace E. We concentrate on sums of the form N γ nx n, where

More information

Named Entity Recognition using Maximum Entropy Model SEEM5680

Named Entity Recognition using Maximum Entropy Model SEEM5680 Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales

Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales Otimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales Diana Negoescu Peter Frazier Warren Powell Abstract In this aer, we consider a version of the newsvendor

More information

x and y suer from two tyes of additive noise [], [3] Uncertainties e x, e y, where the only rior knowledge is their boundedness and zero mean Gaussian

x and y suer from two tyes of additive noise [], [3] Uncertainties e x, e y, where the only rior knowledge is their boundedness and zero mean Gaussian A New Estimator for Mixed Stochastic and Set Theoretic Uncertainty Models Alied to Mobile Robot Localization Uwe D. Hanebeck Joachim Horn Institute of Automatic Control Engineering Siemens AG, Cororate

More information

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI ** Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R

More information

TRACES OF SCHUR AND KRONECKER PRODUCTS FOR BLOCK MATRICES

TRACES OF SCHUR AND KRONECKER PRODUCTS FOR BLOCK MATRICES Khayyam J. Math. DOI:10.22034/kjm.2019.84207 TRACES OF SCHUR AND KRONECKER PRODUCTS FOR BLOCK MATRICES ISMAEL GARCÍA-BAYONA Communicated by A.M. Peralta Abstract. In this aer, we define two new Schur and

More information

A New Perspective on Learning Linear Separators with Large L q L p Margins

A New Perspective on Learning Linear Separators with Large L q L p Margins A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical

More information

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:

More information

arxiv: v1 [cs.lg] 31 Jul 2014

arxiv: v1 [cs.lg] 31 Jul 2014 Learning Nash Equilibria in Congestion Games Walid Krichene Benjamin Drighès Alexandre M. Bayen arxiv:408.007v [cs.lg] 3 Jul 204 Abstract We study the reeated congestion game, in which multile oulations

More information

Mobility-Induced Service Migration in Mobile. Micro-Clouds

Mobility-Induced Service Migration in Mobile. Micro-Clouds arxiv:503054v [csdc] 7 Mar 205 Mobility-Induced Service Migration in Mobile Micro-Clouds Shiiang Wang, Rahul Urgaonkar, Ting He, Murtaza Zafer, Kevin Chan, and Kin K LeungTime Oerating after ossible Deartment

More information

Universal Finite Memory Coding of Binary Sequences

Universal Finite Memory Coding of Binary Sequences Deartment of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

Combinatorics of topmost discs of multi-peg Tower of Hanoi problem

Combinatorics of topmost discs of multi-peg Tower of Hanoi problem Combinatorics of tomost discs of multi-eg Tower of Hanoi roblem Sandi Klavžar Deartment of Mathematics, PEF, Unversity of Maribor Koroška cesta 160, 000 Maribor, Slovenia Uroš Milutinović Deartment of

More information

Khinchine inequality for slightly dependent random variables

Khinchine inequality for slightly dependent random variables arxiv:170808095v1 [mathpr] 7 Aug 017 Khinchine inequality for slightly deendent random variables Susanna Sektor Abstract We rove a Khintchine tye inequality under the assumtion that the sum of Rademacher

More information

The analysis and representation of random signals

The analysis and representation of random signals The analysis and reresentation of random signals Bruno TOÉSNI Bruno.Torresani@cmi.univ-mrs.fr B. Torrésani LTP Université de Provence.1/30 Outline 1. andom signals Introduction The Karhunen-Loève Basis

More information

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction GOOD MODELS FOR CUBIC SURFACES ANDREAS-STEPHAN ELSENHANS Abstract. This article describes an algorithm for finding a model of a hyersurface with small coefficients. It is shown that the aroach works in

More information

Factorial moments of point processes

Factorial moments of point processes Factorial moments of oint rocesses Jean-Christohe Breton IRMAR - UMR CNRS 6625 Université de Rennes 1 Camus de Beaulieu F-35042 Rennes Cedex France Nicolas Privault Division of Mathematical Sciences School

More information

Real Analysis 1 Fall Homework 3. a n.

Real Analysis 1 Fall Homework 3. a n. eal Analysis Fall 06 Homework 3. Let and consider the measure sace N, P, µ, where µ is counting measure. That is, if N, then µ equals the number of elements in if is finite; µ = otherwise. One usually

More information

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete

More information

Adaptive Estimation of the Regression Discontinuity Model

Adaptive Estimation of the Regression Discontinuity Model Adative Estimation of the Regression Discontinuity Model Yixiao Sun Deartment of Economics Univeristy of California, San Diego La Jolla, CA 9293-58 Feburary 25 Email: yisun@ucsd.edu; Tel: 858-534-4692

More information

#A37 INTEGERS 15 (2015) NOTE ON A RESULT OF CHUNG ON WEIL TYPE SUMS

#A37 INTEGERS 15 (2015) NOTE ON A RESULT OF CHUNG ON WEIL TYPE SUMS #A37 INTEGERS 15 (2015) NOTE ON A RESULT OF CHUNG ON WEIL TYPE SUMS Norbert Hegyvári ELTE TTK, Eötvös University, Institute of Mathematics, Budaest, Hungary hegyvari@elte.hu François Hennecart Université

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

dn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential

dn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential Chem 467 Sulement to Lectures 33 Phase Equilibrium Chemical Potential Revisited We introduced the chemical otential as the conjugate variable to amount. Briefly reviewing, the total Gibbs energy of a system

More information

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES OHAD GILADI AND ASSAF NAOR Abstract. It is shown that if (, ) is a Banach sace with Rademacher tye 1 then for every n N there exists

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

LEIBNIZ SEMINORMS IN PROBABILITY SPACES

LEIBNIZ SEMINORMS IN PROBABILITY SPACES LEIBNIZ SEMINORMS IN PROBABILITY SPACES ÁDÁM BESENYEI AND ZOLTÁN LÉKA Abstract. In this aer we study the (strong) Leibniz roerty of centered moments of bounded random variables. We shall answer a question

More information

Bilinear Entropy Expansion from the Decisional Linear Assumption

Bilinear Entropy Expansion from the Decisional Linear Assumption Bilinear Entroy Exansion from the Decisional Linear Assumtion Lucas Kowalczyk Columbia University luke@cs.columbia.edu Allison Bisho Lewko Columbia University alewko@cs.columbia.edu Abstract We develo

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari Condence tubes for multile quantile lots via emirical likelihood John H.J. Einmahl Eindhoven University of Technology Ian W. McKeague Florida State University May 7, 998 Abstract The nonarametric emirical

More information

Generalized Coiflets: A New Family of Orthonormal Wavelets

Generalized Coiflets: A New Family of Orthonormal Wavelets Generalized Coiflets A New Family of Orthonormal Wavelets Dong Wei, Alan C Bovik, and Brian L Evans Laboratory for Image and Video Engineering Deartment of Electrical and Comuter Engineering The University

More information

Statistical Treatment Choice Based on. Asymmetric Minimax Regret Criteria

Statistical Treatment Choice Based on. Asymmetric Minimax Regret Criteria Statistical Treatment Coice Based on Asymmetric Minimax Regret Criteria Aleksey Tetenov y Deartment of Economics ortwestern University ovember 5, 007 (JOB MARKET PAPER) Abstract Tis aer studies te roblem

More information

Correspondence Between Fractal-Wavelet. Transforms and Iterated Function Systems. With Grey Level Maps. F. Mendivil and E.R.

Correspondence Between Fractal-Wavelet. Transforms and Iterated Function Systems. With Grey Level Maps. F. Mendivil and E.R. 1 Corresondence Between Fractal-Wavelet Transforms and Iterated Function Systems With Grey Level Mas F. Mendivil and E.R. Vrscay Deartment of Alied Mathematics Faculty of Mathematics University of Waterloo

More information

The power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components

The power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components ATHES UIVERSITY OF ECOOMICS AD BUSIESS DEPARTMET OF ECOOMICS WORKIG PAPER SERIES 23-203 The ower erformance of fixed-t anel unit root tests allowing for structural breaks in their deterministic comonents

More information

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process P. Mantalos a1, K. Mattheou b, A. Karagrigoriou b a.deartment of Statistics University of Lund

More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

Estimating function analysis for a class of Tweedie regression models

Estimating function analysis for a class of Tweedie regression models Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal

More information

Best approximation by linear combinations of characteristic functions of half-spaces

Best approximation by linear combinations of characteristic functions of half-spaces Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of

More information

Positive decomposition of transfer functions with multiple poles

Positive decomposition of transfer functions with multiple poles Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.

More information

Optimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment

Optimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment Neutrosohic Sets and Systems Vol 14 016 93 University of New Mexico Otimal Design of Truss Structures Using a Neutrosohic Number Otimization Model under an Indeterminate Environment Wenzhong Jiang & Jun

More information

Uniform Law on the Unit Sphere of a Banach Space

Uniform Law on the Unit Sphere of a Banach Space Uniform Law on the Unit Shere of a Banach Sace by Bernard Beauzamy Société de Calcul Mathématique SA Faubourg Saint Honoré 75008 Paris France Setember 008 Abstract We investigate the construction of a

More information

1. INTRODUCTION. Fn 2 = F j F j+1 (1.1)

1. INTRODUCTION. Fn 2 = F j F j+1 (1.1) CERTAIN CLASSES OF FINITE SUMS THAT INVOLVE GENERALIZED FIBONACCI AND LUCAS NUMBERS The beautiful identity R.S. Melham Deartment of Mathematical Sciences, University of Technology, Sydney PO Box 23, Broadway,

More information