Robust Solutions to Markov Decision Problems
|
|
- Steven Sutton
- 6 years ago
- Views:
Transcription
1 Robust Solutions to Markov Decision Problems Arnab Nilim and Laurent El Ghaoui Deartment of Electrical Engineering and Comuter Sciences University of California, Berkeley, CA Abstract Otimal solutions to Markov Decision Problems (MDPs) are very sensitive with resect to the state transition robabilities. In many ractical roblems, the estimation of those robabilities is far from accurate. Hence, estimation errors are limiting factors in alying MDPs to real-world roblems. We roose an algorithm for solving nite-state and nite-action MDPs, where the solution is guaranteed to be robust with resect to estimation errors on the state transition robabilities. Our algorithm involves a statistically accurate yet numerically ef cient reresentation of uncertainty via likelihood functions. The worst-case comlexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at ractically no extra comuting cost. Introduction We consider a nite-state and nite-action Markov decision roblem in which the transition robabilities themselves are uncertain, and seek a robust decision for it. Our work is motivated by the fact that in many ractical roblems, the transition matrices have to be estimated from data. This may be a dif cult task and the estimation errors may have a huge imact on the solution, which is often quite sensitive to changes in the transition robabilities (Feinberg & Shwartz 2002). A number of authors have addressed the issue of uncertainty in the transition matrices of an MDP. A Bayesian aroach such as described by (Shairo & Kleywegt 2002) requires a erfect knowledge of the whole rior distribution on the transition matrix, making it dif cult to aly in ractice. Other authors have considered the transition matrix to lie in a given set, most tyically a olytoe: see (Satia & Lave 1973; White & Eldeib 1994; Givan, Leach, & Dean 1997). Although our aroach allows to describe the uncertainty on the transition matrix by a olytoe, we may argue against choosing such a model for the uncertainty. First, a general olytoe is often not a tractable way to address the robustness roblem, as it incurs a signi cant additional comutational effort to handle uncertainty. Perhas more imortantly, olytoic models, esecially interval matrices, may be very oor reresentations of statistical uncertainty and Coyright c 2004, American Association for Arti cial Intelligence ( All rights reserved. lead to very conservative robust olicies. In (Bagnell, Ng, & Schneider 2001), the authors consider a roblem dual to ours, and rovide a general statement according to which the cost of solving their roblem is olynomial in roblem size, rovided the uncertainty on the transition matrices is described by convex sets, without roosing any seci c algorithm. This aer is a short version of a longer reort (Nilim & El-Ghaoui 2004), which contains all the roofs of the results summarized here. Notation. P > 0 or P 0 refers to the strict or nonstrict comonentwise inequality for matrices or vectors. For a vector > 0, log refers to the comonentwise oeration. The notation 1 refers to the vector of ones, with size determined from context. The robability simlex in R n is denoted n = { R n + : T 1 = 1}, while Θ n is the set of n n transition matrices (comonentwise non-negative matrices with rows summing to one). We use σ P to denote the suort function of a set P R n, with for v R n, σ P (v) := su{ T v : P}. The roblem descrition We consider a nite horizon Markov decision rocess with nite decision horizon T = {0, 1, 2,..., N 1}. At each stage, the system occuies a state i X, where n = X is nite, and a decision maker is allowed to choose an action a deterministically from a nite set of allowable actions A = {a 1,..., a m } (for notational simlicity we assume that A is not state-deendent). The system starts in a given initial state i 0. The states make Markov transitions according to a collection of (ossibly time-deendent) transition matrices τ := (Pt a ),t T, where for every a A, t T, the n n transition matrix Pt a contains the robabilities of transition under action a at stage t. We denote by π = (a 0,..., a N 1 ) a generic controller olicy, where a t (i) denotes the controller action when the system is in state i X at time t T. Let Π = A nn be the corresonding strategy sace. De ne by c t (i, a) the cost corresonding to state i X and action a A at time t T, and by c N the cost function at the terminal stage. We assume that c t (i, a) is non-negative and nite for every i X and a A. For a given set of transition matrices τ, we de ne the
2 nite-horizon nominal roblem by φ N (Π, τ) := min C N (π, τ), (1) π Π where C N (π, τ) denotes the exected total cost under controller olicy π and transitions τ: ( N 1 ) C N (π, τ) := E c t (i t, a t (i)) + c N (i N ). (2) t=0 A secial case of interest is when the exected total cost function bears the form (2), where the terminal cost is zero, and c t (i, a) = ν t c(i, a), with c(i, a) now a constant cost function, which we assume non-negative and nite everywhere, and ν (0, 1) is a discount factor. We refer to this cost function as the discounted cost function, and denote by C (π, τ) the limit of the discounted cost (2) as N. When the transition matrices are exactly known, the corresonding nominal roblem can be solved via a dynamic rogramming algorithm, which has total comlexity of mn 2 N os in the nite-horizon case. In the in nitehorizon case with a discounted cost function, the cost of comuting an ɛ-subotimal olicy via the Bellman recursion is O(mn 2 log(1/ɛ)); see (Puterman 1994) for more details. The robust control roblems At rst we assume that when for each action a and time t, the corresonding transition matrix Pt a is only known to lie in some given subset P a. Two models for transition matrix uncertainty are ossible, leading to two ossible forms of nite-horizon robust control roblems. In a rst model, referred to as the stationary uncertainty model, the transition matrices are chosen by nature deending on the controller olicy once and for all, and remain xed thereafter. In a second model, which we refer to as the time-varying uncertainty model, the transition matrices can vary arbitrarily with time, within their rescribed bounds. Each roblem leads to a game between the controller and nature, where the controller seeks to minimize the maximum exected cost, with nature being the maximizing layer. Let us de ne our two roblems more formally. A olicy of nature refers to a seci c collection of time-deendent transition matrices τ = (Pt a ),t T chosen by nature, and the set of admissible olicies of nature is T := ( P a ) N. Denote by T s the set of stationary admissible olicies of nature: T s = {τ = (Pt a ),t T T : Pt a = Ps a for every t, s T, a A}. The stationary uncertainty model leads to the roblem φ N (Π, T s ) := min max C N (π, τ). (3) π Π τ T s In contrast, the time-varying uncertainty model leads to a relaxed version of the above: φ N (Π, T s ) φ N (Π, T ) := min max C N (π, τ). (4) π Π τ T The rst model is attractive for statistical reasons, as it is much easier to develo statistically accurate sets of con dence when the underlying rocess is time-invariant. Unfortunately, the resulting game (3) seems to be hard to solve because rincile of otimality may not hold in these roblems due to the deendence of otimal actions of nature among different stages. The second model is attractive as one can solve the corresonding game (4) using a variant of the dynamic rogramming algorithm seen later, but we are left with a dif cult task, that of estimating a meaningful set of con dence for the time-varying matrices P a t. In this aer we will use the rst model of uncertainty in order to derive statistically meaningful sets of con dence for the transition matrices, based on likelihood or entroy bounds. Then, instead of solving the corresonding dif cult control roblem (3), we use an aroximation that is common in robust control, and solve the time-varying uer bound (4), using the uncertainty sets P a derived from a stationarity assumtion about the transition matrices. We will also consider a variant of the nite-horizon time-varying roblem (4), where controller and nature lay alternatively, leading to a reeated game (Π, Q) := min max min max... min max C N (π, τ), a 0 τ 0 Q a 1 τ 1 Q a N 1 τ N 1 Q (5) where the notation τ t = (Pt a ) denotes the collection of transition matrices at a given time t T, and Q := P a is the corresonding set of con dence. Finally, we will consider an in nite-horizon robust control roblem, with the discounted cost function referred to above, and where we restrict control and nature olicies to be stationary: φ re N φ (Π s, T s ) := min max C (π, τ), (6) π Π s τ T s where Π s denotes the sace of stationary control olicies. We de ne φ (Π, T ), φ (Π, T s ) and φ (Π s, T ) accordingly. In the sequel, for a given control olicy π Π and subset S T, the notation φ N (π, S) := max τ S C N (π, τ) denotes the worst-case exected total cost for the nite-horizon roblem, and φ (π, S) is de ned likewise. Main results Our main contributions are as follows. First we rovide a recursion, the robust dynamic rogramming algorithm, which solves the nite-horizon robust control roblem (4). We rovide a simle roof in (Nilim & El-Ghaoui 2004) of the otimality of the recursion, where the main ingredient is to show that erfect duality holds in the game (4). As a corollary of this result, we obtain that the reeated game (5) is equivalent to its non-reeated counterart (4). Second, we rovide similar results for the in nite-horizon roblem with discounted cost function, (6). Moreover, we obtain that if we consider a nite-horizon roblem with a discounted cost function, then the ga between the otimal value of the stationary uncertainty roblem (3) and that of its time-varying counterart (4) goes to zero as the horizon length goes to in- nity, at a rate determined by the discount factor. Finally, we identify several classes of uncertainty models, which result in an algorithm that is both statistically accurate and numerically tractable. We rovide recise comlexity results that imly that, with the roosed aroach, robustness can be handled at ractically no extra comuting cost.
3 Finite-Horizon robust MDP We consider the nite-horizon robust control roblem de- ned in section. For a given state i X, action a A, and P a P a, we denote by a i the next-state distribution drawn from P a corresonding to state i X ; thus a i is the i-th row of matrix P a. We de ne Pi a as the roection of the set P a onto the set of a i -variables. By assumtion, these sets are included in the robability simlex of R n, n ; no other roerty is assumed. The following theorem is roved in (Nilim & El-Ghaoui 2004). Theorem 1 (robust dynamic rogramming) For the robust control roblem (4), erfect duality holds: φ N (Π, T ) = min max C N (π, τ) = max min C N (π, τ) π Π τ T τ T π Π := ψ N (Π, T ). The roblem can be solved via the recursion v t (i) = min ( ct (i, a) + σ P a i (v t+1) ), i X, t T, (7) where σ P (v) := su{ T v : P} denotes the suort function of a set P, v t (i) is the worst-case otimal value function in state i at stage t. A corresonding otimal control olicy π = (a 0,..., a N 1 ) is obtained by setting a { t (i) arg min ct (i, a) + σ P a (v i t+1) }, i X. (8) The effect of uncertainty on a given strategy π = (a 0,..., a N ) can be evaluated by the following recursion vt π (i) = c t (i, a t (i)) + σ a P t (i)(vt+1), π i X, (9) i which rovides the worst-case value function v π for the strategy π. The above result has a nice consequence for the reeated game (5): Corollary 2 The reeated game (5) is equivalent to the game (4): φ re N (Π, Q) = φ N (Π, T ), and the otimal strategies for φ N (Π, T ) given in theorem 1 are otimal for φ re N (Π, Q) as well. The interretation of the erfect duality result given in theorem 1, and its consequence given in corollary 2, is that it does not matter wether the controller or nature lay rst, or if they alternatively; all these games are equivalent. Now consider the following algorithm, where the uncertainty is described in terms of one of the models described in section Kullback-Liebler Divergence Uncertainty Models : Robust Finite Horizon Dynamic Programming Algorithm 1. Set ɛ > 0. Initialize the value function to its terminal value ˆv N = c N. 2. Reeat until t = 0: (a) For every state i X and action a A, comute, using the bisection algorithm given in (Nilim & El- Ghaoui 2004), a value ˆσ i a such that ˆσ a i ɛ/n σ P a i (ˆv t ) ˆσ a i. (b) Udate the value function by ˆv t 1 (i) = min (c t 1 (i, a) + ˆσ a i ), i X. (c) Relace t by t 1 and go to For every i X and t T, set π ɛ = (a ɛ 0,..., a ɛ N 1 ), where a ɛ t(i) = arg max {c t 1(i, a) + ˆσ a i }, i X, a A. As shown in (Nilim & El-Ghaoui 2004), the above algorithm rovides an subotimal olicy π ɛ that achieves the exact otimum with rescribed accuracy ɛ, with a required number of os bounded above by O(mn 2 N log(n/ɛ)). This means that robustness is obtained at a relative increase of comutational cost of only log(n/ɛ) with resect to the classical dynamic rogramming algorithm, which is small for moderate values of N. If N is very large, we can turn instead to the in nite-horizon roblem examined in the following section, and similar comlexity results hold. In nite-horizon MDP In this section, we address the in nite-horizon robust control roblem, with a discounted cost function of the form (2), where the terminal cost is zero, and c t (i, a) = ν t c(i, a), where c(i, a) is now a constant cost function, which we assume non-negative and nite everywhere, and ν (0, 1) is a discount factor. We begin with the in nite-horizon roblem involving stationary control and nature olicies de ned in (6). The following theorem is roved in (Nilim & El-Ghaoui 2004). Theorem 3 (Robust Bellman recursion) For the in nitehorizon robust control roblem (6) with stationary uncertainty on the transition matrices, stationary control olicies, and a discounted cost function with discount factor ν [0, 1), erfect duality holds: φ (Π s, T s ) = max min C (π, τ) := ψ (Π s, T s ). (10) τ T s π Π s The otimal value is given by φ (Π s, T s ) = v(i 0 ), where i 0 is the initial state, and where the value function v satis es the otimality conditions v(i) = min ( c(i, a) + νσp a i (v)), i X. (11) The value function is the unique limit value of the convergent vector sequence de ned by ( v k+1 (i) = min c(i, a) + νσp a (v i k) ), i X, k = 1, 2,... (12) A stationary, otimal control olicy π = (a, a,...) is obtained as a (i) arg min { c(i, a) + νσp a i (v)}, i X. (13)
4 Note that the roblem of comuting the dual quantity ψ (Π s, T s ) given in (10), has been addressed in (Bagnell, Ng, & Schneider 2001), where the authors rovide the recursion (12) without roof. Theorem (3) leads to the following corollary, also roved in (Nilim & El-Ghaoui 2004). Corollary 4 In the in nite-horizon roblem, we can without loss of generality assume that the control and nature olicies are stationary, that is, φ (Π, T ) = φ (Π s, T s ) = φ (Π s, T ) = φ (Π, T s ). (14) Furthermore, in the nite-horizon case, with a discounted cost function, the ga between the otimal values of the nite-horizon roblems under stationary and time-varying uncertainty models, φ N (Π, T ) φ N (Π, T s ), goes to zero as the horizon length N goes to in nity, at a geometric rate ν. Now consider the following algorithm, where we describe the uncertainty using one of the models of section Kullback-Liebler Divergence Uncertainty Models. Robust In nite Horizon Dynamic Programming Algorithm 1. Set ɛ > 0, initialize the value function ˆv 1 > 0 and set k = 1. 2.(a) For all states i and controls a, comute, using the bisection algorithm given in (Nilim & El-Ghaoui 2004), a value ˆσ i a such that ˆσ a i δ σ P a i (ˆv k ) ˆσ a i, where δ = (1 ν)ɛ/2ν. (b) For all states i and controls a, comute ˆv k+1 (i) by, ˆv k+1 (i) = min (c(i, a) + νˆσa i ). 3. If (1 ν)ɛ ˆv k+1 ˆv k <, 2ν go to 4. Otherwise, relace k by k + 1 and go to For each i X, set an π ɛ = (a ɛ, a ɛ,...), where a ɛ (i) = arg max {c(i, a) + νˆσa i }, i X. In (Nilim & El-Ghaoui 2003; 2004), we establish that the above algorithm nds an ɛ-subotimal robust olicy in at most O(mn 2 log(1/ɛ) 2 ) os. Thus, the extra comutational cost incurred by robustness in the in nite-horizon case is only O(log(1/ɛ)). Solving the inner roblem Each ste of the robust dynamic rogramming algorithm involves the solution of an otimization roblem, referred to as the inner roblem, of the form σ P (v) = max P vt, (15) where the variable corresonds to a articular row of a seci c transition matrix, P = Pi a is the set that describes the uncertainty on this row, and v contains the elements of the value function at some given stage. The comlexity of the sets Pi a for each i X and a A is a key comonent in the comlexity of the robust dynamic rogramming algorithm. Note that we can safely relace P in (15) by its convex hull, so that convexity of the sets Pi a is not required; the algorithm only requires the knowledge of their convex hulls. Beyond numerical tractability, an additional criteria for the choice of a seci c uncertainty model is that the sets P a should reresent accurate (non-conservative) descritions of the statistical uncertainty on the transition matrices. Perhas surrisingly, there are statistical models of uncertainty described via Kullback-Liebler divergence that are good on both counts; seci c examles of such models are described in the following sections. Kullback-Liebler Divergence Uncertainty Models We now address the inner roblem (15) for a seci c action a A and state i X. Denote by D( q) denotes the Kullback-Leibler (KL) divergence (relative entroy) from the robability distribution q n to the robability distribution n : D( q) := () log () q(). The above function rovides a natural way to describe errors in (rows of) the transition matrices; examles of models based on this function are given below. Likelihood Models: Our rst uncertainty model is derived from a controlled exeriment starting from state i = 1, 2,..., n and the count of the number of transitions to different states. We denote by F a the matrix of emirical frequencies of transition with control a in the exeriment; denote by fi a its i th row. We have F a 0 and F a 1 = 1, where 1 denotes the vector of ones. The lug-in estimate ˆP a = F a is the solution to the maximum likelihood roblem max P F a (i, ) log P (i, ) : P 0, P 1 = 1. (16) i, The otimal log-likelihood is βmax a = i, F a (i, ) log F a (i, ). A classical descrition of uncertainty in a maximum-likelihood setting is via the likelihood region (Lehmann & Casella 1998) P a = {P R n n : P 0, P 1 = 1, i, F a (i, ) log P (i, ) β a }, where β a < βmax a is a re-seci ed number, which reresents the uncertainty level. In ractice, the designer seci es an uncertainty level β a based on re-samling methods, or on a large-samle Gaussian aroximation, so as to ensure that the set above achieves a desired level of con dence. With the above model, we note that the inner roblem (15) only involves the set Pi a := { a i Rn : a i 0, a i T 1 = 1, where β a i := βa k i F a (i, ) log a i () βa i F a (k, )log F a (k, ). The set },
5 Pi a is the roection of the set described above on a seci c axis of a i -variables. Noting further that the likelihood function can be exressed in terms of KL divergence, the corresonding uncertainty model on the row a i for given i X, a A, is given by a set of the form Pi a = { n : D(fi a ) γa i }, where γi a = F a (i, ) log F a (i, ) βi a is a function of the uncertainty level, and fi a is the ith row of the matrix F a. Maximum A-Posteriori (MAP) Models: a variation on Likelihood models involves Maximum A Posteriori (MAP) estimates. If there exist a rior information regarding the uncertainty on the i-th row of P a, which can be described via a Dirichlet distribution (Ferguson 1974) with arameter αi a, the resulting MAP estimation roblem takes the form max (fi a + αi a 1) T log : T 1 = 1, 0. Thus, the MAP uncertainty model is equivalent to a Likelihood model, with the samle distribution fi a relaced by fi a + αa i 1, where αa i is the rior corresonding to state i and action a. Relative Entroy Models: Likelihood or MAP models involve the KL divergence from the unknown distribution to a reference distribution. We can also choose to describe uncertainty by exchanging the order of the arguments of the KL divergence. This results in a so-called relative entroy model, where the uncertainty on the i-th row of the transition matrix P a described by a set of the form Pi a = { n : D( qi a) γa i }, where γa i > 0 is xed, q i a > 0 is a given reference distribution (for examle, the Maximum Likelihood distribution). Inner roblem with Likelihood Models The inner roblem (15) with the Likelihood uncertainty model is the following, σ := max T v : n, f() log () β, (17) where we have droed the subscrit i and suerscrit a in the emirical frequencies vector fi a and in the lower bound βi a. In this section β max denotes the maximal value of the likelihood function aearing in the above set, which is β max = f() log f(). We assume that β < β max, which, together with f > 0, ensures that the set above has non-emty interior. Without loss of generality, we can assume that v R n +. The dual roblem The Lagrangian L : R n R n R R R associated with the inner roblem can be written as L(v, ζ, µ, ) = T v +ζ T +µ(1 T 1)+(f T log β), where ζ, µ, and are the Lagrange multiliers. The Lagrange dual function d : R n R R R is the maximum value of the Lagrangian over, i.e., for ζ R n, µ R, and R, d(ζ, µ, ) = su L(v, ζ, µ, ) (18) = su( T v + ζ T + µ(1 T 1) + (f T log β)). The otimal = arg su L(v, ζ, µ, ) is readily be obtained by solving L = 0, which results in (i) = f(i) µ v(i) ζ(i). Plugging the value of in the equation for d(ν, µ, ) yields, with some simli cation, the following dual roblem: σ := min,µ,ζ µ (1 + β) + f() log f() µ v() ν(), such that 0, ζ 0, ζ + v µ1. Since the above roblem is convex, and has a feasible set with non-emty interior, there is no duality ga, that is, σ = σ. Moreover, by a monotonicity argument, we obtain that the otimal dual variable ζ is zero, which reduces the number of variables to two: σ = min,µ h(, µ) where µ (1 + β)+ h(, µ) := f() log f() if > 0, µ > v max, µ v() + otherwise. (19) (Note that, v max := max v()). For further reference, we note that h is twice differentiable on its domain, and that its gradient is given by f() f() log h(, µ) = µ v() β 1 f(). (20) µ v() A bisection algorithm From the exression of the gradient obtained above, we obtain that the otimal value of for a xed µ, (µ), is given analytically by (µ) = 1 f(), (21) µ v() which further reduces the roblem to a one-dimensional roblem: σ = min σ(µ), µ v max where v max = max v(), and σ(µ) = h((µ), µ). By construction, the function σ(µ) is convex in its (scalar) argument, since the function h de ned in (19) is ointly convex
6 in both its arguments (see (Boyd & Vandenberghe January 2004,.74)). Hence, we may use bisection to minimize σ. To initialize the bisection algorithm, we need uer and lower bounds µ and µ + on a minimizer of σ. When µ v max, σ(µ) v max and σ (µ) (see (Nilim & El- Ghaoui 2004)). Thus, we may set the lower bound to µ = v max. The uer bound µ + must be chosen such that σ (µ + ) > 0. We have σ (µ) = h h ((µ), µ) + ((µ), µ)d(µ) µ dµ. (22) The rst term is zero by construction, and d(µ)/dµ > 0 for µ > v max. Hence, we only need a value of µ for which h ((µ), µ) = f() log (µ)f() µ v() β > 0. (23) By convexity of the negative log function, and using the fact that f T 1 = 1, f 0, we obtain that h ((µ), µ) = β max β + f() log (µ) ( µ v() ) v() β max β log f()µ (µ) β max β + log (µ) µ v, where v = f T v denotes the average of v under f. The above, combined with the bound on (µ): (µ) µ v max, yields a suf cient condition for (23) to hold: µ > µ 0 + := v max e β βmax v 1 e β βmax. (24) By construction, the interval [v max, µ 0 +] is guaranteed to contain a global minimizer of σ over (v max, + ). The bisection algorithm goes as follows: 1. Set µ = v max and µ + = µ 0 + as in (24). Let δ > 0 be a small convergence arameter. 2. While µ + µ > δ(1 + µ + µ ), reeat (a) Set µ = (µ + + µ )/2. (b) Comute the gradient of σ at µ. (c) If σ (µ) > 0, set µ + = µ; otherwise, set µ = µ. (d) go to 2a. In ractice, the function to minimize may be very at near the minimum. This means that the above bisection algorithm may take a long time to converge to the global minimizer. Since we are only interested in the value of the minimum (and not of the minimizer), we may modify the stoing criterion to µ + µ δ(1 + µ + µ ) or σ (µ + ) σ (µ ) δ. The second condition in the criterion imlies that σ ((µ + + µ )/2) δ, which is an aroximate condition for global otimality. Inner roblem with Maximum A Posteriori Models The inner roblem (15) with MAP uncertainty models takes the form σ := max T v : 0, T 1 = 1, (f() + α() 1) log () κ, where κ deends on the normalizing constant K aearing in the rior density function and on the chosen lower bound on the MAP function, β. We observe that this roblem has exactly the same form as in the case of likelihood function, rovided we relace f by f + α 1. Therefore, the same results aly to the MAP case. Inner roblem with Entroy Models The inner roblem (15) with entroy uncertainty models takes the form σ := max T v : () log () q() β : T 1 = 1, 0. We note that the constraint set actually equals the whole robability simlex if β is too large, seci cally if β max i ( log q i ), since the latter quantity is the maximum of the relative entroy function over the simlex. Thus, if β max i ( log q i ), the worst-case value of T v for P is equal to v max := max v(). Dual roblem By standard duality arguments (set P being of non-emty interior), the inner roblem is equivalent to its dual: min µ + β + ( ) v() µ q() ex 1. >0,µ Setting the derivative with resect to µ to zero, we obtain the otimality condition ( ) v() µ q() ex 1 = 1, from which we derive µ = log The otimal distribution is = q() ex v(). v() q() ex v(i) i q(i) ex As before, we reduce the roblem to a one-dimensional roblem: min >0 σ().
7 where σ is the convex function: σ() = log q() ex v() + β. (25) Perhas not surrisingly, the above function is closely linked to the moment generating function of a random variable v having the discrete distribution with mass q i at v i. A bisection algorithm As roved in (Nilim & El-Ghaoui 2004), the convex function σ in (25) has the following roerties: and where 0, q T v + β σ() v max + β, (26) σ() = v max + (β + log Q(v)) + o(), (27) Q(v) := : v()=v max q() = Prob{v = v max }. Hence, σ(0) = v max and σ (0) = β + log Q(v). In addition, at in nity the exansion of σ is σ() = q T v + β + o(1). (28) The bisection algorithm can be started with the lower bound = 0. An uer bound can be comuted by nding a solution to the equations σ(0) = q T v + β, which yields the initial uer bound 0 + = (v max q T v)/β. By convexity, a minimizer exists in the interval [0 0 +]. Note that if σ (0) 0, then = 0 is otimal and the otimal value of σ is v max. This means that if β is too high, that is, if β > log Q(v), enforcing robustness amounts to disregard any rior information on the robability distribution. We have observed in (Nilim & El-Ghaoui 2004) a similar henomenon brought about by too large values of β, which resulted in a set P equal to the robability simlex. Here, the limiting value log Q(v) deends not only on q but also on v, since we are dealing with the otimization roblem (15) and not only with its feasible set P. Comutational comlexity of the inner roblem Equied with one of the above uncertainty models, we have shown in the revious section that the inner roblem can be converted by convex duality, to a roblem of minimizing a single-variable, convex function. In turn, this onedimensional convex otimization roblem can be solved via a bisection algorithm with a worst-case comlexity of O(n log(v max /δ)) (see (Nilim & El-Ghaoui 2004) for details), where δ > 0 seci es the accuracy at which the otimal value of the inner roblem (15) is comuted, and v max is a global uer bound on the value function. Remark: We can also use models where the uncertainty in the i-th row for the transition matrix P a is described by a nite set of vectors, Pi a = {a,1 i,..., a,k i }. In this case the comlexity of the corresonding robust dynamic rogramming algorithm is increased by a relative factor of K with resect to its classical counterart, which makes the aroach attractive when the number of scenarios K is moderate. Concluding remarks We roosed a robust dynamic rogramming algorithm for solving nite-state and nite-action MDPs whose solutions are guaranteed to tolerate arbitrary changes of the transition robability matrices within given sets. We roosed models based on KL divergence, which is a natural way to describe estimation errors. The resulting robust dynamic rogramming algorithm has almost the same comutational cost as the classical dynamic rogramming algorithm: the relative increase to comute an ɛ-subotimal olicy is O(log(N/ɛ)) in the N-horizon case, and O(log(1/ɛ)) for the in nite-horizon case. References Bagnell, J.; Ng, A.; and Schneider, J Solving uncertain Markov decision roblems. Technical Reort CMU- RI-TR-01-25, Robotics Institute, Carnegie Mellon University. Boyd, S., and Vandenberghe, L. January, Convex Otimization. Cambridge, U.K.: Cambridge University Press. Feinberg, E., and Shwartz, A Handbook of Markov Decision Processes, Methods and Alications. Boston: Kluwer s Academic Publishers. Ferguson, T Prior distributions on sace of robability measures. The Annal of Statistics 2(4): Givan, R.; Leach, S.; and Dean, T Bounded Parameter Markov Decision Processes. In fourth Euroean Conference on Planning, Lehmann, E., and Casella, G Theory of oint estimation. New York, USA: Sringer-Verlag. Nilim, A., and El-Ghaoui, L Robust solution to Markov decision roblems with uncertain transition matrices: roofs and comlexity analysis. In the roceeding of the Neural Information Processing Systems (NIPS, 2004). Nilim, A., and El-Ghaoui, L Curse of uncertainty in markaov decision rocesses: how tro x it. Technical Reort UCB/ERL M04/, Deartment of EECS, University of California, Berkeley. A related version has been submitted to Oerations Research in Dec Puterman, M Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley- Interscience. Satia, J. K., and Lave, R. L Markov decision rocesses with uncertain transition robabilities. Oerations Research 21(3): Shairo, A., and Kleywegt, A. J Minimax analysis of stochastic roblems. Otimization Methods and Software. to aear. White, C. C., and Eldeib, H. K Markov decision rocesses with imrecise transition robabilities. Oerations Research 42(4):
On a Markov Game with Incomplete Information
On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information
More informationConvex Optimization methods for Computing Channel Capacity
Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem
More informationElements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley
Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the
More information4. Score normalization technical details We now discuss the technical details of the score normalization method.
SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules
More informationApproximating min-max k-clustering
Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost
More informationOptimism, Delay and (In)Efficiency in a Stochastic Model of Bargaining
Otimism, Delay and In)Efficiency in a Stochastic Model of Bargaining Juan Ortner Boston University Setember 10, 2012 Abstract I study a bilateral bargaining game in which the size of the surlus follows
More informationEstimating Time-Series Models
Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary
More informationSome results of convex programming complexity
2012c12 $ Ê Æ Æ 116ò 14Ï Dec., 2012 Oerations Research Transactions Vol.16 No.4 Some results of convex rogramming comlexity LOU Ye 1,2 GAO Yuetian 1 Abstract Recently a number of aers were written that
More informationOn the capacity of the general trapdoor channel with feedback
On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:
More informationProbability Estimates for Multi-class Classification by Pairwise Coupling
Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics
More informationAlgorithms for Air Traffic Flow Management under Stochastic Environments
Algorithms for Air Traffic Flow Management under Stochastic Environments Arnab Nilim and Laurent El Ghaoui Abstract A major ortion of the delay in the Air Traffic Management Systems (ATMS) in US arises
More informationElementary Analysis in Q p
Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some
More informationOn the Chvatál-Complexity of Knapsack Problems
R u t c o r Research R e o r t On the Chvatál-Comlexity of Knasack Problems Gergely Kovács a Béla Vizvári b RRR 5-08, October 008 RUTCOR Rutgers Center for Oerations Research Rutgers University 640 Bartholomew
More informationUniformly best wavenumber approximations by spatial central difference operators: An initial investigation
Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations
More informationSystem Reliability Estimation and Confidence Regions from Subsystem and Full System Tests
009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract
More informationPositive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Application
BULGARIA ACADEMY OF SCIECES CYBEREICS AD IFORMAIO ECHOLOGIES Volume 9 o 3 Sofia 009 Positive Definite Uncertain Homogeneous Matrix Polynomials: Analysis and Alication Svetoslav Savov Institute of Information
More informationThe non-stochastic multi-armed bandit problem
Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at
More informationResearch Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces
Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and
More informationNotes on duality in second order and -order cone otimization E. D. Andersen Λ, C. Roos y, and T. Terlaky z Aril 6, 000 Abstract Recently, the so-calle
McMaster University Advanced Otimization Laboratory Title: Notes on duality in second order and -order cone otimization Author: Erling D. Andersen, Cornelis Roos and Tamás Terlaky AdvOl-Reort No. 000/8
More informationInformation collection on a graph
Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements
More informationLECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]
LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for
More informationSemiparametric Estimation of Markov Decision Processes with Continuous State Space
Semiarametric Estimation of Markov Decision Processes with Continuous State Sace Sorawoot Srisuma and Oliver Linton London School of Economics and Political Science he Suntory Centre Suntory and oyota
More informationE-companion to A risk- and ambiguity-averse extension of the max-min newsvendor order formula
e-comanion to Han Du and Zuluaga: Etension of Scarf s ma-min order formula ec E-comanion to A risk- and ambiguity-averse etension of the ma-min newsvendor order formula Qiaoming Han School of Mathematics
More informationTopic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar
15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider
More informationChapter 3. GMM: Selected Topics
Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2
More information1 Gambler s Ruin Problem
Coyright c 2017 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins
More informationON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS
PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 000-9939XX)0000-0 ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS WILLIAM D. BANKS AND ASMA HARCHARRAS
More informationDETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS
Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM
More informationAsymptotically Optimal Simulation Allocation under Dependent Sampling
Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee
More informationInformation collection on a graph
Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements
More informationState Estimation with ARMarkov Models
Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,
More informationEstimation of the large covariance matrix with two-step monotone missing data
Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo
More informationAn Analysis of Reliable Classifiers through ROC Isometrics
An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit
More informationMATH 2710: NOTES FOR ANALYSIS
MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite
More informationApproximate Dynamic Programming for Dynamic Capacity Allocation with Multiple Priority Levels
Aroximate Dynamic Programming for Dynamic Caacity Allocation with Multile Priority Levels Alexander Erdelyi School of Oerations Research and Information Engineering, Cornell University, Ithaca, NY 14853,
More informationOn the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression
On the asymtotic sizes of subset Anderson-Rubin and Lagrange multilier tests in linear instrumental variables regression Patrik Guggenberger Frank Kleibergeny Sohocles Mavroeidisz Linchun Chen\ June 22
More informationElements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley
Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the
More informationConvex Analysis and Economic Theory Winter 2018
Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is
More informationLinear diophantine equations for discrete tomography
Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,
More informationAge of Information: Whittle Index for Scheduling Stochastic Arrivals
Age of Information: Whittle Index for Scheduling Stochastic Arrivals Yu-Pin Hsu Deartment of Communication Engineering National Taiei University yuinhsu@mail.ntu.edu.tw arxiv:80.03422v2 [math.oc] 7 Ar
More informationAM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 6 February 12th, 2014
AM 221: Advanced Otimization Sring 2014 Prof. Yaron Singer Lecture 6 February 12th, 2014 1 Overview In our revious lecture we exlored the concet of duality which is the cornerstone of Otimization Theory.
More informationarxiv: v1 [cs.ro] 24 May 2017
A Near-Otimal Searation Princile for Nonlinear Stochastic Systems Arising in Robotic Path Planning and Control Mohammadhussein Rafieisakhaei 1, Suman Chakravorty 2 and P. R. Kumar 1 arxiv:1705.08566v1
More information1 Extremum Estimators
FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective
More informationInequalities for the L 1 Deviation of the Empirical Distribution
Inequalities for the L 1 Deviation of the Emirical Distribution Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, Marcelo J. Weinberger June 13, 2003 Abstract We derive bounds on the robability
More informationCombining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)
Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment
More informationOn Wald-Type Optimal Stopping for Brownian Motion
J Al Probab Vol 34, No 1, 1997, (66-73) Prerint Ser No 1, 1994, Math Inst Aarhus On Wald-Tye Otimal Stoing for Brownian Motion S RAVRSN and PSKIR The solution is resented to all otimal stoing roblems of
More informationImproved Capacity Bounds for the Binary Energy Harvesting Channel
Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,
More informationMATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK
Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment
More informationUncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning
TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment
More informationA Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split
A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com
More informationDifference of Convex Functions Programming for Reinforcement Learning (Supplementary File)
Difference of Convex Functions Programming for Reinforcement Learning Sulementary File Bilal Piot 1,, Matthieu Geist 1, Olivier Pietquin,3 1 MaLIS research grou SUPELEC - UMI 958 GeorgiaTech-CRS, France
More informationA Social Welfare Optimal Sequential Allocation Procedure
A Social Welfare Otimal Sequential Allocation Procedure Thomas Kalinowsi Universität Rostoc, Germany Nina Narodytsa and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simle sequential
More informationPaper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation
Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional
More informationA numerical implementation of a predictor-corrector algorithm for sufcient linear complementarity problem
A numerical imlementation of a redictor-corrector algorithm for sufcient linear comlementarity roblem BENTERKI DJAMEL University Ferhat Abbas of Setif-1 Faculty of science Laboratory of fundamental and
More informationPArtially observable Markov decision processes
Solving Continuous-State POMDPs via Density Projection Enlu Zhou, Member, IEEE, Michael C. Fu, Fellow, IEEE, and Steven I. Marcus, Fellow, IEEE Abstract Research on numerical solution methods for artially
More informationThe following document is intended for online publication only (authors webpage).
The following document is intended for online ublication only (authors webage). Sulement to Identi cation and stimation of Distributional Imacts of Interventions Using Changes in Inequality Measures, Part
More informationAsymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of..
IOSR Journal of Mathematics (IOSR-JM) e-issn: 78-578, -ISSN: 319-765X. Volume 1, Issue 4 Ver. III (Jul. - Aug.016), PP 53-60 www.iosrournals.org Asymtotic Proerties of the Markov Chain Model method of
More informationSums of independent random variables
3 Sums of indeendent random variables This lecture collects a number of estimates for sums of indeendent random variables with values in a Banach sace E. We concentrate on sums of the form N γ nx n, where
More informationNamed Entity Recognition using Maximum Entropy Model SEEM5680
Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves
More informationFeedback-error control
Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller
More informationOptimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales
Otimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales Diana Negoescu Peter Frazier Warren Powell Abstract In this aer, we consider a version of the newsvendor
More informationx and y suer from two tyes of additive noise [], [3] Uncertainties e x, e y, where the only rior knowledge is their boundedness and zero mean Gaussian
A New Estimator for Mixed Stochastic and Set Theoretic Uncertainty Models Alied to Mobile Robot Localization Uwe D. Hanebeck Joachim Horn Institute of Automatic Control Engineering Siemens AG, Cororate
More informationResearch Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **
Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R
More informationTRACES OF SCHUR AND KRONECKER PRODUCTS FOR BLOCK MATRICES
Khayyam J. Math. DOI:10.22034/kjm.2019.84207 TRACES OF SCHUR AND KRONECKER PRODUCTS FOR BLOCK MATRICES ISMAEL GARCÍA-BAYONA Communicated by A.M. Peralta Abstract. In this aer, we define two new Schur and
More informationA New Perspective on Learning Linear Separators with Large L q L p Margins
A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical
More informationFor q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i
Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:
More informationarxiv: v1 [cs.lg] 31 Jul 2014
Learning Nash Equilibria in Congestion Games Walid Krichene Benjamin Drighès Alexandre M. Bayen arxiv:408.007v [cs.lg] 3 Jul 204 Abstract We study the reeated congestion game, in which multile oulations
More informationMobility-Induced Service Migration in Mobile. Micro-Clouds
arxiv:503054v [csdc] 7 Mar 205 Mobility-Induced Service Migration in Mobile Micro-Clouds Shiiang Wang, Rahul Urgaonkar, Ting He, Murtaza Zafer, Kevin Chan, and Kin K LeungTime Oerating after ossible Deartment
More informationUniversal Finite Memory Coding of Binary Sequences
Deartment of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv
More informationOn split sample and randomized confidence intervals for binomial proportions
On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have
More informationCombinatorics of topmost discs of multi-peg Tower of Hanoi problem
Combinatorics of tomost discs of multi-eg Tower of Hanoi roblem Sandi Klavžar Deartment of Mathematics, PEF, Unversity of Maribor Koroška cesta 160, 000 Maribor, Slovenia Uroš Milutinović Deartment of
More informationKhinchine inequality for slightly dependent random variables
arxiv:170808095v1 [mathpr] 7 Aug 017 Khinchine inequality for slightly deendent random variables Susanna Sektor Abstract We rove a Khintchine tye inequality under the assumtion that the sum of Rademacher
More informationThe analysis and representation of random signals
The analysis and reresentation of random signals Bruno TOÉSNI Bruno.Torresani@cmi.univ-mrs.fr B. Torrésani LTP Université de Provence.1/30 Outline 1. andom signals Introduction The Karhunen-Loève Basis
More informationGOOD MODELS FOR CUBIC SURFACES. 1. Introduction
GOOD MODELS FOR CUBIC SURFACES ANDREAS-STEPHAN ELSENHANS Abstract. This article describes an algorithm for finding a model of a hyersurface with small coefficients. It is shown that the aroach works in
More informationFactorial moments of point processes
Factorial moments of oint rocesses Jean-Christohe Breton IRMAR - UMR CNRS 6625 Université de Rennes 1 Camus de Beaulieu F-35042 Rennes Cedex France Nicolas Privault Division of Mathematical Sciences School
More informationReal Analysis 1 Fall Homework 3. a n.
eal Analysis Fall 06 Homework 3. Let and consider the measure sace N, P, µ, where µ is counting measure. That is, if N, then µ equals the number of elements in if is finite; µ = otherwise. One usually
More informationRANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES
RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete
More informationAdaptive Estimation of the Regression Discontinuity Model
Adative Estimation of the Regression Discontinuity Model Yixiao Sun Deartment of Economics Univeristy of California, San Diego La Jolla, CA 9293-58 Feburary 25 Email: yisun@ucsd.edu; Tel: 858-534-4692
More information#A37 INTEGERS 15 (2015) NOTE ON A RESULT OF CHUNG ON WEIL TYPE SUMS
#A37 INTEGERS 15 (2015) NOTE ON A RESULT OF CHUNG ON WEIL TYPE SUMS Norbert Hegyvári ELTE TTK, Eötvös University, Institute of Mathematics, Budaest, Hungary hegyvari@elte.hu François Hennecart Université
More informationON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS
#A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,
More informationLecture 6. 2 Recurrence/transience, harmonic functions and martingales
Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification
More informationdn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential
Chem 467 Sulement to Lectures 33 Phase Equilibrium Chemical Potential Revisited We introduced the chemical otential as the conjugate variable to amount. Briefly reviewing, the total Gibbs energy of a system
More informationIMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES
IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES OHAD GILADI AND ASSAF NAOR Abstract. It is shown that if (, ) is a Banach sace with Rademacher tye 1 then for every n N there exists
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.
More informationLEIBNIZ SEMINORMS IN PROBABILITY SPACES
LEIBNIZ SEMINORMS IN PROBABILITY SPACES ÁDÁM BESENYEI AND ZOLTÁN LÉKA Abstract. In this aer we study the (strong) Leibniz roerty of centered moments of bounded random variables. We shall answer a question
More informationBilinear Entropy Expansion from the Decisional Linear Assumption
Bilinear Entroy Exansion from the Decisional Linear Assumtion Lucas Kowalczyk Columbia University luke@cs.columbia.edu Allison Bisho Lewko Columbia University alewko@cs.columbia.edu Abstract We develo
More informationAI*IA 2003 Fusion of Multiple Pattern Classifiers PART III
AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers
More informationsubstantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari
Condence tubes for multile quantile lots via emirical likelihood John H.J. Einmahl Eindhoven University of Technology Ian W. McKeague Florida State University May 7, 998 Abstract The nonarametric emirical
More informationGeneralized Coiflets: A New Family of Orthonormal Wavelets
Generalized Coiflets A New Family of Orthonormal Wavelets Dong Wei, Alan C Bovik, and Brian L Evans Laboratory for Image and Video Engineering Deartment of Electrical and Comuter Engineering The University
More informationStatistical Treatment Choice Based on. Asymmetric Minimax Regret Criteria
Statistical Treatment Coice Based on Asymmetric Minimax Regret Criteria Aleksey Tetenov y Deartment of Economics ortwestern University ovember 5, 007 (JOB MARKET PAPER) Abstract Tis aer studies te roblem
More informationCorrespondence Between Fractal-Wavelet. Transforms and Iterated Function Systems. With Grey Level Maps. F. Mendivil and E.R.
1 Corresondence Between Fractal-Wavelet Transforms and Iterated Function Systems With Grey Level Mas F. Mendivil and E.R. Vrscay Deartment of Alied Mathematics Faculty of Mathematics University of Waterloo
More informationThe power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components
ATHES UIVERSITY OF ECOOMICS AD BUSIESS DEPARTMET OF ECOOMICS WORKIG PAPER SERIES 23-203 The ower erformance of fixed-t anel unit root tests allowing for structural breaks in their deterministic comonents
More informationUsing the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process
Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process P. Mantalos a1, K. Mattheou b, A. Karagrigoriou b a.deartment of Statistics University of Lund
More informationStatics and dynamics: some elementary concepts
1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and
More informationEstimating function analysis for a class of Tweedie regression models
Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal
More informationBest approximation by linear combinations of characteristic functions of half-spaces
Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of
More informationPositive decomposition of transfer functions with multiple poles
Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.
More informationOptimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment
Neutrosohic Sets and Systems Vol 14 016 93 University of New Mexico Otimal Design of Truss Structures Using a Neutrosohic Number Otimization Model under an Indeterminate Environment Wenzhong Jiang & Jun
More informationUniform Law on the Unit Sphere of a Banach Space
Uniform Law on the Unit Shere of a Banach Sace by Bernard Beauzamy Société de Calcul Mathématique SA Faubourg Saint Honoré 75008 Paris France Setember 008 Abstract We investigate the construction of a
More information1. INTRODUCTION. Fn 2 = F j F j+1 (1.1)
CERTAIN CLASSES OF FINITE SUMS THAT INVOLVE GENERALIZED FIBONACCI AND LUCAS NUMBERS The beautiful identity R.S. Melham Deartment of Mathematical Sciences, University of Technology, Sydney PO Box 23, Broadway,
More information