Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems


Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Morteza Ibrahimi, Stanford University, Stanford, CA
Adel Javanmard, Stanford University, Stanford, CA
Benjamin Van Roy, Stanford University, Stanford, CA

Abstract

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O(√T) was shown, apart from logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of O(p√T), apart from logarithmic factors. In particular, our algorithm has an average cost of (1 + ε) times the optimum cost after T = polylog(p)·O(1/ε²). This is in comparison to previous work on the dense dynamics, where the algorithm requires time that scales exponentially with dimension in order to achieve regret of ε times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.

1 Introduction

In this paper we address the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Formally, the dynamics of a linear quadratic system are given by

    x(t + 1) = A_0 x(t) + B_0 u(t) + w(t + 1),
    c(t) = x(t)^T Q x(t) + u(t)^T R u(t),    (1)

where u(t) ∈ R^r is the control (action) at time t, x(t) ∈ R^p is the state at time t, c(t) ∈ R is the cost at time t, and {w(t + 1)}_{t≥0} is a sequence of random vectors in R^p with i.i.d. standard Normal entries. The matrices Q ∈ R^{p×p} and R ∈ R^{r×r} are positive semi-definite (PSD) matrices that determine the cost at each step.
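To make the model (1) concrete, it can be simulated directly. The sketch below uses a toy two-state, one-input instance; the matrices A_0, B_0, Q, R and the feedback gain L are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 1-input instance; these matrices are illustrative
# choices, not values from the paper.
A0 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
B0 = np.array([[1.0],
               [0.5]])
Q = np.eye(2)
R = np.eye(1)

def step(x, u):
    """One transition of the system (1) with i.i.d. N(0, I) noise."""
    w = rng.standard_normal(2)
    x_next = A0 @ x + B0 @ u + w
    cost = x @ Q @ x + u @ R @ u
    return x_next, cost

# Run a fixed (hypothetical) stabilizing linear feedback u(t) = -L x(t).
L = np.array([[0.3, 0.0]])
x = np.zeros(2)
total = 0.0
for t in range(1000):
    u = -(L @ x)
    x, c = step(x, u)
    total += c
avg_cost = total / 1000  # empirical average cost of this feedback
```

Even with x(t) near its stationary distribution, the average cost stays bounded away from zero because of the additive noise, which is the Ω(p) per-step cost floor discussed later in the paper.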
The evolution of the system is described through the matrices A_0 ∈ R^{p×p} and B_0 ∈ R^{p×r}. Finally, by a high dimensional system we mean the case where p, r ≫ 1.

A celebrated fundamental theorem in control theory asserts that the above LQ system can be optimally controlled by a simple linear feedback if the pair (A_0, B_0) is controllable and the pair (A_0, Q^{1/2}) is observable. The optimal controller can be explicitly computed from the matrices describing the dynamics and the cost. Throughout this paper we assume that the controllability and observability conditions hold.

When the matrix Θ_0 = [A_0, B_0] is unknown, the task is that of adaptive control, where the system is to be learned and controlled at the same time. Early works on the adaptive control of LQ systems relied on the certainty equivalence principle [2]. In this scheme, at each time t the unknown parameter Θ_0 is estimated based on the observations collected so far, and the optimal controller for the

estimated system is applied. Such controllers are shown to converge to an optimal controller in the case of minimum variance cost; however, in general they may converge to a suboptimal controller [11]. Subsequently, it has been shown that introducing random exploration by adding noise to the control signal, e.g., [14], solves the problem of converging to suboptimal estimates. All the aforementioned work has been concerned with the asymptotic convergence of the controller to an optimal controller. In order to achieve regret bounds, cost-biased parameter estimation [12, 8, 1], in particular the optimism in the face of uncertainty (OFU) principle [13], has been shown to be effective. In this method a confidence set S is found such that Θ_0 ∈ S with high probability. The system is then controlled using the most optimistic parameter estimate, i.e., the Θ̂ ∈ S with the smallest optimum cost. The asymptotic convergence of the average cost of OFU for the LQR problem was shown in [6]. This asymptotic result was extended in [1] by providing a bound for the cumulative regret. Assume x(0) = 0 and for a control policy π define the average cost

    J_π = limsup_{T→∞} (1/T) Σ_{t=0}^{T} E[c_π(t)].    (2)

Further, define the cumulative regret as

    R(T) = Σ_{t=0}^{T} (c_π(t) − J*),    (3)

where c_π(t) is the cost of control policy π at time t and J* = J(Θ_0) is the optimal average cost. The algorithm proposed in [1] is shown to have cumulative regret Õ(√T), where Õ hides the logarithmic factors. While no lower bound was provided for the regret, comparison with the multi-armed bandit problem, where a lower bound of Ω(√T) was shown for the general case [9], suggests that this scaling with time for the cumulative regret is optimal. The focus of [1] was on the scaling of the regret with the time horizon T. However, the regret of the proposed algorithm scales poorly with dimension. More specifically, the analysis in [1] proves a regret bound of R(T) < C p^{p+r+2} √T.
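The cumulative regret of Eq. (3) is straightforward to compute from a cost trace. A minimal sketch, using a hypothetical per-step cost sequence and optimal average cost J* of our own choosing:

```python
def cumulative_regret(costs, j_star):
    """R(T) = sum_{t=0}^{T-1} (c_pi(t) - J*), following Eq. (3)."""
    return sum(c - j_star for c in costs)

# Hypothetical per-step costs of some policy, against J* = 1.0.
costs = [2.0, 1.5, 1.2, 1.1, 1.05]
regret = cumulative_regret(costs, 1.0)
```

A policy whose per-step cost approaches J* accumulates regret sublinearly, which is exactly what the Õ(√T)-type bounds quantify.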
The current paper focuses on applications where the state and control dimensions are much larger than the time horizon of interest. A powerful reinforcement learning algorithm for these applications should have regret which depends gracefully on dimension. In general, there is little to be achieved when T < p, as the number of degrees of freedom (pr + p²) is larger than the number of observations (Tp) and any estimator can be arbitrarily inaccurate. However, when there is prior knowledge about the unknown parameters A_0, B_0, e.g., when A_0, B_0 are sparse, accurate estimation can be feasible. In particular, [3] proved that under suitable conditions the unknown parameters of a noise driven system (i.e., no control) whose dynamics are modeled by linear stochastic differential equations can be estimated accurately with as few as O(log(p)) samples. However, the result of [3] is not directly applicable here since for a general feedback gain L, even if A_0 and B_0 are sparse, the closed loop gain A_0 − B_0 L need not be sparse. Furthermore, the system dynamics would be correlated with past observations through the estimated gain matrix L. Finally, there is no notion of cost in [3], while here we have to obtain bounds on the cost and its scaling with p. In this work we extend the result of [3] by showing that under suitable conditions, the unknown parameters of sparse high dimensional LQ systems can be accurately estimated with as few as O(log(p + r)) observations. Equipped with this efficient learning method, we show that sparse high dimensional LQ systems can be adaptively controlled with regret Õ(p√T). To put this result in perspective, note that even when x(t) = 0, the expected cost at time t + 1 is Ω(p) due to the noise. Therefore, the cumulative cost at time T is bounded below as Ω(pT). Comparing this to our regret bound, we see that for T = polylog(p)·O(1/ε²), the cumulative cost of our algorithm is bounded by (1 + ε) times the optimum cumulative cost.
In other words, our algorithm performs close to optimal after polylog(p) steps. This is in contrast with the result of [1], where the algorithm needs Ω(p^{2p}) steps in order to achieve regret of ε times the optimal cost. Sparse high dimensional LQ systems appear in many engineering applications. Here we are particularly motivated by an emerging field of applications in marketing and advertising. The use of dynamical optimal control models in advertising has a history of at least four decades; cf. [17, 10] for a survey. In these models, often a partial differential equation is used to describe how advertising expenditure over time translates into sales. The basic problem is to find the advertising expenditure that maximizes the net profit. The focus of these works is to model the temporal dynamics of

the advertising expenditure (the control variable) and the variables of interest (sales, goodwill level, etc.). There also exists a rich literature studying the spatial interdependence of consumers' and firms' behavior to devise marketing schemes [7]. In these models, space can be generalized beyond geographies to include notions like demographics and psychometry. The combination of spatial interdependence and temporal dynamics models for optimal advertising was also considered [16, 15]. A simple temporal dynamics model is extended in [15] by allowing state and control variables to have spatial dependence and introducing a diffusive component in the controlled PDE which describes the spatial dynamics. The controlled PDE is then shown to be equivalent to an abstract linear control system of the form

    dx(t)/dt = A x(t) + B u(t).    (4)

Both [15] and [7] are concerned with the optimal control, and the interactions are either dictated by the model or assumed known. Our work deals with a discrete and noisy version of (4) where the dynamics are to be estimated but are known to be sparse. In the model considered in [15] the state variable x lives in an infinite dimensional space. Spatial models in marketing [7] usually consider state variables which have a large number of dimensions, e.g., the number of zip codes in the US (∼50K). A high dimensional state space and control is a recurring theme in these applications. In particular, with modern social networks customers are classified in a highly granular way, potentially with each customer representing its own class. With the number of classes and the complexity of their interactions, it is unlikely that we could formulate an effective model a priori for how classes interact. Further, the nature of these interactions changes over time with the changing landscape of Internet services and information available to customers. This makes it important to efficiently learn from real-time data about the nature of these interactions.
Notation: We bundle the unknown parameters into one variable Θ_0 = [A_0, B_0] ∈ R^{p×q}, where q = p + r, and call it the interaction matrix. For v ∈ R^n, M ∈ R^{m×n} and p ≥ 1, we denote by ‖v‖_p the standard p-norm and by ‖M‖_p the corresponding operator norm. For 1 ≤ i ≤ m, M_i represents the i-th row of matrix M. For S ⊆ [m], J ⊆ [n], M_{SJ} is the submatrix of M formed by the rows in S and the columns in J. For a set S denote by |S| its cardinality. For an integer n denote by [n] the set {1, ..., n}.

2 Algorithm

Our algorithm employs the Optimism in the Face of Uncertainty (OFU) principle in an episodic fashion. At the beginning of episode i the algorithm constructs a confidence set Ω^(i) which is guaranteed to include the unknown parameter Θ_0 with high probability. The algorithm then chooses the Θ̃^(i) ∈ Ω^(i) that has the smallest expected cost as the estimated parameter for episode i, and applies the optimal control for the estimated parameter during episode i. The confidence set is constructed using observations from the last episode only, but the lengths of the episodes are chosen to increase geometrically, allowing for more accurate estimates and shrinkage of the confidence set by a constant factor at each episode. The details of each step and the pseudo code for the algorithm follow.

Constructing the confidence set: Define τ_i to be the start of episode i, with τ_0 = 0. Let L^(i) be the controller that has been chosen for episode i. For t ∈ [τ_i, τ_{i+1}) the system is controlled by u(t) = −L^(i) x(t), and the system dynamics can be written as x(t + 1) = (A_0 − B_0 L^(i)) x(t) + w(t + 1). At the beginning of episode i + 1, first an initial estimate Θ̂ is obtained by solving the following convex optimization problem for each row Θ_u ∈ R^q separately:

    Θ̂_u^(i+1) ∈ argmin_{Θ_u} L(Θ_u) + λ ‖Θ_u‖_1,    (5)

where

    L(Θ_u) = (1/(2Δτ_{i+1})) Σ_{t=τ_i}^{τ_{i+1}−1} {x_u(t + 1) − Θ_u L̃^(i) x(t)}²,    Δτ_{i+1} = τ_{i+1} − τ_i,    (6)
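The row-wise program of Eqs. (5)-(6) is a standard LASSO and can be solved, for instance, by proximal gradient descent (ISTA). The sketch below is our own illustration on a synthetic design: the matrix Z stands in for the stacked extended states [x(t); u(t)], and the sparse row theta0 is a toy target, not data from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_row(Z, y, lam, iters=5000):
    """ISTA (proximal gradient) for
        min_theta (1/(2n)) ||y - Z theta||_2^2 + lam * ||theta||_1,
    the row-wise program of Eq. (5). Rows of Z play the role of the
    extended state z(t) = [x(t); u(t)]."""
    n, q = Z.shape
    step = n / (np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz const. of the gradient
    theta = np.zeros(q)
    for _ in range(iters):
        grad = Z.T @ (Z @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Synthetic k-sparse row: support {0, 2} out of q = 20 coordinates.
rng = np.random.default_rng(1)
theta0 = np.zeros(20)
theta0[0], theta0[2] = 1.0, -0.8
Z = rng.standard_normal((200, 20))
y = Z @ theta0 + 0.1 * rng.standard_normal(200)
theta_hat = lasso_row(Z, y, lam=0.05)
```

With a well-conditioned design and a suitable λ, the estimate recovers the support and is close to the true row in the row-wise distance d of Eq. (7).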

ALGORITHM: Reinforcement learning algorithm for LQ systems.
Input: Precision ε, failure probability 4δ, initial (ρ, C_min, α)-identifiable controller L^(0), ℓ(Θ_0, ε)
Output: Series of estimates Θ̂^(i), confidence sets Ω^(i) and controllers L^(i)
1: Let ℓ_0 = max(1, max_{j∈[r]} ‖L_j^(0)‖_2),

       n_0 = (k² ℓ_0² / (α²(1−ρ)C_min²)) (1/ε² + k/(1−ρ)²) log(4kq/δ),
       n_1 = (k² ℓ(Θ_0, ε)² / (α²(1−ρ)C_min²)) (1/ε² + k/(1−ρ)²) log(4kq/δ).

   Let Δτ_0 = n_0, Δτ_i = 4^i (1 + i/log(q/δ)) n_1 for i ≥ 1, and τ_i = Σ_{j=0}^{i} Δτ_j.
2: for i = 0, 1, 2, ... do
3:   Apply the control u(t) = −L^(i) x(t) until t = τ_{i+1} − 1 and observe the trace {x(t)}_{τ_i ≤ t < τ_{i+1}}.
4:   Calculate the estimate Θ̂^(i+1) from (5) and construct the confidence set Ω^(i+1).
5:   Calculate Θ̃^(i+1) from (9) and set L^(i+1) ← L(Θ̃^(i+1)).

Here L̃^(i) = [I, −L^(i)T]^T. The estimator Θ̂_u is known as the LASSO estimator. The first term in the cost function is the normalized negative log-likelihood, which measures the fidelity to the observations, while the second term imposes the sparsity constraint on Θ_u; λ is the regularization parameter.

For Θ^(1), Θ^(2) ∈ R^{p×q} define the distance d(Θ^(1), Θ^(2)) as

    d(Θ^(1), Θ^(2)) = max_{u∈[p]} ‖Θ_u^(1) − Θ_u^(2)‖_2,    (7)

where Θ_u is the u-th row of the matrix Θ. It is worth noting that for k-sparse matrices with k constant, this distance does not scale with p or q. In particular, if the absolute values of the elements of Θ^(1) and Θ^(2) are bounded by Θ_max then d(Θ^(1), Θ^(2)) ≤ 2√k Θ_max.

Having the estimator Θ̂^(i), the algorithm constructs the confidence set for episode i as

    Ω^(i) = {Θ ∈ R^{p×q} : d(Θ, Θ̂^(i)) ≤ 2^{−i} ε},    (8)

where ε > 0 is an input parameter to the algorithm. For any fixed δ > 0, by choosing τ_i judiciously we ensure that, with probability at least 1 − δ, Θ_0 ∈ Ω^(i) for all i ≥ 1 (see Theorem 3.2).

Design of the controller: Let J(Θ) be the minimum expected cost if the interaction matrix is Θ = [A, B] and denote by L(Θ) the optimal controller that achieves the expected cost J(Θ).
The algorithm implements the OFU principle by choosing, at the beginning of episode i, an estimate Θ̃^(i) ∈ Ω^(i) such that

    Θ̃^(i) ∈ argmin_{Θ ∈ Ω^(i)} J(Θ).    (9)

The optimal control corresponding to Θ̃^(i) is then applied during episode i, i.e., u(t) = −L(Θ̃^(i)) x(t) for t ∈ [τ_i, τ_{i+1}). Recall that for Θ = [A, B], the optimal controller is given through the following relations:

    K(Θ) = Q + A^T K(Θ) A − A^T K(Θ) B (B^T K(Θ) B + R)^{−1} B^T K(Θ) A,    (Riccati equation)
    L(Θ) = (B^T K(Θ) B + R)^{−1} B^T K(Θ) A.

The pseudo code for the algorithm is summarized in the table.

3 Main Results

In this section we present performance guarantees, in terms of cumulative regret and learning accuracy, for the presented algorithm. In order to state the theorems, we first need to present some assumptions on the system.
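As an aside, the Riccati relations above can be solved numerically by simple fixed-point (value) iteration. The sketch below is our own illustration on a hypothetical two-state system; the paper does not prescribe a particular solver.

```python
import numpy as np

def riccati_solve(A, B, Q, R, iters=500):
    """Fixed-point (value) iteration on the discrete-time Riccati
    equation, returning K(Theta) and the optimal gain L(Theta) for the
    feedback u(t) = -L x(t)."""
    K = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(B.T @ K @ B + R, B.T @ K @ A)
        K = Q + A.T @ K @ A - A.T @ K @ B @ G
    L = np.linalg.solve(B.T @ K @ B + R, B.T @ K @ A)
    return K, L

# Hypothetical 2-state, 1-input system (illustrative values).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.3]])
K, L = riccati_solve(A, B, np.eye(2), np.eye(1))
closed_loop = A - B @ L   # u(t) = -L x(t) gives x(t+1) = (A - B L) x(t) + w(t+1)
```

Under the controllability and observability assumptions stated in the introduction, the iteration converges to the stabilizing solution K, and the closed-loop matrix A − BL has spectral radius below one.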

Given Θ ∈ R^{p×q} and L ∈ R^{r×p}, define L̃ = [I, −L^T]^T ∈ R^{q×p} and let Λ ∈ R^{p×p} be a solution to the following Lyapunov equation:

    Λ − Θ L̃ Λ L̃^T Θ^T = I.    (10)

If the closed loop system (A_0 − B_0 L) is stable then the solution to the above equation exists and the state vector x(t) has a Normal stationary distribution with covariance Λ.

We proceed by introducing the notion of an identifiable regulator.

Definition 3.1. For a k-sparse matrix Θ_0 = [A_0, B_0] ∈ R^{p×q} and L ∈ R^{r×p}, define L̃ = [I, −L^T]^T ∈ R^{q×p} and let H = L̃ Λ L̃^T, where Λ is the solution of Eq. (10) with Θ = Θ_0. Define L to be (ρ, C_min, α)-identifiable (with respect to Θ_0) if it satisfies the following conditions for all S ⊆ [q], |S| ≤ k:

    (1) ‖A_0 − B_0 L‖_2 ≤ ρ < 1,
    (2) λ_min(H_{SS}) ≥ C_min,
    (3) ‖H_{S^c S} H_{SS}^{−1}‖_∞ ≤ 1 − α.

The first condition simply states that if the system is controlled using the regulator L then the closed loop autonomous system is asymptotically stable. The second and third conditions are similar to what are referred to in the sparse signal recovery literature as the mutual incoherence or irrepresentable conditions. Various examples and results exist for the matrix families that satisfy these conditions [18]. Let S be the set of indices of the nonzero entries of a specific row of Θ_0. The second condition states that the corresponding entries in the extended state variable y = [x^T, u^T]^T are sufficiently distinguishable from each other. In other words, if the trajectories corresponding to this group of state variables are observed, none of them can be well approximated as a linear combination of the others. The third condition can be thought of as a quantification of first- versus higher-order dependencies. Consider entry j in the extended state variable. The dynamics of y_j are directly influenced by the entries y_S. However, they are also influenced indirectly by the other entries of y. The third condition roughly states that the indirect influences are sufficiently weaker than the direct influences.
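The Lyapunov equation (10) and the three conditions of Definition 3.1 can be checked numerically for a given support set. The sketch below (our own illustration, on a hypothetical instance) solves (10) by summing its series solution and tests conditions (1)-(3) for one support S; a complete check would enumerate all supports with |S| ≤ k.

```python
import numpy as np

def lyapunov_cov(M, iters=300):
    """Solve Lambda = M Lambda M^T + I, i.e. Eq. (10) with M = Theta Ltilde,
    by summing Lambda = sum_k M^k (M^k)^T; valid when M is stable."""
    n = M.shape[0]
    lam = np.eye(n)
    term = np.eye(n)
    for _ in range(iters):
        term = M @ term @ M.T
        lam = lam + term
    return lam

def is_identifiable(A, B, L, S, rho, c_min, alpha):
    """Check conditions (1)-(3) of Definition 3.1 for one support S."""
    M = A - B @ L                                   # closed-loop matrix
    if np.linalg.norm(M, 2) > rho:                  # condition (1)
        return False
    Ltil = np.vstack([np.eye(A.shape[0]), -L])      # extended map [I; -L]
    H = Ltil @ lyapunov_cov(M) @ Ltil.T
    Sc = [j for j in range(H.shape[0]) if j not in S]
    H_SS = H[np.ix_(S, S)]
    if np.linalg.eigvalsh(H_SS)[0] < c_min:         # condition (2)
        return False
    rows = H[np.ix_(Sc, S)] @ np.linalg.inv(H_SS)
    return float(np.max(np.sum(np.abs(rows), axis=1))) <= 1 - alpha  # condition (3)

# Hypothetical 2-state, 1-input instance (illustrative values).
A = np.array([[0.3, 0.0],
              [0.0, 0.2]])
B = np.array([[1.0],
              [0.0]])
L = np.array([[0.1, 0.0]])
ok = is_identifiable(A, B, L, S=[0], rho=0.5, c_min=0.5, alpha=0.5)
```

For this diagonal toy system the closed loop is strongly stable and the extended-state covariance is nearly diagonal, so all three conditions hold with room to spare.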
There exists a vast literature on the applicability of these conditions and scenarios in which they are known to hold. These conditions are almost necessary for successful recovery by ℓ_1 relaxation. For a discussion of these and other similar conditions imposed for sparse signal recovery we refer the reader to [19] and [20] and the references therein. Define Θ_min = min_{i∈[p], j∈[q], Θ_0ij ≠ 0} |Θ_0ij|. Our first result states that the system can be learned efficiently from its trajectory observations when it is controlled by an identifiable regulator.

Theorem 3.2. Consider the LQ system of Eq. (1) and assume Θ_0 = [A_0, B_0] is k-sparse. Let u(t) = −L x(t), where L is a (ρ, C_min, α)-identifiable regulator with respect to Θ_0, and define ℓ = max(1, max_{j∈[r]} ‖L_j‖_2). Let n denote the number of samples of the trajectory that is observed. For any 0 < ε < min(Θ_min, ℓ², 3/(1−ρ)), there exists λ such that, if

    n ≥ (k² ℓ² / (α²(1−ρ)C_min²)) (1/ε² + k/(1−ρ)²) log(4kq/δ),    (11)

then the ℓ_1-regularized least squares solution Θ̂ of Eq. (5) satisfies d(Θ̂, Θ_0) ≤ ε with probability larger than 1 − δ. In particular, this is achieved by taking λ = 6ℓ √(log(4q/δ)/(nα²(1−ρ))).

Our second result states that, equipped with an efficient learning algorithm, the LQ system of Eq. (1) can be controlled with regret Õ(p√T log^{3/2}(1/δ)) under suitable assumptions. Define an ε-neighborhood of Θ_0 as N_ε(Θ_0) = {Θ ∈ R^{p×q} : d(Θ_0, Θ) ≤ ε}. Our assumption asserts the identifiability of L(Θ) for Θ close to Θ_0.

Assumption: There exist ε, C > 0 such that for all Θ ∈ N_ε(Θ_0), L(Θ) is identifiable w.r.t. Θ_0 and

    σ_L(Θ_0, ε) = sup_{Θ∈N_ε(Θ_0)} ‖L(Θ)‖_2 ≤ C,    σ_K(Θ_0, ε) = sup_{Θ∈N_ε(Θ_0)} ‖K(Θ)‖_2 ≤ C.

Also define

    ℓ(Θ_0, ε) = sup_{Θ∈N_ε(Θ_0)} max(1, max_{j∈[r]} ‖L_j(Θ)‖_2).

Note that ℓ(Θ_0, ε) ≤ max(C, 1), since max_{j∈[r]} ‖L_j(Θ)‖_2 ≤ ‖L(Θ)‖_2.
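Ignoring its absolute constant, the sample-size requirement (11) is easy to evaluate numerically, and doing so makes the dimension dependence vivid: the ambient dimension enters only through log q with q = p + r. All parameter values below are hypothetical.

```python
import math

def sample_complexity(k, ell, alpha, rho, c_min, eps, q, delta):
    """Right-hand side of Eq. (11), up to its absolute constant: samples
    after which d(theta_hat, theta_0) <= eps with prob. >= 1 - delta."""
    lead = (k ** 2 * ell ** 2) / (alpha ** 2 * (1 - rho) * c_min ** 2)
    return lead * (1 / eps ** 2 + k / (1 - rho) ** 2) * math.log(4 * k * q / delta)

# Growing q by 100x barely moves the bound (logarithmic dependence).
n_small = sample_complexity(k=3, ell=1.0, alpha=0.5, rho=0.5, c_min=1.0,
                            eps=0.1, q=100, delta=0.05)
n_large = sample_complexity(k=3, ell=1.0, alpha=0.5, rho=0.5, c_min=1.0,
                            eps=0.1, q=10_000, delta=0.05)
```

This is the quantitative content of the claim that sparse high dimensional LQ systems can be learned from O(log(p + r)) observations.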

Theorem 3.3. Consider the LQ system of Eq. (1). For some constants ε, C_min and 0 < α, ρ < 1, assume that an initial (ρ, C_min, α)-identifiable regulator L^(0) is given. Further, assume that for any Θ ∈ N_ε(Θ_0), L(Θ) is (ρ, C_min, α)-identifiable. Then, with probability at least 1 − δ, the cumulative regret of ALGORITHM (cf. the table) is bounded as

    R(T) ≤ Õ(p√T log^{3/2}(1/δ)),    (12)

where Õ hides the logarithmic factors.

4 Analysis

4.1 Proof of Theorem 3.2

To prove Theorem 3.2 we first state a set of sufficient conditions for the solution of the ℓ_1-regularized least squares to be within some distance, as defined by d(·,·), of the true parameter. Subsequently, we prove that these conditions hold with high probability.

Define X = [x(0), x(1), ..., x(n−1)] ∈ R^{p×n} and let W = [w(1), ..., w(n)] ∈ R^{p×n} be the matrix containing the Gaussian noise realizations. Further let W_u denote the u-th row of W. Define the normalized gradient and Hessian of the likelihood function (6) as

    Ĝ = −∇L(Θ_u^0) = (1/n) L̃ X W_u^T,    Ĥ = ∇²L(Θ_u^0) = (1/n) L̃ X X^T L̃^T.    (13)

The following proposition, whose proof can be found in [20], provides a set of sufficient conditions for the accuracy of the ℓ_1-regularized least squares solution.

Proposition 4.1. Let S be the support of Θ_u^0 with |S| < k, and let H be defined per Definition 3.1. Assume there exist 0 < α < 1 and C_min > 0 such that

    λ_min(H_{S,S}) ≥ C_min,    ‖H_{S^c,S} H_{S,S}^{−1}‖_∞ ≤ 1 − α.    (14)

For any 0 < ε < Θ_min, if the following conditions hold:

    ‖Ĝ‖_∞ ≤ λα/3,    ‖Ĝ_S‖_∞ ≤ εC_min/(4k) − λ,    (15)
    ‖Ĥ_{S^c,S} − H_{S^c,S}‖_∞ ≤ αC_min/(12√k),    ‖Ĥ_{S,S} − H_{S,S}‖_∞ ≤ αC_min/(12√k),    (16)

then the ℓ_1-regularized least squares solution (5) satisfies d(Θ̂_u, Θ_u^0) ≤ ε.

In the sequel, we prove that the conditions in Proposition 4.1 hold with high probability given that the assumptions of Theorem 3.2 are satisfied. A few lemmas are in order, the proofs of which are deferred to the Appendix. The first lemma states that Ĝ concentrates in infinity norm around its mean of zero.

Lemma 4.2. Assume ρ = ‖A_0 − B_0 L‖_2 < 1 and let ℓ = max(1, max_{i∈[r]} ‖L_i‖_2).
Then, for any S ⊆ [q] and 0 < ε < ℓ²,

    P{‖Ĝ_S‖_∞ > ε} ≤ 2|S| exp(−n(1−ρ)ε²/(4ℓ²)).    (17)

To prove the conditions in Eq. (16) we first bound, in the following lemma, the absolute deviations of the elements of Ĥ from their mean H, i.e., |Ĥ_ij − H_ij|.

Lemma 4.3. Let i, j ∈ [q], ρ = ‖A_0 − B_0 L‖_2 < 1, and 0 < ε < 3/(1−ρ) < n. Then,

    P(|Ĥ_ij − H_ij| > ε) ≤ 2 exp(−n(1−ρ)³ε²/(24ℓ²)).    (18)

The following corollary of Lemma 4.3 bounds ‖Ĥ_{JS} − H_{JS}‖_∞ for J, S ⊆ [q].

Corollary 4.4. Let J, S ⊆ [q], ρ = ‖A_0 − B_0 L‖_2 < 1, ε < 3|S|/(1−ρ), and n > 3/(1−ρ). Then,

    P(‖Ĥ_{JS} − H_{JS}‖_∞ > ε) ≤ 2|J||S| exp(−n(1−ρ)³ε²/(24|S|²ℓ²)).    (19)

The proof of Corollary 4.4 is by applying the union bound:

    P(‖Ĥ_{JS} − H_{JS}‖_∞ > ε) ≤ |J||S| max_{i∈J, j∈S} P(|Ĥ_ij − H_ij| > ε/|S|).    (20)

Proof of Theorem 3.2. We show that the conditions given by Proposition 4.1 hold. The conditions in Eq. (14) are true by the assumption of identifiability of L with respect to Θ_0. In order to make the first constraint on Ĝ imply the second constraint on Ĝ, we assume that λα/3 ≤ εC_min/(4k) − λ, which is ensured to hold if λ ≤ εC_min/(6k). By Lemma 4.2, P(‖Ĝ‖_∞ > λα/3) ≤ δ/2 if

    λ² ≥ (36ℓ²/(n(1−ρ)α²)) log(4q/δ).    (21)

Requiring λ ≤ εC_min/(6k), we obtain

    n ≥ (36² k² ℓ² / (ε² α² C_min² (1−ρ))) log(4q/δ).    (22)

The conditions on Ĥ can also be aggregated as ‖Ĥ_{[q],S} − H_{[q],S}‖_∞ ≤ αC_min/(12√k). By Corollary 4.4, P(‖Ĥ_{[q],S} − H_{[q],S}‖_∞ > αC_min/(12√k)) ≤ δ/2 if

    n ≥ (3456 k³ ℓ² / (α²(1−ρ)³ C_min²)) log(4kq/δ).    (23)

Merging the conditions in Eqs. (22) and (23), we conclude that the conditions in Proposition 4.1 hold with probability at least 1 − δ if

    n ≥ (k² ℓ² / (α²(1−ρ)C_min²)) (1/ε² + k/(1−ρ)²) log(4kq/δ),    (24)

which finishes the proof of Theorem 3.2.

4.2 Proof of Theorem 3.3

The high-level idea of the proof is similar to the proof of the main theorem in [1]. First, we give a decomposition of the gap between the cost obtained by the algorithm and the optimal cost. We then upper bound each term of the decomposition separately.

Cost decomposition

Writing the Bellman optimality equations [5, 4] for average cost dynamic programming, we get

    J(Θ̃_t) + x(t)^T K(Θ̃_t) x(t) = min_u { x(t)^T Q x(t) + u^T R u + E[ z(t+1)^T K(Θ̃_t) z(t+1) | F_t ] },

where Θ̃_t = [Ã_t, B̃_t] is the estimate used at time t, z(t+1) = Ã_t x(t) + B̃_t u + w(t+1), and F_t is the σ-field generated by the variables {(z(τ), x(τ))}_{τ=0}^t. Notice that the left-hand side is the average cost incurred with initial state x(t) [5, 4].
Therefore,

    J(Θ̃_t) + x(t)^T K(Θ̃_t) x(t)
      = x(t)^T Q x(t) + u(t)^T R u(t)
        + E[(Ã_t x(t) + B̃_t u(t) + w(t+1))^T K(Θ̃_t) (Ã_t x(t) + B̃_t u(t) + w(t+1)) | F_t]
      = x(t)^T Q x(t) + u(t)^T R u(t)
        + E[(Ã_t x(t) + B̃_t u(t))^T K(Θ̃_t) (Ã_t x(t) + B̃_t u(t)) | F_t] + E[w(t+1)^T K(Θ̃_t) w(t+1) | F_t]
      = x(t)^T Q x(t) + u(t)^T R u(t) + E[x(t+1)^T K(Θ̃_t) x(t+1) | F_t]
        + ( (Ã_t x(t) + B̃_t u(t))^T K(Θ̃_t) (Ã_t x(t) + B̃_t u(t)) − (A_0 x(t) + B_0 u(t))^T K(Θ̃_t) (A_0 x(t) + B_0 u(t)) ).

Consequently,

    Σ_{t=0}^{T} (x(t)^T Q x(t) + u(t)^T R u(t)) = Σ_{t=0}^{T} J(Θ̃_t) + C_1 + C_2 + C_3,    (25)

where

    C_1 = Σ_{t=0}^{T} ( x(t)^T K(Θ̃_t) x(t) − E[x(t+1)^T K(Θ̃_{t+1}) x(t+1) | F_t] ),    (26)
    C_2 = −Σ_{t=0}^{T} E[ x(t+1)^T (K(Θ̃_t) − K(Θ̃_{t+1})) x(t+1) | F_t ],    (27)
    C_3 = −Σ_{t=0}^{T} ( (Ã_t x(t) + B̃_t u(t))^T K(Θ̃_t)(Ã_t x(t) + B̃_t u(t)) − (A_0 x(t) + B_0 u(t))^T K(Θ̃_t)(A_0 x(t) + B_0 u(t)) ).    (28)

Good events

We proceed by defining the following two events in the probability space, under which we can bound the terms C_1, C_2, C_3. We then provide a lower bound on the probability of these events:

    E_1 = {Θ_0 ∈ Ω^(i), for all i ≥ 1},
    E_2 = {‖w(t)‖_2 ≤ √(p log(T/δ)), for all 1 ≤ t ≤ T + 1}.

Technical lemmas

The following lemmas establish upper bounds on C_1, C_2, C_3.

Lemma 4.5. Under the event E_1 ∩ E_2, the following holds with probability at least 1 − δ:

    C_1 ≤ (128C/(1−ρ)²) p √T log(T/δ) √(log(1/δ)).    (29)

Lemma 4.6. Under the event E_1 ∩ E_2, the following holds:

    C_2 ≤ (8C/(1−ρ)²) p log(T/δ) log T.    (30)

Lemma 4.7. Under the event E_1 ∩ E_2, the following holds with probability at least 1 − δ:

    C_3 ≤ (5√2 Ck/(1−ρ)) (1 + kε²/((1−ρ)C_min)) √(log(pT/δ) log(4kq/δ)) p √T log T.    (31)

Lemma 4.8. The following holds true:

    P(E_1) ≥ 1 − δ,    P(E_2) ≥ 1 − δ.    (32)

Therefore, P(E_1 ∩ E_2) ≥ 1 − 2δ.

We are now in a position to prove Theorem 3.3.

Proof (Theorem 3.3). Using the cost decomposition (Eq. (25)), under the event E_1 ∩ E_2 we have

    Σ_{t=0}^{T} (x(t)^T Q x(t) + u(t)^T R u(t)) = Σ_{t=0}^{T} J(Θ̃_t) + C_1 + C_2 + C_3 ≤ T J(Θ_0) + C_1 + C_2 + C_3,

where the last inequality stems from the choice of Θ̃_t by the algorithm (cf. Eq. (9)) and the fact that Θ_0 ∈ Ω_t for all t under the event E_1. Hence, R(T) ≤ C_1 + C_2 + C_3. Now using the bounds on C_1, C_2, C_3, we get the desired result.

Acknowledgments

The authors thank the anonymous reviewers for their insightful comments. A.J. is supported by a Caroline and Fabian Pease Stanford Graduate Fellowship.

References

[1] Y. Abbasi-Yadkori and C. Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1-26.
[2] Y. Bar-Shalom and E. Tse. Dual effect, certainty equivalence, and separation in stochastic control. IEEE Transactions on Automatic Control, 19(5).
[3] J. Bento, M. Ibrahimi, and A. Montanari. Learning networks of stochastic differential equations. In Advances in Neural Information Processing Systems 23.
[4] D. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall.
[5] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 3rd edition.
[6] S. Bittanti and M. Campi. Adaptive control of linear time invariant systems: the bet on the best principle. Communications in Information and Systems, 6(4).
[7] E. Bradlow, B. Bronnenberg, G. Russell, N. Arora, D. Bell, S. Duvvuri, F. Hofstede, C. Sismeiro, R. Thomadsen, and S. Yang. Spatial models in marketing. Marketing Letters, 16(3).
[8] M. Campi. Achieving optimality in adaptive control: the bet on the best approach. In Proceedings of the 36th IEEE Conference on Decision and Control, volume 5. IEEE.
[9] V. Dani, T. Hayes, and S. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory (COLT).
[10] G. Feichtinger, R. Hartl, and S. Sethi. Dynamic optimal control models in advertising: recent developments. Management Science.
[11] L. Guo and H. Chen. The Åström-Wittenmark self-tuning regulator revisited and ELS-based adaptive trackers. IEEE Transactions on Automatic Control, 36(7).
[12] P. Kumar and A. Becker. A new family of optimal adaptive controllers for Markov chains. IEEE Transactions on Automatic Control, 27(1).
[13] T. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4-22.
[14] T. Lai and C. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1).
[15] C. Marinelli and S. Savin. Optimal distributed dynamic advertising. Journal of Optimization Theory and Applications, 137(3).
[16] T. Seidman, S. Sethi, and N. Derzko. Dynamics and optimization of a distributed sales-advertising model. Journal of Optimization Theory and Applications, 52(3).
[17] S. Sethi. Dynamic optimal control models in advertising: a survey. SIAM Review.
[18] J. Tropp. Just relax: convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3).
[19] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5).
[20] P. Zhao and B. Yu. On model selection consistency of Lasso. The Journal of Machine Learning Research, 7.

arXiv:1303.5984v1 [stat.ml] 24 Mar 2013


More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Master MVA: Reinforcement Learning Lecture: 5 Approximate Dynamic Programming Lecturer: Alessandro Lazaric http://researchers.lille.inria.fr/ lazaric/webpage/teaching.html Objectives of the lecture 1.

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

EE C128 / ME C134 Feedback Control Systems

EE C128 / ME C134 Feedback Control Systems EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of

More information

High-dimensional Statistical Models

High-dimensional Statistical Models High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For

More information

Improved Algorithms for Linear Stochastic Bandits

Improved Algorithms for Linear Stochastic Bandits Improved Algorithms for Linear Stochastic Bandits Yasin Abbasi-Yadkori abbasiya@ualberta.ca Dept. of Computing Science University of Alberta Dávid Pál dpal@google.com Dept. of Computing Science University

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

1 Problem Formulation

1 Problem Formulation Book Review Self-Learning Control of Finite Markov Chains by A. S. Poznyak, K. Najim, and E. Gómez-Ramírez Review by Benjamin Van Roy This book presents a collection of work on algorithms for learning

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

On Piecewise Quadratic Control-Lyapunov Functions for Switched Linear Systems

On Piecewise Quadratic Control-Lyapunov Functions for Switched Linear Systems On Piecewise Quadratic Control-Lyapunov Functions for Switched Linear Systems Wei Zhang, Alessandro Abate, Michael P. Vitus and Jianghai Hu Abstract In this paper, we prove that a discrete-time switched

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.

More information

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs 2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter

More information

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

Robotics. Control Theory. Marc Toussaint U Stuttgart

Robotics. Control Theory. Marc Toussaint U Stuttgart Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Lecture 4: Lower Bounds (ending); Thompson Sampling

Lecture 4: Lower Bounds (ending); Thompson Sampling CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from

More information

Target Localization and Circumnavigation Using Bearing Measurements in 2D

Target Localization and Circumnavigation Using Bearing Measurements in 2D Target Localization and Circumnavigation Using Bearing Measurements in D Mohammad Deghat, Iman Shames, Brian D. O. Anderson and Changbin Yu Abstract This paper considers the problem of localization and

More information

Mathematical Optimization Models and Applications

Mathematical Optimization Models and Applications Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Notes on Tabular Methods

Notes on Tabular Methods Notes on Tabular ethods Nan Jiang September 28, 208 Overview of the methods. Tabular certainty-equivalence Certainty-equivalence is a model-based RL algorithm, that is, it first estimates an DP model from

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known

More information

Path Integral Stochastic Optimal Control for Reinforcement Learning

Path Integral Stochastic Optimal Control for Reinforcement Learning Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

Chapter 2 Event-Triggered Sampling

Chapter 2 Event-Triggered Sampling Chapter Event-Triggered Sampling In this chapter, some general ideas and basic results on event-triggered sampling are introduced. The process considered is described by a first-order stochastic differential

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Lecture 20: Linear Dynamics and LQG

Lecture 20: Linear Dynamics and LQG CSE599i: Online and Adaptive Machine Learning Winter 2018 Lecturer: Kevin Jamieson Lecture 20: Linear Dynamics and LQG Scribes: Atinuke Ademola-Idowu, Yuanyuan Shi Disclaimer: These notes have not been

More information

Uniqueness Conditions for A Class of l 0 -Minimization Problems

Uniqueness Conditions for A Class of l 0 -Minimization Problems Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Convex relaxation for Combinatorial Penalties

Convex relaxation for Combinatorial Penalties Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications

Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

1 Markov decision processes

1 Markov decision processes 2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe

More information

Utility Maximization for Communication Networks with Multi-path Routing

Utility Maximization for Communication Networks with Multi-path Routing Utility Maximization for Communication Networks with Multi-path Routing Xiaojun Lin and Ness B. Shroff School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 47906 {linx,shroff}@ecn.purdue.edu

More information

Analysis of Greedy Algorithms

Analysis of Greedy Algorithms Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm

More information

Lecture 5 Linear Quadratic Stochastic Control

Lecture 5 Linear Quadratic Stochastic Control EE363 Winter 2008-09 Lecture 5 Linear Quadratic Stochastic Control linear-quadratic stochastic control problem solution via dynamic programming 5 1 Linear stochastic system linear dynamical system, over

More information

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function ECE7850: Hybrid Systems:Theory and Applications Lecture Note 7: Switching Stabilization via Control-Lyapunov Function Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio

More information

arxiv: v1 [cs.it] 21 Feb 2013

arxiv: v1 [cs.it] 21 Feb 2013 q-ary Compressive Sensing arxiv:30.568v [cs.it] Feb 03 Youssef Mroueh,, Lorenzo Rosasco, CBCL, CSAIL, Massachusetts Institute of Technology LCSL, Istituto Italiano di Tecnologia and IIT@MIT lab, Istituto

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Vast Volatility Matrix Estimation for High Frequency Data

Vast Volatility Matrix Estimation for High Frequency Data Vast Volatility Matrix Estimation for High Frequency Data Yazhen Wang National Science Foundation Yale Workshop, May 14-17, 2009 Disclaimer: My opinion, not the views of NSF Y. Wang (at NSF) 1 / 36 Outline

More information

Controller Coefficient Truncation Using Lyapunov Performance Certificate

Controller Coefficient Truncation Using Lyapunov Performance Certificate Controller Coefficient Truncation Using Lyapunov Performance Certificate Joëlle Skaf Stephen Boyd Information Systems Laboratory Electrical Engineering Department Stanford University European Control Conference,

More information

Adaptive Nonlinear Model Predictive Control with Suboptimality and Stability Guarantees

Adaptive Nonlinear Model Predictive Control with Suboptimality and Stability Guarantees Adaptive Nonlinear Model Predictive Control with Suboptimality and Stability Guarantees Pontus Giselsson Department of Automatic Control LTH Lund University Box 118, SE-221 00 Lund, Sweden pontusg@control.lth.se

More information

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 42, NO 6, JUNE 1997 771 Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach Xiangbo Feng, Kenneth A Loparo, Senior Member, IEEE,

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Restricted Strong Convexity Implies Weak Submodularity

Restricted Strong Convexity Implies Weak Submodularity Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu

More information

Birgit Rudloff Operations Research and Financial Engineering, Princeton University

Birgit Rudloff Operations Research and Financial Engineering, Princeton University TIME CONSISTENT RISK AVERSE DYNAMIC DECISION MODELS: AN ECONOMIC INTERPRETATION Birgit Rudloff Operations Research and Financial Engineering, Princeton University brudloff@princeton.edu Alexandre Street

More information

Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization

Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 6, JUNE 2011 3841 Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization Sahand N. Negahban and Martin

More information

arxiv: v2 [math.st] 12 Feb 2008

arxiv: v2 [math.st] 12 Feb 2008 arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

High-dimensional Statistics

High-dimensional Statistics High-dimensional Statistics Pradeep Ravikumar UT Austin Outline 1. High Dimensional Data : Large p, small n 2. Sparsity 3. Group Sparsity 4. Low Rank 1 Curse of Dimensionality Statistical Learning: Given

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Event-triggered stabilization of linear systems under channel blackouts

Event-triggered stabilization of linear systems under channel blackouts Event-triggered stabilization of linear systems under channel blackouts Pavankumar Tallapragada, Massimo Franceschetti & Jorge Cortés Allerton Conference, 30 Sept. 2015 Acknowledgements: National Science

More information

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem Dimitris J. Bertsimas Dan A. Iancu Pablo A. Parrilo Sloan School of Management and Operations Research Center,

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Optimal control and estimation

Optimal control and estimation Automatic Control 2 Optimal control and estimation Prof. Alberto Bemporad University of Trento Academic year 2010-2011 Prof. Alberto Bemporad (University of Trento) Automatic Control 2 Academic year 2010-2011

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

José C. Geromel. Australian National University Canberra, December 7-8, 2017

José C. Geromel. Australian National University Canberra, December 7-8, 2017 5 1 15 2 25 3 35 4 45 5 1 15 2 25 3 35 4 45 5 55 Differential LMI in Optimal Sampled-data Control José C. Geromel School of Electrical and Computer Engineering University of Campinas - Brazil Australian

More information

Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy

Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy Ali Heydari, Member, IEEE Abstract Adaptive optimal control using value iteration initiated from

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases 2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary

More information

On the Complexity of Best Arm Identification with Fixed Confidence

On the Complexity of Best Arm Identification with Fixed Confidence On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse

More information

Time Series 3. Robert Almgren. Sept. 28, 2009

Time Series 3. Robert Almgren. Sept. 28, 2009 Time Series 3 Robert Almgren Sept. 28, 2009 Last time we discussed two main categories of linear models, and their combination. Here w t denotes a white noise: a stationary process with E w t ) = 0, E

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Gradient Estimation for Attractor Networks

Gradient Estimation for Attractor Networks Gradient Estimation for Attractor Networks Thomas Flynn Department of Computer Science Graduate Center of CUNY July 2017 1 Outline Motivations Deterministic attractor networks Stochastic attractor networks

More information

Optimal Control. McGill COMP 765 Oct 3 rd, 2017

Optimal Control. McGill COMP 765 Oct 3 rd, 2017 Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps

More information

Towards control over fading channels

Towards control over fading channels Towards control over fading channels Paolo Minero, Massimo Franceschetti Advanced Network Science University of California San Diego, CA, USA mail: {minero,massimo}@ucsd.edu Invited Paper) Subhrakanti

More information

Utility Maximization for Communication Networks with Multi-path Routing

Utility Maximization for Communication Networks with Multi-path Routing Utility Maximization for Communication Networks with Multi-path Routing Xiaojun Lin and Ness B. Shroff Abstract In this paper, we study utility maximization problems for communication networks where each

More information