Finite-Memory Universal Prediction of Individual Continuous Sequences


THE IBY AND ALADAR FLEISCHMAN FACULTY OF ENGINEERING
THE ZANDMAN-SLANER SCHOOL OF GRADUATE STUDIES
THE SCHOOL OF ELECTRICAL ENGINEERING

Finite-Memory Universal Prediction of Individual Continuous Sequences

Thesis submitted toward the degree of Master of Science in Electrical and Electronic Engineering

by Ronen Dar

June, 2011

THE IBY AND ALADAR FLEISCHMAN FACULTY OF ENGINEERING
THE ZANDMAN-SLANER SCHOOL OF GRADUATE STUDIES
THE SCHOOL OF ELECTRICAL ENGINEERING

Finite-Memory Universal Prediction of Individual Continuous Sequences

Thesis submitted toward the degree of Master of Science in Electrical and Electronic Engineering

by Ronen Dar

This research was carried out at the School of Electrical Engineering, Tel-Aviv University.

Advisor: Prof. Meir Feder

June, 2011

Acknowledgements

I would like to offer my sincerest gratitude to my advisor, Prof. Meir Feder, for his great support. His enthusiasm and inspiration made this thesis possible. The work with Meir was a pure delight, making me a better student and researcher. I would also like to thank the Yitzhak and Chaya Weinstein Research Institute for Signal Processing for its generous support.

Abstract

In this work we consider the problem of universal prediction of individual continuous sequences with square-error loss, using a deterministic finite-state machine (FSM). The goal is to attain universally the performance of the best constant predictor tuned to the sequence, which predicts the empirical mean and incurs the empirical variance as the loss. This work analyzes the tradeoff between the number of states of the universal FSM and the excess loss (regret). We first present the Exponential Decaying Memory (EDM) machine, used in the past for predicting binary sequences, and prove bounds on its performance. Then we look explicitly for the optimal machine with a small number of states. We consider a class of machines denoted the Degenerated Tracking Memory (DTM) machines, find the optimal DTM machine and show that it outperforms any other machine for a small number of states. In addition, we prove a lower bound indicating that even with a large number of states the regret of the DTM machines does not vanish. Finally, we study the problem of universal machines with a large number of states. We prove a lower bound on the achievable regret of any FSM, a lower bound that defines the best rate at which the regret can vanish with the number of states. Then we propose a new machine, the Enhanced Exponential Decaying Memory, which attains the bound and outperforms the EDM machine for any number of states.

Contents

1 Introduction
2 Previous Results
  2.1 Prediction in the Continuous Case
  2.2 Prediction in the Binary Case
3 Problem Formulation
  3.1 Definitions
  3.2 Guidelines
4 Exponential Decaying Memory Machine
  4.1 EDM Predictor for Individual Continuous Sequences
  4.2 Bounds on the Regret of the EDM Machine
5 Designing an Optimal Machine with Small Number of States
  5.1 Single State
  5.2 Two States
  5.3 Three States
  5.4 Degenerated Tracking Memory Machine
    5.4.1 Introduction to the Class of DTM Machines
    5.4.2 Building the Optimal DTM Machine
    5.4.3 Lower Bound - The Limitation of the DTM Machines
    5.4.4 Numerical Results
6 On the Asymptotic Behavior of Finite-Memory Tracking Machines
  6.1 Lower Bound on the Achievable Regret of Any k-States Machine
  6.2 Enhanced Exponential Decaying Memory Machine
    6.2.1 Designing the E-EDM Machine
    6.2.2 The Regret of the E-EDM Machine
    6.2.3 Number of States in the E-EDM Machine
  6.3 Numerical Results
7 Summary and Conclusions
A Lower Bound on the Maximal Regret of the EDM Machine
B Proof of Lemma

List of Figures

4.1 Down sub-steps allocation
5.1 Two states machine
5.2 Optimal two states machine
5.3 Three states machine
5.4 Optimal three states machine
5.5 Example of a DTM machine
5.6 DTM machine minimal circles
5.7 Regret vs. number of states - DTM and EDM machines
5.8 Low number of states - DTM machine performance
6.1 E-EDM machine - spacing gap
6.2 Mixed - straight sequences
6.3 Worst case spacing loss - an example
6.4 Regret vs. number of states - E-EDM, EDM machines and the lower bound
B.1 Spacing gap in the E-EDM machine
B.2 Minimal circle between segments - splitting a down-step

Chapter 1
Introduction

The subject of this thesis is universal prediction of individual continuous sequences using a deterministic finite-state machine, w.r.t. the square-error loss, compared to the class of constant predictors. The goal is to attain universally the performance of the best constant predictor tuned to the sequence, which predicts the empirical mean and incurs the empirical variance as the loss.

Consider a continuous-valued individual sequence $x_1, x_2, \ldots, x_t, \ldots$, where each sample is assumed to be bounded in the interval $[a, b]$ but otherwise arbitrary, with no underlying statistics. At each time $t$, after observing $x_1^t$, a predictor guesses the next outcome $\hat{x}_{t+1}$ and incurs a square-error prediction loss $(x_{t+1} - \hat{x}_{t+1})^2$. Suppose one can tune a (non-universal) predictor to the sequence, from a given class of predictors. For example, the best constant predictor for a given sequence, i.e. a predictor that uses a constant prediction for all the sequence outcomes, is the empirical mean $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$. The square-error loss incurred by this predictor is the sequence's empirical variance $\frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2$. Thus, for a given sequence $x_1^n$, the excess loss of a universal predictor over the best constant predictor, termed the regret of the sequence, is
$$\mathrm{regret} = \frac{1}{n}\sum_{t=1}^{n}(x_t - \hat{x}_t)^2 - \frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2.$$
In the individual setting, we analyze the performance of a universal predictor by the maximal excess loss, that is, the regret incurred on the worst sequence.

It was shown that the Recursive Least Squares (RLS) algorithm generates a universal predictor that attains the performance of the best (non-universal) L-order linear predictor tuned to the sequence. When specialized to the zero-order case, i.e. the case where the non-universal predictor is the constant empirical-mean predictor, the resulting universal predictor is the Cumulative Moving Average (CMA):
$$\hat{x}_{t+1} = \left(1 - \frac{1}{t+1}\right)\hat{x}_t + \frac{1}{t+1}x_t,$$
where $\hat{x}_t$ is the prediction at time $t$. The regret of this predictor tends to zero with the sequence length $n$.

Note that while the reference non-universal constant predictor needs a single state, the universal CMA predictor requires an ever-growing amount of memory. What happens when the universal predictor is constrained to be a finite k-state machine? This basic problem was left unexplored so far. Our work provides a solution to the problem, presenting universal predictors that attain a vanishing regret when a large memory is allowed, while maintaining an optimal tradeoff between the regret and the number of states used by the universal predictor.

In this thesis we present the following main results:

- We propose the Exponential Decaying Memory (EDM) machine, presented in the past for predicting individual binary sequences, as a finite-memory universal predictor for continuous sequences. We prove bounds on its worst-case regret in the continuous case.
- A new class of machines named the Degenerated Tracking Memory (DTM) machines is defined, and a schematic algorithm for constructing the optimal DTM machine is presented. We prove a lower bound on the achievable regret of the optimal DTM machine and show that it outperforms any other known deterministic machine for a small number of states.
- A lower bound on the achievable regret of any deterministic k-state predictor is presented.
- A new finite-state machine, termed the Enhanced Exponential Decaying Memory (E-EDM) machine, is presented for any desired vanishing regret. We prove that this machine outperforms any other known machine and approaches the lower bound.
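To make the setting concrete, the following short Python sketch (our own illustration, not part of the thesis) implements the CMA predictor given above and computes the regret of a sequence against the best constant (empirical-mean) predictor; the function names and the initial prediction are assumptions of ours.

```python
import numpy as np

def cma_predictions(x, x_hat0=0.5):
    """Cumulative Moving Average predictor:
    x_hat_{t+1} = (1 - 1/(t+1)) * x_hat_t + (1/(t+1)) * x_t."""
    x = np.asarray(x, dtype=float)
    preds = np.empty_like(x)
    x_hat = x_hat0                            # arbitrary initial prediction in [0, 1]
    for t, sample in enumerate(x):
        preds[t] = x_hat                      # prediction issued before seeing x_t
        x_hat = (1 - 1.0 / (t + 1)) * x_hat + sample / (t + 1)
    return preds

def regret(x, preds):
    """Excess square-error loss over the best constant predictor (the empirical mean)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - preds) ** 2) - np.mean((x - x.mean()) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=10_000)    # an arbitrary bounded individual sequence
    print(regret(x, cma_predictions(x)))      # tends to 0 as n grows
```

Note that the CMA recursion needs the running time index $t$, which is exactly the unbounded memory that a finite-state machine cannot afford.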

Chapter 2
Previous Results

Universal prediction has been studied extensively throughout the years. A comprehensive overview of universal prediction in general is given in [10]. In this chapter we present the results most relevant to the problem of universal prediction for individual sequences.

2.1 Prediction in the Continuous Case

Some aspects of the universal prediction problem for individual continuous sequences with square-error loss were already explored by Merhav and Feder in [9]. That work actually considered a more general case and showed that the Recursive Least Squares (RLS) algorithm [2], [3] generates a universal predictor that attains the performance of the best (non-universal) L-order linear predictor [8] tuned to the sequence. When specialized to the zero-order case, i.e. the case where the non-universal predictor is the constant empirical-mean predictor, the resulting universal predictor is the Cumulative Moving Average (CMA):
$$\hat{x}_{t+1} = \left(1 - \frac{1}{t+1}\right)\hat{x}_t + \frac{1}{t+1}x_t, \qquad (2.1)$$
where $\hat{x}_t$ is the prediction at time $t$. The regret of this predictor tends to zero with the sequence length $n$. Note that while the reference non-universal constant predictor needs a single state, the universal predictor (2.1) requires an ever-growing amount of memory. What happens when the universal predictor is constrained to be a finite k-state machine?

Universal estimation and prediction problems where the estimator/predictor is a k-state machine have been explored extensively in past years. Cover [1] studied the hypothesis testing problem where the tester has a finite memory. Hellman [4] studied the problem of estimating the mean of a Gaussian (or, more generally, stochastic) sequence using a finite-state machine. This problem is closely related to our problem and may be considered a stochastic version of it: if one assumes that the data is Gaussian, then predicting it with minimal mean square error essentially boils down to estimating its mean. More recently, the finite-memory universal portfolio selection problem (which deals with continuous-valued sequences but considers a very particular loss function) was also explored [16].

Yet, the basic problem of finite-memory universal prediction of continuous-valued individual sequences with square-error loss was left unexplored so far.

2.2 Prediction in the Binary Case

The case of universal prediction of binary individual sequences using finite-state machines was explored in the last decade. The first deterministic machine for the individual binary sequences case w.r.t. the log-loss function was presented by Rajwan and Feder [15]. The presented machine counts the number of ones and resets itself after a predefined number of input bits. It was shown that this counter-and-reset machine achieves redundancy of $O\!\left(\frac{\log k}{k}\right)$ compared to the constant predictors class, where $k$ is the number of states. In [13] Meron and Feder lower bounded the redundancy of any k-state deterministic FSM by $O\!\left(\frac{1}{k}\right)$. Later [12], a tighter lower bound of $O(k^{-4/5})$ was presented.

Another important milestone in finding the optimal deterministic FSM for the binary case was achieved by Meron and Feder in [11]: the Exponential Decaying Memory (EDM) machine was presented. The EDM machine consists of $k$ states uniformly distributed over the probability axis [0, 1]. The machine predicts at each time the probability assigned to the current state, while the transition between states is as follows:
$$\hat{x}_{t+1} = Q\left((1 - k^{-2/3})\hat{x}_t + k^{-2/3}x_t\right), \qquad (2.2)$$
where $\hat{x}_t$ is the current state, $x_t$ is the input bit and $Q(\cdot)$ is the quantization function to the nearest state. It was shown that the EDM machine achieves asymptotic redundancy of $O(k^{-2/3})$.

In [6], Ingber and Feder confront the question "what is the optimal deterministic FSM for the non-asymptotic case?". The semi-counter machine was defined: a symmetric array of $k$ states $\{S_1, \ldots, S_k\}$. At each time the machine predicts the value assigned to the current state. For the lower half of the states, i.e. for $i = 1, \ldots, M = k/2$, the transition function is
$$\varphi(i, 0) = i - 1 \;\; (1 < i \le M), \qquad \varphi(1, 0) = 1, \qquad \varphi(i, 1) = i + \delta_i \;\; (1 \le i \le M), \qquad (2.3)$$
where in a q semi-counter $0 < \delta_i \le q$. That is, if at time $t$ the machine is at state $i$ and the input sample is $x$, the next state is $\varphi(i, x)$. The transition function for the upper half of the states is defined symmetrically. A method was given for finding the optimal transition length and the optimal probability assignment for each state. It was also proved that the optimal semi-counter machine is the best deterministic FSM when using a low number of states. Furthermore, a lower bound of approximately 0.02 bit on the redundancy achievable by any semi-counter machine was given.

In [7] the Piecewise Linear Predictor (PLP) was presented. First, a profile $P(R)$ is defined for any redundancy $R$: a set of rational numbers $\{r_1, \ldots, r_M\}$ constructed in the following manner:

- Start with $P(R) = \{0, \tfrac{1}{2}\}$.
- Find the pair of consecutive rational numbers $(r_i, r_{i+1})$ in the profile $P(R)$ that maximizes the term
$$R_{gap}(r_1, r_2) = D\big(r_1 \,\|\, MP(r_1, r_2)\big), \qquad (2.4)$$
$$MP(r_1, r_2) = \frac{1}{1 + 2^{\frac{h(r_2) - h(r_1)}{r_2 - r_1}}}, \qquad (2.5)$$
where $h(\cdot)$ denotes the binary entropy function and $D(\cdot\,\|\,\cdot)$ denotes the binary divergence function.
- Add a new rational number $r$ to the profile, where $r$ is selected to be the rational number with the minimal denominator that satisfies $r_i < r < r_{i+1}$.
- Repeat this process until
$$\max_i R_{gap}(r_i, r_{i+1}) < R. \qquad (2.6)$$

Given any desired redundancy $R$, the PLP is constructed by finding the corresponding profile $P(R)$ and dividing the [0, 1] axis into segments, where each segment matches one of the rational numbers in the profile. States are linearly spaced in each segment with a different slope according to the corresponding rational number. It was shown that the PLP outperforms the EDM machine and achieves asymptotic performance which is very close to the lower bound of $O(k^{-4/5})$. The PLP is the best asymptotic binary deterministic predictor known so far.
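As an illustration of the profile construction described above, here is a small Python sketch (ours, not from the thesis). It implements the refinement loop and the minimal-denominator search via a Stern-Brocot descent; the gap and merge-point functions follow Equations (2.4)-(2.5) as reconstructed here, so treat the exact expressions as an assumption.

```python
from fractions import Fraction
from math import log2

def h(p):  # binary entropy
    p = float(p)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def D(p, q):  # binary divergence D(p || q)
    def term(a, b):
        return 0.0 if a == 0 else a * log2(a / b)
    p, q = float(p), float(q)
    return term(p, q) + term(1 - p, 1 - q)

def mp(r1, r2):
    """Merge point of Eq. (2.5), as reconstructed here."""
    slope = (h(r2) - h(r1)) / float(r2 - r1)
    return 1.0 / (1.0 + 2.0 ** slope)

def r_gap(r1, r2):  # Eq. (2.4)
    return D(r1, mp(r1, r2))

def min_denominator_between(lo, hi):
    """Rational with the smallest denominator strictly inside (lo, hi) (Stern-Brocot search)."""
    a, b, c, d = 0, 1, 1, 0                      # left endpoint 0/1, right endpoint 1/0
    while True:
        med = Fraction(a + c, b + d)             # mediant of the current bracket
        if med <= lo:
            a, b = med.numerator, med.denominator
        elif med >= hi:
            c, d = med.numerator, med.denominator
        else:
            return med

def build_profile(R):
    profile = [Fraction(0), Fraction(1, 2)]
    while True:
        gaps = [(r_gap(profile[i], profile[i + 1]), i) for i in range(len(profile) - 1)]
        worst, i = max(gaps)
        if worst < R:
            return profile
        profile.insert(i + 1, min_denominator_between(profile[i], profile[i + 1]))

print(build_profile(0.05))
```

The Stern-Brocot descent is a convenient way to obtain the minimal-denominator rational required by the construction: it repeatedly takes mediants until one falls strictly inside the open interval.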

Chapter 3
Problem Formulation

In this chapter we give basic definitions and guidelines for the problem discussed in this thesis: finite-memory universal prediction of individual (bounded) continuous sequences with square-error loss, compared to the class of constant predictors.

3.1 Definitions

In this section we give definitions that will be used throughout this work.

Definition 3.1. The loss of any predictor $b$ that predicts the next outcome of an individual continuous sequence of length $n$, $\{x_t\}_{t=1}^{n}$, is defined as the averaged square error between the prediction and the actual outcome at each time $t$:
$$L_b(x_1^n) = \frac{1}{n}\sum_{t=1}^{n}(x_t - \hat{x}_t)^2, \qquad (3.1)$$
where $\{\hat{x}_t\}_{t=1}^{n}$ are the predictions.

Definition 3.2. The regret of any predictor $b$ is defined as the excess loss of the predictor over the best predictor from a reference class $B$ in the worst-case scenario:
$$R_n(B, b) = \max_{x_1^n}\left\{L_b(x_1^n) - \min_{\tilde{b} \in B} L_{\tilde{b}}(x_1^n)\right\}. \qquad (3.2)$$
Taking $n$ to infinity defines the actual regret:
$$R(B, b) = \lim_{n \to \infty} R_n(B, b). \qquad (3.3)$$

A finite-state machine (FSM) is a commonly used model for sequential machines with a limited amount of storage. A sequential decision strategy based on a k-state FSM is a triple $E = (S, f, g)$, where $S$ is a finite set of states with $k$ elements, $f: S \to B$ is the output function, and $g: X \times S \to S$ is the next-state function.

When an input sequence $\{x_t\}_{t=1}^{n}$ is fed into $E$, starting from an initial state $S_0 \in S$, the FSM produces a sequence of output strategies $\{\hat{x}_t\}_{t=1}^{n}$ while going through a sequence of states $\{S_t\}_{t=1}^{n}$ according to the recursive rules
$$S_t = g(x_{t-1}, S_{t-1}), \qquad \hat{x}_t = f(S_t), \qquad (3.4)$$
where $S_t$ is the state of $E$ at time $t$. For the case considered in this work, predicting continuous sequences, we formulate the following definitions:

Definition 3.3. A deterministic finite-state machine that predicts the next outcome of a continuous sequence is defined as follows:
- An array of $k$ states, where $\{S_1, \ldots, S_k\}$ denote the values assigned to the states. The prediction of the machine at time $t$, $\hat{x}_t$, is the value assigned to the current state.
- A threshold set is defined for each state $i$:
$$T_i = \{T_{i,-m_{d,i}-1}, T_{i,-m_{d,i}}, \ldots, T_{i,m_{u,i}}\}, \qquad (3.5)$$
where $m_{u,i}$ and $m_{d,i}$ are the maximum up- and down-steps from state $i$, correspondingly. The transition thresholds are non-decreasing and non-intersecting, satisfying
$$\bigcup_j \left[T_{i,j-1},\, T_{i,j}\right) = [a, b], \qquad (3.6)$$
where the samples of any sequence are assumed to be bounded in the interval $[a, b]$.
- The transition function for state $i$, $1 \le i \le k$, is
$$\varphi(i, x) = \begin{cases} i - m_{d,i} & \text{if } T_{i,-m_{d,i}-1} \le x < T_{i,-m_{d,i}} \\ i - m_{d,i} + 1 & \text{if } T_{i,-m_{d,i}} \le x < T_{i,-m_{d,i}+1} \\ \;\;\vdots & \\ i + m_{u,i} - 1 & \text{if } T_{i,m_{u,i}-2} \le x < T_{i,m_{u,i}-1} \\ i + m_{u,i} & \text{if } T_{i,m_{u,i}-1} \le x < T_{i,m_{u,i}}. \end{cases}$$
For example, the machine jumps $j$ states if at time $t$ the machine is at state $i$ and the input sample satisfies
$$T_{i,j-1} \le x_t < T_{i,j}. \qquad (3.7)$$

3.2 Guidelines

In this section we present a few basic theorems that will be used as fundamental guidelines throughout this thesis.

Theorem 3.1. The regret of any predictor $b$ w.r.t. the square-error loss, compared to the constant predictor class, satisfies
$$R_n(B, b) = \max_{x_1^n}\left\{\frac{1}{n}\sum_{t=1}^{n}\left[(x_t - \hat{x}_t)^2 - (x_t - \bar{x})^2\right]\right\}, \qquad (3.8)$$
where $\bar{x}$ is the average of the sequence.

Proof. We need to prove that the best constant predictor for all sequences is the one that predicts the average of the sequence, meaning
$$\min_{\tilde{b} \in B} L_{\tilde{b}}(x_1^n) = \min_{\tilde{b} \in B} \frac{1}{n}\sum_{t=1}^{n}\left(x_t - \hat{x}(\tilde{b})\right)^2 = \frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2, \qquad (3.9)$$
where $B$ denotes the constant predictors class and $\hat{x}(\tilde{b})$ denotes the constant prediction of predictor $\tilde{b}$. The average $\bar{x}$ is a local minimum of $L_{\tilde{b}}(x_1^n)$ for any $\{x_t\}_{t=1}^{n}$, and since $L_{\tilde{b}}(x_1^n)$ is a convex function of the constant prediction, this local minimum is the unique global minimum.

Throughout this work we discuss predictors designed for input samples that are bounded in the interval [0, 1]. The results in this work can easily be extended to predictors for input samples that are bounded in any interval $[a, b]$ using the following theorem.

Theorem 3.2. Any FSM designed to achieve regret $R$ for sequences bounded in the interval [0, 1] can be transformed into an FSM that achieves regret $(b-a)^2 R$ for sequences bounded in the interval $[a, b]$, where $a, b \in \mathbb{R}$, by applying the following transformation:
- Assume the states of the original FSM are $\{S_1, \ldots, S_k\}$. Each state $S_i$ is transformed into $a + (b-a)S_i$ in the new FSM.
- Assume the threshold sets of the original FSM are $\{T_1, \ldots, T_k\}$. Each threshold set $T_i$ is transformed into $a + (b-a)T_i$ in the new FSM.

Proof. Let $\{x_t\}_{t=1}^{n}$ be a sequence bounded in $[a, b]$ and let $\{\hat{x}_t\}_{t=1}^{n}$ be the predictions of the new FSM suggested above. Let us examine the regret:
$$\mathrm{regret}_{new} = \frac{1}{n}\sum_{t=1}^{n}\left[(x_t - \hat{x}_t)^2 - (x_t - \bar{x})^2\right]. \qquad (3.10)$$
Let us apply the following transformation:
$$x_t = a + (b-a)\tilde{x}_t, \qquad (3.11)$$
$$\hat{x}_t = a + (b-a)\hat{\tilde{x}}_t. \qquad (3.12)$$

Note that the sequence $\{\tilde{x}_t\}_{t=1}^{n}$ is bounded in [0, 1] and $\{\hat{\tilde{x}}_t\}_{t=1}^{n}$ are the predictions of the original FSM over the sequence $\{\tilde{x}_t\}_{t=1}^{n}$ (both sequences induce the same sequence of jumps). Therefore, the regret of the new FSM satisfies
$$\mathrm{regret}_{new} = (b-a)^2\,\frac{1}{n}\sum_{t=1}^{n}\left[(\tilde{x}_t - \hat{\tilde{x}}_t)^2 - (\tilde{x}_t - \bar{\tilde{x}})^2\right] = (b-a)^2\,\mathrm{regret}_{original}. \qquad (3.13)$$

Since the transformation is bijective, a corollary of Theorem 3.2 is that the optimal deterministic FSM for sequences bounded in the interval [0, 1] can be transformed into an optimal deterministic FSM for sequences bounded in the interval $[a, b]$.

Definition 3.4. A circle is a cyclic closed set of states/predictions $\{\hat{x}_t\}_{t=1}^{L}$ and input samples $\{x_t\}_{t=1}^{L}$ that rotate the machine between these states. A minimal circle is a circle that does not contain the same state more than once.

Theorem 3.3. The worst sequence for a given FSM takes the machine to a minimal circle and rotates in it endlessly.

Proof. The proof is given in detail in [14], where it is shown that the worst binary sequence for a given FSM w.r.t. the log-loss function endlessly rotates the machine in a minimal circle. The proof is based on partitioning any long enough prediction sequence $\hat{x}_1, \ldots, \hat{x}_n$ of a given FSM for an input sequence $x_1, \ldots, x_n$ into minimal circles and a negligible residual monotonic sequence. By the convexity of the log-loss function, the regret over the entire sequence is upper bounded by the weighted average of the regrets of these minimal circles. Therefore a sequence that endlessly rotates the machine in the minimal circle with the highest regret will have a regret at least as large, and we conclude that the worst sequence is the one that endlessly rotates the machine in the minimal circle with the highest regret. The proof for our case, the worst continuous sequence w.r.t. the square-error loss, is identical.

Note that Theorem 3.3 implies that to analyze the regret of a given FSM, one should examine only the regrets of sequences that rotate the machine in a minimal circle. In the sequel we will use Theorem 3.3 to design finite-state machines by verifying that all minimal circles have a regret smaller than the desired regret. Also note that there is an infinite number of sequences that can rotate a machine in a given minimal circle (any input sample $x$ that satisfies $T_{i,j-1} \le x < T_{i,j}$ induces a $j$-state jump from state $i$). In this work we will refer to the regret of a minimal circle as the regret of the sequence that endlessly rotates the machine in that minimal circle and achieves the highest regret.
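To make Definition 3.3 and the threshold rule (3.7) concrete, here is a short Python sketch (our own illustration; the class name, method names and data layout are assumptions, not part of the thesis) of a generic threshold-based finite-state predictor over [0, 1]:

```python
import bisect

class ThresholdFSMPredictor:
    """Deterministic FSM predictor in the spirit of Definition 3.3.

    states[i]     -- value S_i predicted while in state i
    thresholds[i] -- increasing transition thresholds of state i
    targets[i][j] -- state reached from state i when the sample falls in the
                     j-th threshold interval of state i
    """
    def __init__(self, states, thresholds, targets, start=0):
        self.states = states
        self.thresholds = thresholds
        self.targets = targets
        self.state = start

    def predict(self):
        return self.states[self.state]

    def update(self, x):
        # locate the threshold interval containing x and jump accordingly
        j = bisect.bisect_right(self.thresholds[self.state], x)
        self.state = self.targets[self.state][j]

def run(machine, xs):
    """Feed a sequence and return the prediction made before each sample."""
    preds = []
    for x in xs:
        preds.append(machine.predict())
        machine.update(x)
    return preds
```

Any of the machines designed in the following chapters can be expressed in this form by listing its state values, its per-state thresholds and the corresponding target states.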

Chapter 4
Exponential Decaying Memory Machine

Meron and Feder presented in [11] the Exponential Decaying Memory (EDM) machine as a deterministic finite-state machine for predicting the next outcome of an individual binary sequence. They further showed that the EDM machine with $k$ states achieves asymptotic regret of $O(k^{-2/3})$ compared to the constant predictors class with respect to the log-loss (code length) and square-error functions. Here we prove that also for the problem discussed in this work, predicting the next outcome of an individual continuous sequence, the k-state EDM machine achieves asymptotic regret of $O(k^{-2/3})$ compared to the constant predictors class with square-error loss. Moreover, we present lower and upper bounds on the maximal regret incurred by the worst sequence.

4.1 EDM Predictor for Individual Continuous Sequences

Definition 4.1. The Exponential Decaying Memory machine is defined by $k$ states $\{S_1, \ldots, S_k\}$ distributed uniformly over the interval $[k^{-1/3}, 1 - k^{-1/3}]$. Thus, the state gap satisfies
$$\Delta = \frac{1 - 2k^{-1/3}}{k - 1} \le \frac{1}{k}. \qquad (4.1)$$
At each time $t$, the machine predicts the value assigned to the current state, denoted $\hat{x}_t$. The transition function between states is defined by
$$\hat{x}_{t+1} = Q\left(\hat{x}_t(1 - k^{-2/3}) + x_t k^{-2/3}\right), \qquad (4.2)$$
where $Q$ is the quantization function to the nearest state:
$$Q(y) = \hat{x}_{t+1}, \qquad (4.3)$$
where $y$ satisfies
$$\hat{x}_{t+1} - \frac{\Delta}{2} \le y < \hat{x}_{t+1} + \frac{\Delta}{2}. \qquad (4.4)$$
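A small Python sketch of the machine just defined (our own illustration; the state grid and the update follow Definition 4.1, while the class name, the initial state and the quantization by nearest-neighbor search are assumed implementation details):

```python
import numpy as np

class EDMPredictor:
    """k-state Exponential Decaying Memory machine over [0, 1] (Definition 4.1)."""
    def __init__(self, k, start_state=None):
        lo, hi = k ** (-1 / 3), 1 - k ** (-1 / 3)
        self.states = np.linspace(lo, hi, k)     # uniformly spaced state values
        self.eps = k ** (-2 / 3)                 # step size k^(-2/3) of Eq. (4.2)
        self.i = k // 2 if start_state is None else start_state

    def predict(self):
        return self.states[self.i]

    def update(self, x):
        # Eq. (4.2): move toward x with step k^(-2/3), then quantize to the nearest state
        y = (1 - self.eps) * self.states[self.i] + self.eps * x
        self.i = int(np.argmin(np.abs(self.states - y)))

# example: feed a bounded sequence and accumulate the square-error loss
edm = EDMPredictor(k=100)
rng = np.random.default_rng(1)
loss = 0.0
for x in rng.uniform(0, 1, 5000):
    loss += (x - edm.predict()) ** 2
    edm.update(x)
```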

Using Equation (4.2), we note that the machine jumps $j$ states from state $\hat{x}_t = S_i$ if
$$Q\left(S_i + (x_t - S_i)k^{-2/3}\right) = S_i + j\Delta, \qquad (4.5)$$
or equivalently
$$S_i + \left(j - \tfrac{1}{2}\right)\Delta k^{2/3} \le x_t < S_i + \left(j + \tfrac{1}{2}\right)\Delta k^{2/3}. \qquad (4.6)$$
Following Equation (4.6) we can define the transition thresholds of the EDM machine:

Definition 4.2. For any state $i$, the transition thresholds $\{T_{i,-m_{d,i}}, \ldots, T_{i,m_{u,i}}\}$ satisfy
$$T_{i,j} = S_i + \left(j + \tfrac{1}{2}\right)\Delta k^{2/3}, \qquad -m_{d,i} \le j \le m_{u,i}, \qquad (4.7)$$
where $m_{d,i}$ and $m_{u,i}$ are the first integers satisfying
$$S_i - \left(m_{d,i} + \tfrac{1}{2}\right)\Delta k^{2/3} \le 0, \qquad S_i + \left(m_{u,i} + \tfrac{1}{2}\right)\Delta k^{2/3} \ge 1.$$

Proposition 4.1. The farthest possible jump (either up or down) from any state of the EDM machine is upper bounded by $2k^{1/3}$ states.

Proof. From Equation (4.2) we note that at time $t$ the machine jumps
$$\frac{(x_t - \hat{x}_t)k^{-2/3}}{\Delta} + \delta_t \qquad (4.8)$$
states, where $\delta_t$ is the quantization addition to the nearest state, satisfying
$$-\tfrac{1}{2} \le \delta_t < \tfrac{1}{2}. \qquad (4.9)$$
Therefore, the farthest possible jump of the EDM machine at any time $t$ can be upper bounded:
$$\left|\frac{(x_t - \hat{x}_t)k^{-2/3}}{\Delta} + \delta_t\right| \le \frac{|x_t - \hat{x}_t|\,k^{-2/3}}{\Delta} + |\delta_t| \le \frac{(1 - k^{-1/3})k^{-2/3}}{\Delta} + \tfrac{1}{2} \le 2k^{1/3}, \qquad (4.10)$$
where in the last inequality we applied Equation (4.1).

4.2 Bounds on the Regret of the EDM Machine

In this section we present lower and upper bounds on the maximal regret incurred by the worst continuous sequence. Thus, in a k-state EDM machine, the regret of any continuous sequence is smaller than the upper bound, and the maximal regret is at least the lower bound.

Theorem 4.1. The maximal regret of the k-state EDM machine (denoted $U_{EDM_k}$), attained by the worst continuous sequence, is asymptotically bounded by
$$\tfrac{1}{2}k^{-2/3} + O(k^{-1}) \;\le\; \max_{x_1^n} R(U_{EDM_k}, x_1^n) \;\le\; \tfrac{17}{4}k^{-2/3}.$$

Proof. Theorem 3.3 implies that the sequence that maximizes the regret of any FSM is the one that takes it to the minimal circle with the highest regret and rotates in it endlessly. Here we prove that any sequence that rotates the EDM machine in a minimal circle incurs a regret smaller than $\tfrac{17}{4}k^{-2/3}$; this upper bound therefore holds also for the worst sequence.

We examine the regret of an L-length sequence $\{x_t\}_{t=1}^{L}$ that rotates the EDM machine in a minimal circle of $L$ states $\{\hat{x}_t\}_{t=1}^{L}$. The input sample at each time $t$ can be written as
$$x_t = \hat{x}_t + (P_t + \delta_t)\Delta k^{2/3}, \qquad (4.11)$$
where $P_t \in \mathbb{Z}$ denotes the number of states crossed by the machine at time $t$ (negative for down-steps, positive for up-steps) and $\delta_t$ is a quantization addition satisfying
$$-\tfrac{1}{2} \le \delta_t < \tfrac{1}{2}, \qquad (4.12)$$
which has no impact on the jump at time $t$, i.e. no impact on the prediction at time $t+1$. Since the input samples rotate the machine in a minimal circle (the last sample returns the machine to the starting state), the number of states crossed on the way up and on the way down is equal, hence
$$\sum_{t=1}^{L} P_t = 0. \qquad (4.13)$$
Now denote the average of the sequence by $\bar{x} = \frac{1}{L}\sum_{t=1}^{L}x_t$. The regret of any sequence that endlessly rotates the EDM machine in a minimal circle satisfies
$$R(U_{EDM_k}, x_1^L) = \frac{1}{L}\sum_{t=1}^{L}\left[(x_t - \hat{x}_t)^2 - (x_t - \bar{x})^2\right] = \frac{1}{L}\sum_{t=1}^{L}(P_t + \delta_t)^2\Delta^2 k^{4/3} - \frac{1}{L}\sum_{t=1}^{L}\left(\hat{x}_t + (P_t + \delta_t)\Delta k^{2/3} - \bar{x}\right)^2$$
$$= \frac{1}{L}\sum_{t=1}^{L}\left(\delta_t^2\Delta^2 k^{4/3} - 2P_t\Delta k^{2/3}\hat{x}_t\right) + \left(\frac{1}{L}\sum_{t=1}^{L}\left(\hat{x}_t + \delta_t\Delta k^{2/3}\right)\right)^2 - \frac{1}{L}\sum_{t=1}^{L}\left(\hat{x}_t + \delta_t\Delta k^{2/3}\right)^2,$$
where in the last equality we applied Equation (4.13).

Since the square-error function is convex, applying Jensen's inequality yields
$$R(U_{EDM_k}, x_1^L) \le \frac{1}{L}\sum_{t=1}^{L}\delta_t^2\Delta^2 k^{4/3} - \frac{2\Delta k^{2/3}}{L}\sum_{t=1}^{L}P_t\hat{x}_t. \qquad (4.14)$$
The first term on the right-hand side of Equation (4.14) depends only on the quantization of the input samples, $\delta_t$, and thus we term it the quantization loss. The second term depends on the spacing gap between states, $\Delta$, and thus we term it the spacing loss. Hence, the regret of the sequence is upper bounded by a loss incurred by the quantization of the input samples plus a loss incurred by the quantization of the state values, i.e. the prediction values. By applying Equation (4.1) we bound the quantization loss:
$$\text{quantization loss} = \frac{1}{L}\sum_{t=1}^{L}\delta_t^2\Delta^2 k^{4/3} \le \frac{1}{4}k^{-2/3}. \qquad (4.15)$$
We finalize the proof of the upper bound by proving that the spacing loss can be upper bounded by
$$\text{spacing loss} = -\frac{2\Delta k^{2/3}}{L}\sum_{t=1}^{L}P_t\hat{x}_t \le 4k^{-2/3}. \qquad (4.16)$$
First we note that $P_t$, the number of states crossed at time $t$, is positive for up-steps and negative for down-steps. As a corollary of Proposition 4.1, $P_t$ can be bounded by the farthest possible jump:
$$|P_t| \le \frac{k^{-2/3}}{\Delta} \le 2k^{1/3} \qquad (4.17)$$
states. Define a sub-step as a one-state step that is associated with a full step, i.e. a step of $P > 0$ states originating from state $\hat{x}$ consists of $P$ sub-steps (each crossing a different state), all associated with the origin state $\hat{x}$. Now assume an up-step of $P_u > 0$ states from state $\hat{x}_u$. Since we examine a minimal circle, there are always corresponding down-steps that cross the same $P_u$ states on the way down. Therefore, it is possible to assign to each up-step of $P_u$ states, $P_u$ down sub-steps that cross the same states. Let $D(\hat{x}_u, P_u)$ be the set of down sub-steps assigned to an up-step of $P_u$ states originating from state $\hat{x}_u$. We can use Proposition 4.1 to conclude that all down sub-steps in $D(\hat{x}_u, P_u)$ originate from a state that is not higher than
$$\hat{x}_u + (P_u + 2k^{1/3})\Delta. \qquad (4.18)$$
We further note that by definition
$$|D(\hat{x}_u, P_u)| = P_u. \qquad (4.19)$$

Figure 4.1: Minimal circle of two up-steps and two down-steps (solid lines). $SS_j$ are the down sub-steps, where $D(S_i, 3) = \{SS_3, SS_4, SS_5\}$ and $D(S_{i+3}, 2) = \{SS_1, SS_2\}$. Note that sub-steps $SS_1, SS_2, SS_3$ are associated with origin state $S_{i+5}$, while sub-steps $SS_4, SS_5$ are associated with origin state $S_{i+2}$.

Thus we can write
$$-\frac{1}{L}\sum_{t=1}^{L}P_t\hat{x}_t = -\frac{1}{L}\sum_{t\in\{\text{up-steps}\}}P_t\hat{x}_t - \frac{1}{L}\sum_{t\in\{\text{down-steps}\}}P_t\hat{x}_t = \frac{1}{L}\sum_{t\in\{\text{up-steps}\}}\Big(-P_t\hat{x}_t + \sum_{j\in D(\hat{x}_t, P_t)}\hat{x}_j\Big), \qquad (4.20)$$
where $\hat{x}_j$ is the origin state of sub-step $j$ (meaning, the down-step originating from state $\hat{x}_j$ contains sub-step $j$). By applying Equation (4.18) we get
$$-\frac{1}{L}\sum_{t=1}^{L}P_t\hat{x}_t \le \frac{1}{L}\sum_{t\in\{\text{up-steps}\}}\Big(-P_t\hat{x}_t + \sum_{j\in D(\hat{x}_t, P_t)}\big(\hat{x}_t + (P_t + 2k^{1/3})\Delta\big)\Big). \qquad (4.21)$$
Now we can apply Equation (4.19):
$$-\frac{1}{L}\sum_{t=1}^{L}P_t\hat{x}_t \le \frac{1}{L}\sum_{t\in\{\text{up-steps}\}}\Big(-P_t\hat{x}_t + P_t\big(\hat{x}_t + (P_t + 2k^{1/3})\Delta\big)\Big) = \frac{1}{L}\sum_{t\in\{\text{up-steps}\}}P_t(P_t + 2k^{1/3})\Delta. \qquad (4.22)$$
Applying $\Delta \le \frac{1}{k}$ and bounding $P_t$ using Equation (4.17) results in
$$-\frac{1}{L}\sum_{t=1}^{L}P_t\hat{x}_t \le 2k^{-1/3}. \qquad (4.23)$$

Thus, the spacing loss satisfies
$$\text{spacing loss} = 2\Delta k^{2/3}\left(-\frac{1}{L}\sum_{t=1}^{L}P_t\hat{x}_t\right) \le 4k^{-2/3}. \qquad (4.24)$$
The proof of the lower bound is given in Appendix A, where we show that there is a sequence that endlessly rotates the k-state EDM machine in a minimal circle, incurring a regret of $\tfrac{1}{2}k^{-2/3} + O(k^{-1})$. Thus, the regret of the worst sequence is at least $\tfrac{1}{2}k^{-2/3} + O(k^{-1})$.
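As a rough numerical companion to Theorem 4.1 (ours, not part of the thesis), the sketch below searches over periodic two-sample cycles, which form one family of minimal circles, and reports the largest regret found for the EDM machine. By the theorem this value cannot exceed $\tfrac{17}{4}k^{-2/3}$; it is only a lower estimate of the true maximal regret, since other minimal circles are not explored. It reuses the EDMPredictor sketch from Section 4.1.

```python
import numpy as np
# assumes the EDMPredictor class from the sketch in Section 4.1 is available

def cycle_regret(machine, x_pair, reps=400, burn_in=100):
    """Approximate steady-state regret of the periodic sequence (x_lo, x_hi, ...)."""
    xs = np.tile(np.array(x_pair, dtype=float), reps)
    preds = []
    for x in xs:
        preds.append(machine.predict())
        machine.update(x)
    xs, preds = xs[burn_in:], np.array(preds[burn_in:])   # discard the transient
    return np.mean((xs - preds) ** 2) - np.mean((xs - xs.mean()) ** 2)

k = 200
grid = np.linspace(0.0, 1.0, 41)
worst = max(cycle_regret(EDMPredictor(k), (lo, hi))
            for lo in grid for hi in grid if lo < hi)
print(worst, 0.5 * k ** (-2 / 3), 17 / 4 * k ** (-2 / 3))
```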

Chapter 5
Designing an Optimal Machine with a Small Number of States

In this chapter we want to design a finite-state universal predictor with a small number of states. We start by designing the optimal machine for a single state, two states and three states, trying to understand the properties of the problem. We continue by defining a new class of machines, named the Degenerated Tracking Memory (DTM) machines, presenting an algorithm for constructing the optimal DTM machine and proving a lower bound on its achievable regret. Finally, we present numerical results showing that the optimal DTM machine outperforms the EDM machine.

5.1 Single State

We start with designing the optimal single-state prediction machine. The machine outputs a constant prediction, denoted $S$.

Theorem 5.1. The optimal single-state universal predictor predicts the constant $S = \tfrac{1}{2}$ and achieves regret smaller than $R = \tfrac{1}{4}$ for any sequence.

Proof. Let us examine the regret of the predictor:
$$\text{regret} = \frac{1}{n}\sum_{t=1}^{n}(x_t - S)^2 - \frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2 \le \frac{1}{n}\sum_{t=1}^{n}(x_t - S)^2. \qquad (5.1)$$
A constant sequence, i.e. $x_t = x$ for all $t$, attains Equation (5.1) with equality. Therefore, we want to find the solution of the following minmax problem:
$$\min_S \max\{\text{regret}\} = \min_S \max_{0 \le x \le 1}(x - S)^2. \qquad (5.2)$$

If we choose $S < \tfrac{1}{2}$, the maximal regret is
$$\max\{\text{regret}\} = \max_{0 \le x \le 1}(x - S)^2 = (1 - S)^2 > \tfrac{1}{4}. \qquad (5.3)$$
If we choose $S > \tfrac{1}{2}$, the maximal regret is
$$\max\{\text{regret}\} = \max_{0 \le x \le 1}(x - S)^2 = (0 - S)^2 > \tfrac{1}{4}. \qquad (5.4)$$
Therefore, the optimal solution is $S = \tfrac{1}{2}$, attaining regret $R = \tfrac{1}{4}$.

5.2 Two States

We continue with designing the optimal two-state universal prediction machine. Denote the regret of the machine by $R$ and the two states by $S_1$ and $S_2$.

Figure 5.1: Two states machine described geometrically over the [0, 1] axis.

There are two possible minimal circles:
1. Staying at the same state (zero-step circle).
2. Jumping from one state to the other (two-step circle).
The first minimal circle can be viewed as a quantization error, while the second minimal circle can be viewed as a prediction error.

Theorem 5.2. The optimal two-state machine satisfies
$$S_1 = \sqrt{R}, \qquad S_2 = 1 - \sqrt{R},$$
while the transition function satisfies
$$\varphi(1, x) = \begin{cases} 1 & \text{if } x < 2\sqrt{R} \\ 2 & \text{otherwise}\end{cases} \qquad\quad \varphi(2, x) = \begin{cases} 1 & \text{if } x < 1 - 2\sqrt{R} \\ 2 & \text{otherwise.}\end{cases}$$

Proof. Since the two states are both saturated (a jump is allowed only in one direction), a higher $S_1$ and a lower $S_2$ reduce the regret of the second minimal circle (the two-step circle). The zero-step minimal circle (staying at the same state) satisfies
$$\text{regret} = (x_t - S_i)^2 \le R \;\;\Longrightarrow\;\; x_t \in [S_i - \sqrt{R},\, S_i + \sqrt{R}], \qquad (5.5)$$
hence the states have to satisfy
$$S_1 - \sqrt{R} \le 0, \qquad S_2 + \sqrt{R} \ge 1. \qquad (5.6)$$
Thus, the optimal $S_1$ and $S_2$ are the highest and lowest possible values, correspondingly, satisfying
$$S_1 = \sqrt{R}, \qquad S_2 = 1 - \sqrt{R}. \qquad (5.7)$$
Now, the transition function consists of only one threshold. For $S_1$, the threshold is $S_1 + \sqrt{R}$. A higher threshold results in regret bigger than $R$; a lower threshold can only increase the regret of the two-step minimal circle. Therefore, this is the optimal transition threshold.

Theorem 5.3. The maximal regret of the optimal two-state machine, incurred by the worst sequence, is $R = \tfrac{9}{64} \approx 0.1406$.

Proof. Theorem 5.2 implies that the zero-step minimal circle achieves regret $R$. We now want to find the sequence that maximizes the regret of the two-step minimal circle. Denote the samples that rotate the machine between the two states by $x_1$ for the up-step sample and $x_2$ for the down-step sample. The regret of the two-step minimal circle is
$$\text{regret} = \tfrac{1}{2}\left((x_1 - S_1)^2 + (x_2 - S_2)^2 - (x_1 - \bar{x})^2 - (x_2 - \bar{x})^2\right), \qquad (5.8)$$
where $\bar{x}$ is the average of the sequence. It can be shown that Equation (5.8) can also be written as
$$\text{regret} = \left(\tfrac{1}{2}(x_1 + x_2) - S_1\right)^2 - (S_2 - S_1)x_2 + \tfrac{1}{2}(S_2^2 - S_1^2). \qquad (5.9)$$
Finding the $x_1$ and $x_2$ that maximize the above is a maximization problem over a convex function of two variables, with two constraints resulting from the transition function proved to be optimal in Theorem 5.2:
$$x_1 \ge 2\sqrt{R}, \qquad x_2 \le 1 - 2\sqrt{R}. \qquad (5.10)$$

We can write
$$\max_{x_1, x_2}\{\text{regret}\} = \tfrac{1}{2}(S_2^2 - S_1^2) + \max_{x_2}\Big\{\max_{x_1}\big(\tfrac{1}{2}(x_1 + x_2) - S_1\big)^2 - (S_2 - S_1)x_2\Big\} = \tfrac{1}{2}(S_2^2 - S_1^2) + \max_{x_2}\Big\{\big(\tfrac{1}{2}(1 + x_2) - S_1\big)^2 - (S_2 - S_1)x_2\Big\}, \qquad (5.11)$$
where in Equation (5.11) we applied $x_1 = 1$ to maximize the regret, since $\tfrac{1}{2}(x_1 + x_2) \ge S_1$ for any feasible $x_1, x_2$. We note that now the maximization problem is over a convex function with a single minimum point at $x_2 = 1 - 2\sqrt{R}$. Thus, $x_2 = 0$ maximizes the regret, and we conclude that the maximal regret is attained with $x_1 = 1$ and $x_2 = 0$. Since the machine achieves regret $R$, the minimal circle has to satisfy
$$R \ge \tfrac{1}{2}\left((x_1 - S_1)^2 + (x_2 - S_2)^2 - (x_1 - \bar{x})^2 - (x_2 - \bar{x})^2\right)\Big|_{x_1=1,\,x_2=0} = (1 - \sqrt{R})^2 - \left(\tfrac{1}{2}\right)^2. \qquad (5.12)$$
Solving the above results in
$$R \ge \left(\tfrac{3}{8}\right)^2 = 0.140625. \qquad (5.13)$$

We can now summarize the optimal two-state machine:
- State values: $S_1 = \tfrac{3}{8}$, $S_2 = \tfrac{5}{8}$.
- The state transition function satisfies
$$\varphi(1, x) = \begin{cases} 1 & \text{if } x < \tfrac{3}{4} \\ 2 & \text{otherwise}\end{cases} \qquad\quad \varphi(2, x) = \begin{cases} 1 & \text{if } x < \tfrac{1}{4} \\ 2 & \text{otherwise.}\end{cases}$$

Comments:
1. For any desired regret satisfying $0.1406 \le R < 0.25$, the optimal two-state machine is the optimal solution in the sense of minimum number of states.
2. The optimal two-state machine for predicting continuous sequences is identical to the one for the binary case (with square-error loss), in the sense of state values and maximal regret.
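For concreteness, here is a tiny Python sketch (ours, illustrative only) of the optimal two-state machine summarized above; feeding it the worst two-step circle of the proof reproduces the regret of Theorem 5.3.

```python
import numpy as np

S = (3 / 8, 5 / 8)             # optimal two-state machine: state values
UP_TH, DOWN_TH = 3 / 4, 1 / 4  # transition thresholds of states 1 and 2

def run_regret(xs, start=0):
    """Run the machine on xs and return its regret over the whole sequence."""
    i, preds = start, []
    for x in xs:
        preds.append(S[i])
        if i == 0 and x >= UP_TH:      # up-jump from the lower state
            i = 1
        elif i == 1 and x < DOWN_TH:   # down-jump from the upper state
            i = 0
    xs, preds = np.asarray(xs), np.asarray(preds)
    return np.mean((xs - preds) ** 2) - np.mean((xs - np.mean(xs)) ** 2)

# worst two-step minimal circle: alternate a maximal and a minimal sample forever
print(run_regret([1.0, 0.0] * 500))    # -> 0.140625 = 9/64, matching Theorem 5.3
```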

Figure 5.2: Optimal two-state machine described geometrically over the [0, 1] axis, along with the transition thresholds for each state.

5.3 Three States

We continue with designing the optimal three-state universal prediction machine. Denote the regret of the machine by $R$ and the three states by $\{S_1, S_2, S_3\}$. Note that now also a two-state jump from $S_1$ to $S_3$ is possible.

Figure 5.3: Three states machine described geometrically over the [0, 1] axis.

Theorem 5.4. The optimal three-state machine allows only single-state jumps.

Proof. By symmetry, if a two-state up-jump is applied from $S_1$, a two-state down-jump must also be applied from $S_3$. This means that the machine can be rotated between $S_1$ and $S_3$ with $x_1 = 1$ and $x_2 = 0$, as in the two-state machine.

This minimal circle results in a regret identical to that of the two-state machine, $\approx 0.1406$, since it is attained with the samples 0 and 1. Hence, if a two-state jump is applied, no improvement can be gained (in the regret sense) over the two-state machine by adding another state.

Since no two-state jump is allowed, there are three possible minimal circles:
1. Zero-step circle (staying at the same state).
2. Rotating the machine between $S_1$ and $S_2$.
3. Rotating the machine between $S_2$ and $S_3$.

Theorem 5.5. The optimal three-state machine satisfies
$$S_1 = \sqrt{R}, \qquad S_2 = \tfrac{1}{2}, \qquad S_3 = 1 - \sqrt{R},$$
while the transition function satisfies
$$\varphi(1, x) = \begin{cases} 1 & \text{if } x < 2\sqrt{R} \\ 2 & \text{otherwise}\end{cases} \qquad \varphi(2, x) = \begin{cases} 1 & \text{if } x < \tfrac{1}{2} - \sqrt{R} \\ 2 & \text{if } \tfrac{1}{2} - \sqrt{R} \le x < \tfrac{1}{2} + \sqrt{R} \\ 3 & \text{otherwise}\end{cases} \qquad \varphi(3, x) = \begin{cases} 2 & \text{if } x < 1 - 2\sqrt{R} \\ 3 & \text{otherwise.}\end{cases}$$

Proof. By symmetry we can conclude that $S_2 = \tfrac{1}{2}$. In the same manner as in the proof of Theorem 5.2, it can be shown that by choosing the regret of the zero-step minimal circle of the saturated states to be $R$ we minimize the regret of the two-step minimal circles. Thus, $S_1 = \sqrt{R}$ and $S_3 = 1 - \sqrt{R}$ are the optimal saturated states. Also as in the proof of Theorem 5.2, the optimal transition thresholds for state $i$ are $\sqrt{R}$ away from $S_i$; otherwise the regret of the two-step minimal circles can be increased.

Theorem 5.6. The maximal regret of the optimal three-state machine, incurred by the worst sequence, is $R \approx 0.108$.

Proof. Theorem 5.5 implies that the zero-step minimal circle achieves regret $R$. By symmetry, the regrets of the two two-step minimal circles are equal. Let us analyze the minimal circle between $S_1$ and $S_2$. Denote the up-step sample by $x_1$ and the down-step sample by $x_2$. We now want to find the sequence that maximizes the regret of the minimal circle between $S_1$ and $S_2$:
$$\text{regret} = \tfrac{1}{2}\left((x_1 - S_1)^2 + (x_2 - S_2)^2 - (x_1 - \bar{x})^2 - (x_2 - \bar{x})^2\right), \qquad (5.14)$$
where $\bar{x}$ is the average of the sequence.

It can be shown that Equation (5.14) can also be written as
$$\text{regret} = \left(\tfrac{1}{2}(x_1 + x_2) - S_1\right)^2 - (S_2 - S_1)x_2 + \tfrac{1}{2}(S_2^2 - S_1^2). \qquad (5.15)$$
Finding the $x_1$ and $x_2$ that maximize the above is a maximization problem over a convex function of two variables, with two constraints resulting from the transition function proved to be optimal in Theorem 5.5:
$$x_1 \ge 2\sqrt{R}, \qquad x_2 \le \tfrac{1}{2} - \sqrt{R}. \qquad (5.16)$$
We can write
$$\max_{x_1, x_2}\{\text{regret}\} = \tfrac{1}{2}(S_2^2 - S_1^2) + \max_{x_2}\Big\{\max_{x_1}\big(\tfrac{1}{2}(x_1 + x_2) - S_1\big)^2 - (S_2 - S_1)x_2\Big\} = \tfrac{1}{2}(S_2^2 - S_1^2) + \max_{x_2}\Big\{\big(\tfrac{1}{2}(1 + x_2) - S_1\big)^2 - (S_2 - S_1)x_2\Big\}, \qquad (5.17)$$
where in Equation (5.17) we applied $x_1 = 1$ to maximize the regret, since $\tfrac{1}{2}(x_1 + x_2) \ge S_1$ for any feasible $x_1, x_2$. We note that now the maximization problem is over a convex function with a single minimum point at $x_2 = 0$. Therefore, $x_2 = \tfrac{1}{2} - \sqrt{R}$ maximizes the regret, and the maximal regret is attained with $x_1 = 1$ and $x_2 = \tfrac{1}{2} - \sqrt{R}$. Since the machine achieves regret $R$, the regret of the minimal circle has to satisfy
$$R \ge \tfrac{1}{2}\left((x_1 - S_1)^2 + (x_2 - S_2)^2 - (x_1 - \bar{x})^2 - (x_2 - \bar{x})^2\right)\Big|_{x_1=1,\,x_2=\frac{1}{2}-\sqrt{R}} = \tfrac{1}{2}\left((1 - \sqrt{R})^2 + R\right) - \left(\tfrac{1}{4} + \tfrac{\sqrt{R}}{2}\right)^2. \qquad (5.18)$$
Solving the above results in
$$R \ge \left(2\sqrt{2} - \tfrac{5}{2}\right)^2 \approx 0.108. \qquad (5.19)$$

We can now summarize the optimal three-state machine:
- State values: $S_1 = \sqrt{R} \approx 0.328$, $S_2 = \tfrac{1}{2}$, $S_3 = 1 - \sqrt{R} \approx 0.672$.

Figure 5.4: Optimal three-state machine described geometrically over the [0, 1] axis, along with the transition thresholds for each state.

- The state transition function (with $R \approx 0.108$) satisfies
$$\varphi(1, x) = \begin{cases} 1 & \text{if } x < 2\sqrt{R} \\ 2 & \text{otherwise}\end{cases} \qquad \varphi(2, x) = \begin{cases} 1 & \text{if } x < \tfrac{1}{2} - \sqrt{R} \\ 2 & \text{if } \tfrac{1}{2} - \sqrt{R} \le x < \tfrac{1}{2} + \sqrt{R} \\ 3 & \text{otherwise}\end{cases} \qquad \varphi(3, x) = \begin{cases} 2 & \text{if } x < 1 - 2\sqrt{R} \\ 3 & \text{otherwise.}\end{cases}$$

Comments:
1. For any desired regret satisfying $0.108 \le R < 0.1406$, the optimal three-state machine is the optimal solution in the sense of minimum number of states.
2. The optimal three-state machine in the continuous-sequences case is different from the one in the binary case. The need to track continuous samples, and not only binary samples, degrades the maximal regret that is achievable with three states compared to the binary case.

3. Figure 5.4 depicts the states and the transition thresholds over the [0, 1] axis. One can notice the hysteresis characteristic of the machine, providing a memory or inertia to the finite-state predictor: an extreme input sample is needed for the machine to jump from the current state, that is, to change the prediction value.

5.4 Degenerated Tracking Memory Machine

After designing the optimal finite-state machine for a single, two and three states, we want to find the optimal deterministic finite-state machine for $k$ states, where in this chapter we assume $k$ is relatively small. In this section we introduce the class of Degenerated Tracking Memory (DTM) machines and present an algorithm for constructing the optimal DTM machine. We further present a lower bound on the achievable regret of this machine.

5.4.1 Introduction to the Class of DTM Machines

Definition 5.1. The class of all Degenerated Tracking Memory machines with $k$ states is of the form:
- An array of $k$ states. We denote the states in the lower half by $\{S_{k_l}, \ldots, S_1\}$ (in descending index order, where $S_1$ is the nearest state to $\tfrac{1}{2}$ and $S_i \le \tfrac{1}{2}$ for all $1 \le i \le k_l$). We denote the states in the upper half by $\{\tilde{S}_1, \ldots, \tilde{S}_{k_u}\}$ (in ascending order, where $\tilde{S}_1$ is the nearest state to $\tfrac{1}{2}$ and $\tilde{S}_i > \tfrac{1}{2}$ for all $1 \le i \le k_u$), with $k_l + k_u = k$.
- The maximum down-step in the lower half, i.e. from states $\{S_{k_l}, \ldots, S_1\}$, is no more than a single-state jump.
- The maximum up-step in the upper half, i.e. from states $\{\tilde{S}_1, \ldots, \tilde{S}_{k_u}\}$, is no more than a single-state jump.
- A transition between the lower and upper halves is allowed only from and to the nearest states to $\tfrac{1}{2}$, $S_1$ and $\tilde{S}_1$ (implying that the maximum up-jump from $S_1$ and the maximum down-jump from $\tilde{S}_1$ are single-state jumps).

An example of a DTM machine is depicted in Figure 5.5.

Figure 5.5: An example of a DTM machine. Note that a transition between the lower and upper halves is allowed only from (and to) $S_1$ and $\tilde{S}_1$. Solid lines represent the maximum up or down jumps from each state.

In a DTM machine, only single-state down-jumps (up-jumps) from the states in the lower (upper) half are allowed. In addition, the transition between the lower and upper halves is allowed only from and to the nearest states to $\tfrac{1}{2}$, $S_1$ and $\tilde{S}_1$, implying that the maximum up-jump (down-jump) from $\{S_{k_l}, \ldots, S_2\}$ ($\{\tilde{S}_2, \ldots, \tilde{S}_{k_u}\}$) is up to $S_1$ ($\tilde{S}_1$). These constraints facilitate the algorithm for constructing the optimal DTM machine.
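The structural constraints of Definition 5.1 are easy to express in code. The following Python sketch (ours, illustrative only) checks whether a proposed transition structure qualifies as a DTM machine; the representation, with globally ascending state indices and a per-state set of reachable next states, is an assumption of ours and differs from the thesis's half-by-half indexing.

```python
def is_dtm(states, reachable):
    """states: ascending list of state values in [0, 1] (at least one on each side of 1/2).
    reachable[i]: set of state indices the machine can move to from state i.
    Checks the constraints of Definition 5.1: single-step moves toward the edges,
    and crossing 1/2 only between the two states nearest to 1/2."""
    lower = [i for i, s in enumerate(states) if s <= 0.5]
    upper = [i for i, s in enumerate(states) if s > 0.5]
    s1, s1_tilde = max(lower), min(upper)          # nearest states to 1/2
    for i, nxt in enumerate(reachable):
        for j in nxt:
            if i in lower and j in lower and j < i - 1:
                return False                       # >1-state down-step in the lower half
            if i in upper and j in upper and j > i + 1:
                return False                       # >1-state up-step in the upper half
            if i in lower and j in upper and (i != s1 or j != s1_tilde):
                return False                       # crossing up allowed only S_1 -> S~_1
            if i in upper and j in lower and (i != s1_tilde or j != s1):
                return False                       # crossing down allowed only S~_1 -> S_1
    return True
```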

5.4.2 Building the Optimal DTM Machine

We present here a schematic algorithm for constructing the optimal DTM machine. Given a desired regret $R_d$, the task of finding the optimal DTM machine can be viewed as a covering problem, that is, assigning the smallest number of states in the interval [0, 1] that achieves regret smaller than $R_d$ for all sequences.

We note that in an optimal k-state machine, the upper half of the states is the mirror image of the lower half. This symmetry property arises from the fact that any sequence $x_1, \ldots, x_n$ can be transformed into the symmetric sequence $1 - x_1, \ldots, 1 - x_n$; both sequences achieve the same regret if full symmetry between the lower and upper halves is applied. Thus, assuming that the lower half is optimal in the sense of achieving the desired regret with the smallest number of states, the upper half must be the reflection of the lower half to achieve optimality. Note that this property allows us to design the optimal DTM machine only for the lower half.

The algorithm we present here recursively finds the optimal state allocation and the transition thresholds. Suppose the states $\{S_{i-1}, \ldots, S_1\}$ in the lower half (in descending order, where $S_1$ is the nearest state to $\tfrac{1}{2}$) and their transition threshold sets $\{T_{i-1}, \ldots, T_1\}$ are given and achieve regret smaller than $R_d$ for all minimal circles between them. Our algorithm generates the optimal $S_i$, i.e. the optimal allocation for state $i$, and a threshold set $T_i$ achieving regret smaller than $R_d$ for all minimal circles starting at that state. We start by finding $S_1$, the nearest state to $\tfrac{1}{2}$ in the lower half of the optimal DTM machine.

Lemma 5.1. In the optimal k-state DTM machine for a given desired regret $R_d$, $S_1 = \tfrac{1}{2}$ if $k$ is odd, and
$$S_1 = \max\left\{\,1 - \sqrt{R_d + \tfrac{1}{4}},\;\; 2 + \sqrt{R_d} - \sqrt{4R_d + 4\sqrt{R_d} + 2}\,\right\}$$
if $k$ is even.

Proof. By symmetry, $S_1 = \tfrac{1}{2}$ in the optimal DTM machine with an odd number of states; otherwise there are more states in one of the halves and the symmetry property presented above does not hold. In the optimal DTM machine with an even number of states $k$, the nearest state to $\tfrac{1}{2}$ in the upper half, $\tilde{S}_1$, is the mirror image of $S_1$, hence $\tilde{S}_1 = 1 - S_1$. By definition, only a single-state up-jump is allowed from $S_1$ and only a single-state down-jump is allowed from $\tilde{S}_1$. Thus, the machine can be rotated between these two states, forming a two-step minimal circle. Denote by $x_1$ and $x_2$ the samples that induce the up and down jumps, correspondingly. These samples must satisfy the transition thresholds, i.e.
$$x_1 \ge S_1 + \sqrt{R_d}, \qquad x_2 \le \tilde{S}_1 - \sqrt{R_d} = 1 - S_1 - \sqrt{R_d}. \qquad (5.20)$$
Since the regret is a convex function of the samples, the samples $x_1$ and $x_2$ that bring the regret of the minimal circle to its maximum are at the edges of the constraint regions; therefore in a two-step minimal circle there are four combinations that need to be analyzed:

1. $x_1 = 1$, $x_2 = 0$:
$$\text{regret} = \tfrac{1}{2}\left[(1 - S_1)^2 + (0 - \tilde{S}_1)^2 - (1 - \tfrac{1}{2})^2 - (0 - \tfrac{1}{2})^2\right] = (1 - S_1)^2 - \left(\tfrac{1}{2}\right)^2. \qquad (5.21)$$
Thus, to achieve regret smaller than $R_d$ we require
$$1 - \sqrt{R_d + \tfrac{1}{4}} \le S_1. \qquad \text{(i)}$$
2. $x_1 = 1$, $x_2 = 1 - S_1 - \sqrt{R_d}$:
$$\text{regret} = \tfrac{1}{2}\left[(1 - S_1)^2 + R_d - (1 - \bar{x})^2 - (1 - S_1 - \sqrt{R_d} - \bar{x})^2\right] = \tfrac{1}{2}\left[(1 - S_1)^2 + R_d - \tfrac{1}{2}(S_1 + \sqrt{R_d})^2\right] = \tfrac{1}{2}\left[\tfrac{1}{2}S_1^2 - S_1(2 + \sqrt{R_d}) + 1 + \tfrac{1}{2}R_d\right]. \qquad (5.22)$$
Thus, to achieve regret smaller than $R_d$ we require
$$2 + \sqrt{R_d} - \sqrt{4R_d + 4\sqrt{R_d} + 2} \;\le\; S_1 \;\le\; 2 + \sqrt{R_d} + \sqrt{4R_d + 4\sqrt{R_d} + 2}. \qquad \text{(ii)}$$
3. $x_1 = S_1 + \sqrt{R_d}$, $x_2 = 0$: same result as in case 2.
4. $x_1 = S_1 + \sqrt{R_d}$, $x_2 = 1 - S_1 - \sqrt{R_d}$:
$$\text{regret} = \tfrac{1}{2}\left[R_d + R_d - (x_1 - \bar{x})^2 - (x_2 - \bar{x})^2\right] = R_d - \tfrac{1}{4}\left(2S_1 + 2\sqrt{R_d} - 1\right)^2. \qquad (5.23)$$
Thus, to achieve regret smaller than $R_d$ we require
$$S_1^2 - S_1\left(1 - 2\sqrt{R_d}\right) + \tfrac{1}{4} - \sqrt{R_d} + R_d \ge 0, \qquad (5.24)$$
which is satisfied for any $S_1$.

Therefore $S_1$ must satisfy the two constraints (i) and (ii). We also want to choose the lowest $S_1$ that satisfies both constraints, hence we take the maximum of the lower bounds in (i) and (ii). Note that $S_1$ must also satisfy $S_1 \le \tfrac{1}{2}$, which does not hold for low enough $R_d$, implying a lower bound on the achievable regret of the optimal DTM machine (see the next section).
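A small numerical companion to the proof above (ours, not from the thesis): for a given desired regret $R_d$ it evaluates the regret of the two-step circle at the four corner sample combinations and searches for the smallest $S_1$ (with $\tilde{S}_1 = 1 - S_1$) that keeps all of them below $R_d$. It also reproduces the observation that no feasible $S_1 \le \tfrac{1}{2}$ exists once $R_d$ is too small; the grid resolution and function names are assumptions.

```python
import numpy as np

def corner_regrets(s1, rd):
    """Regrets of the two-step circle between S_1 and 1 - S_1 at the four
    corner sample combinations used in the proof of Lemma 5.1."""
    s1t = 1.0 - s1                          # mirror state above 1/2
    x1s = (1.0, s1 + np.sqrt(rd))           # up-jump sample: maximal, or at the threshold
    x2s = (0.0, s1t - np.sqrt(rd))          # down-jump sample: minimal, or at the threshold
    out = []
    for x1 in x1s:
        for x2 in x2s:
            xbar = 0.5 * (x1 + x2)
            out.append(0.5 * ((x1 - s1) ** 2 + (x2 - s1t) ** 2
                              - (x1 - xbar) ** 2 - (x2 - xbar) ** 2))
    return out

def smallest_feasible_s1(rd, grid=20001):
    for s1 in np.linspace(0.0, 0.5, grid):
        if max(corner_regrets(s1, rd)) <= rd:
            return s1
    return None                             # no feasible S_1 <= 1/2: R_d not achievable by a DTM

for rd in (0.10, 0.05, 0.03, 0.02):
    print(rd, smallest_feasible_s1(rd))
```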

Now, after presenting the starting state of the algorithm, we present the complete algorithm for constructing the optimal DTM machine:

1. Set $i = 1$ and the corresponding starting state $S_1$ for an odd or even number of states (see Lemma 5.1). Set the maximum up-step from the starting state $m_{u,1} = 1$.
2. Set the next state index $i = i + 1$.
3. For all $1 \le m \le i - 1$ (where $m$ denotes the maximum up-step from state $i$), find the minimal $S_{i,m}$ with a valid threshold set $T_{i,m}$ (in the sequel we present the algorithm for finding the threshold set).
4. Choose the minimal $S_{i,m}$ among all possible maximum up-steps, that is,
$$m_{u,i} = \arg\min_{1 \le m \le i-1} S_{i,m}, \qquad S_i = S_{i,m_{u,i}}, \qquad T_i = T_{i,m_{u,i}}.$$
Thus we have set the parameters of state $i$: assigned value $S_i$, maximum up-jump of $m_{u,i}$ states and transition thresholds $T_i$.
5. If $S_i > \sqrt{R_d}$, go to step (2).
6. Set the upper half of the states to be the mirror image of the lower half.

Explanations and Comments:

For a given desired regret $R_d$, one should run the algorithm presented above twice: once for an odd and once for an even number of states, with the corresponding starting state $S_1$. The optimal DTM machine is the one with the fewer states of the two (they differ by a single state).

Note that the transition thresholds for state 1 need to be given: a single-state up-jump if the input sample satisfies $x \ge S_1 + \sqrt{R_d}$ and a single-state down-jump if the input sample satisfies $x \le S_1 - \sqrt{R_d}$. These are the optimal transition thresholds, since widening the transition interval only increases the number of possible worst sequences in other minimal circles. Note also that these transition thresholds achieve the maximal regret $R_d$ for the zero-step minimal circle (staying at $S_1$).

A valid threshold set for state $i$ is a set of transition thresholds that achieves regret smaller than $R_d$ for all minimal circles starting at state $i$. We now want to find optimal transition thresholds for state $i$. Note that in DTM machines the thresholds for up-steps from state $i$ have no impact on the regrets of minimal circles other than those starting at state $i$; thus, the optimality of these thresholds is only in the sense of achieving regret smaller than $R_d$ for minimal circles starting at state $i$. The optimal transition thresholds for the down-step are $\{0, S_i - \sqrt{R_d}\}$, since we want to allow the smallest possible interval of input samples that induce a down-step (the smallest impact on minimal circles that start at a different state).

Now, consider that the states $\{S_{i-1}, \ldots, S_1\}$ in the lower half and their transition threshold sets $\{T_{i-1}, \ldots, T_1\}$ are given and achieve regret smaller than $R_d$ for all minimal circles between them. Suppose also that $S_i$ and $m$ are given, where $m$ denotes the maximum up-step from state $i$. Note that there are $m + 1$ minimal circles starting at state $i$ (depicted in Figure 5.6):
- the zero-step minimal circle (staying at state $i$);
- for any $2 \le j \le m + 1$, a minimal circle of $j$ steps: one up-step (of $j - 1$ states) and $j - 1$ down-steps (of a single state each).
Also note that these $m + 1$ minimal circles are within the lower half, that is, within the states $\{S_{i-1}, \ldots, S_1\}$ (since by definition a transition from the lower half to the upper half is allowed only from $S_1$). Let $x_1^j$ be the samples that endlessly rotate the machine in a $j$-step minimal circle, where $x_1$ induces the up-step from state $i$ and $x_2, \ldots, x_j$ induce the down-steps. Since the regret is convex in the input samples, the samples $x_2, \ldots, x_j$ that bring the regret to its maximum are at the edges of the transition regions, that is, they satisfy
$$x_t = \hat{x}_t - \sqrt{R_d} \quad \text{or} \quad x_t = 0, \qquad 2 \le t \le j. \qquad (5.25)$$

Figure 5.6: The $m + 1$ possible minimal circles starting at $S_i$, where $m$ is the maximum up-step from state $i$.

Suppose that for a given $x_2^j$, the regret of the sequence is smaller than $R_d$ if $x_1$ satisfies
$$C_l(x_2^j) \le x_1 \le C_h(x_2^j). \qquad (5.26)$$
Thus, by Equation (5.25), $x_1$ must satisfy the constraint given in (5.26) for the $2^{j-1}$ known combinations of $x_2^j$ in order to achieve regret smaller than $R_d$ for any sequence that rotates the machine in this minimal circle. Hence,
$$C_l = \max_{x_2^j \in A} C_l(x_2^j) \;\le\; x_1 \;\le\; \min_{x_2^j \in A} C_h(x_2^j) = C_h \qquad (5.27)$$
satisfies all the constraints, where $A$ is the set of $2^{j-1}$ combinations of $x_2^j$ according to Equation (5.25).

Since $x_1$ must also satisfy the transition thresholds of state $i$, i.e.
$$T_{i,j-2} \le x_1 \le T_{i,j-1}, \qquad (5.28)$$
we can conclude that the transition thresholds must satisfy
$$C_l \le T_{i,j-2}, \qquad T_{i,j-1} \le C_h. \qquad (5.29)$$
Going over all minimal circles, $2 \le j \le m + 1$, results in bounds on all the transition thresholds (an upper and a lower bound on each threshold). Thus, if a threshold set can be found that satisfies all the bounds and covers the interval $[S_i + \sqrt{R_d}, 1]$, we are done; otherwise, no valid threshold set can be found for the given $S_i$ and $m$.

Lemma 5.2. Consider a sequence $x_1^j$ that rotates a DTM machine in a $j$-step minimal circle starting at state $i$. Given the states $S_i, \ldots, S_{i-j+1}$, the regret is smaller than $R_d$ if $x_1$ satisfies $C_l(x_2^j) \le x_1 \le C_h(x_2^j)$, where $C_l(x_2^j)$ and $C_h(x_2^j)$ satisfy
$$C_l(x_2^j) = S_i + \sum_{t=2}^{j}(S_i - x_t) - j\sqrt{R_d - \frac{1}{j}\sum_{t=2}^{j}(S_{i-j+t-1} - S_i)(S_{i-j+t-1} + S_i - 2x_t)},$$
$$C_h(x_2^j) = S_i + \sum_{t=2}^{j}(S_i - x_t) + j\sqrt{R_d - \frac{1}{j}\sum_{t=2}^{j}(S_{i-j+t-1} - S_i)(S_{i-j+t-1} + S_i - 2x_t)}. \qquad (5.30)$$

Proof. Assume the samples $x_1^j$ rotate the machine in a $j$-step minimal circle starting at state $i$. Let us analyze the regret of this sequence:
$$\text{regret}_j = \frac{1}{j}\sum_{t=1}^{j}\left[(x_t - \hat{x}_t)^2 - (x_t - \bar{x})^2\right] = \bar{x}^2 + \frac{1}{j}\sum_{t=1}^{j}\left[\hat{x}_t^2 - 2x_t\hat{x}_t\right] = \bar{x}^2 + \frac{1}{j}\left[S_i^2 - 2x_1 S_i\right] + \frac{1}{j}\sum_{t=2}^{j}\left[S_{i-j+t-1}^2 - 2x_t S_{i-j+t-1}\right].$$
We require $\text{regret}_j \le R_d$, meaning $x_1$ must satisfy the following inequality:
$$x_1^2 + 2x_1\left(\sum_{t=2}^{j}x_t - jS_i\right) + \left(\sum_{t=2}^{j}x_t\right)^2 + j\left[S_i^2 + \sum_{t=2}^{j}\left(S_{i-j+t-1}^2 - 2x_t S_{i-j+t-1}\right)\right] - j^2 R_d \le 0. \qquad (5.31)$$


More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

LOW-density parity-check (LDPC) codes were invented

LOW-density parity-check (LDPC) codes were invented IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 54, NO 1, JANUARY 2008 51 Extremal Problems of Information Combining Yibo Jiang, Alexei Ashikhmin, Member, IEEE, Ralf Koetter, Senior Member, IEEE, and Andrew

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Subset Universal Lossy Compression

Subset Universal Lossy Compression Subset Universal Lossy Compression Or Ordentlich Tel Aviv University ordent@eng.tau.ac.il Ofer Shayevitz Tel Aviv University ofersha@eng.tau.ac.il Abstract A lossy source code C with rate R for a discrete

More information

Homework 4, 5, 6 Solutions. > 0, and so a n 0 = n + 1 n = ( n+1 n)( n+1+ n) 1 if n is odd 1/n if n is even diverges.

Homework 4, 5, 6 Solutions. > 0, and so a n 0 = n + 1 n = ( n+1 n)( n+1+ n) 1 if n is odd 1/n if n is even diverges. 2..2(a) lim a n = 0. Homework 4, 5, 6 Solutions Proof. Let ɛ > 0. Then for n n = 2+ 2ɛ we have 2n 3 4+ ɛ 3 > ɛ > 0, so 0 < 2n 3 < ɛ, and thus a n 0 = 2n 3 < ɛ. 2..2(g) lim ( n + n) = 0. Proof. Let ɛ >

More information

Optimal matching in wireless sensor networks

Optimal matching in wireless sensor networks Optimal matching in wireless sensor networks A. Roumy, D. Gesbert INRIA-IRISA, Rennes, France. Institute Eurecom, Sophia Antipolis, France. Abstract We investigate the design of a wireless sensor network

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago December 2015 Abstract Rothschild and Stiglitz (1970) represent random

More information

Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS

Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Outline Maximum likelihood (ML) Priors, and

More information

Iowa State University. Instructor: Alex Roitershtein Summer Exam #1. Solutions. x u = 2 x v

Iowa State University. Instructor: Alex Roitershtein Summer Exam #1. Solutions. x u = 2 x v Math 501 Iowa State University Introduction to Real Analysis Department of Mathematics Instructor: Alex Roitershtein Summer 015 Exam #1 Solutions This is a take-home examination. The exam includes 8 questions.

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago January 2016 Consider a situation where one person, call him Sender,

More information

Power series solutions for 2nd order linear ODE s (not necessarily with constant coefficients) a n z n. n=0

Power series solutions for 2nd order linear ODE s (not necessarily with constant coefficients) a n z n. n=0 Lecture 22 Power series solutions for 2nd order linear ODE s (not necessarily with constant coefficients) Recall a few facts about power series: a n z n This series in z is centered at z 0. Here z can

More information

Bounded Expected Delay in Arithmetic Coding

Bounded Expected Delay in Arithmetic Coding Bounded Expected Delay in Arithmetic Coding Ofer Shayevitz, Ram Zamir, and Meir Feder Tel Aviv University, Dept. of EE-Systems Tel Aviv 69978, Israel Email: {ofersha, zamir, meir }@eng.tau.ac.il arxiv:cs/0604106v1

More information

Computational and Statistical Learning theory

Computational and Statistical Learning theory Computational and Statistical Learning theory Problem set 2 Due: January 31st Email solutions to : karthik at ttic dot edu Notation : Input space : X Label space : Y = {±1} Sample : (x 1, y 1,..., (x n,

More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Algebra II Vocabulary Alphabetical Listing. Absolute Maximum: The highest point over the entire domain of a function or relation.

Algebra II Vocabulary Alphabetical Listing. Absolute Maximum: The highest point over the entire domain of a function or relation. Algebra II Vocabulary Alphabetical Listing Absolute Maximum: The highest point over the entire domain of a function or relation. Absolute Minimum: The lowest point over the entire domain of a function

More information

Doubly Indexed Infinite Series

Doubly Indexed Infinite Series The Islamic University of Gaza Deanery of Higher studies Faculty of Science Department of Mathematics Doubly Indexed Infinite Series Presented By Ahed Khaleel Abu ALees Supervisor Professor Eissa D. Habil

More information

Piecewise Constant Prediction

Piecewise Constant Prediction Piecewise Constant Prediction Erik Ordentlich, Marcelo J. Weinberger, Yihong Wu HP Laboratories HPL-01-114 Keyword(s): prediction; data compression; i strategies; universal schemes Abstract: Mini prediction

More information

Exercises for Unit VI (Infinite constructions in set theory)

Exercises for Unit VI (Infinite constructions in set theory) Exercises for Unit VI (Infinite constructions in set theory) VI.1 : Indexed families and set theoretic operations (Halmos, 4, 8 9; Lipschutz, 5.3 5.4) Lipschutz : 5.3 5.6, 5.29 5.32, 9.14 1. Generalize

More information

ALGEBRA I CCR MATH STANDARDS

ALGEBRA I CCR MATH STANDARDS RELATIONSHIPS BETWEEN QUANTITIES AND REASONING WITH EQUATIONS M.A1HS.1 M.A1HS.2 M.A1HS.3 M.A1HS.4 M.A1HS.5 M.A1HS.6 M.A1HS.7 M.A1HS.8 M.A1HS.9 M.A1HS.10 Reason quantitatively and use units to solve problems.

More information

1 Markov decision processes

1 Markov decision processes 2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe

More information

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg*

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* Computational Game Theory Spring Semester, 2005/6 Lecture 5: 2-Player Zero Sum Games Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* 1 5.1 2-Player Zero Sum Games In this lecture

More information

= 10 such triples. If it is 5, there is = 1 such triple. Therefore, there are a total of = 46 such triples.

= 10 such triples. If it is 5, there is = 1 such triple. Therefore, there are a total of = 46 such triples. . Two externally tangent unit circles are constructed inside square ABCD, one tangent to AB and AD, the other to BC and CD. Compute the length of AB. Answer: + Solution: Observe that the diagonal of the

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

Kolmogorov complexity

Kolmogorov complexity Kolmogorov complexity In this section we study how we can define the amount of information in a bitstring. Consider the following strings: 00000000000000000000000000000000000 0000000000000000000000000000000000000000

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

Worst-Case Optimal Redistribution of VCG Payments in Multi-Unit Auctions

Worst-Case Optimal Redistribution of VCG Payments in Multi-Unit Auctions Worst-Case Optimal Redistribution of VCG Payments in Multi-Unit Auctions Mingyu Guo Duke University Dept. of Computer Science Durham, NC, USA mingyu@cs.duke.edu Vincent Conitzer Duke University Dept. of

More information

On Fast Bitonic Sorting Networks

On Fast Bitonic Sorting Networks On Fast Bitonic Sorting Networks Tamir Levi Ami Litman January 1, 2009 Abstract This paper studies fast Bitonic sorters of arbitrary width. It constructs such a sorter of width n and depth log(n) + 3,

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 4: Proof of Shannon s theorem and an explicit code CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated

More information

We have been going places in the car of calculus for years, but this analysis course is about how the car actually works.

We have been going places in the car of calculus for years, but this analysis course is about how the car actually works. Analysis I We have been going places in the car of calculus for years, but this analysis course is about how the car actually works. Copier s Message These notes may contain errors. In fact, they almost

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

On the smoothness of the conjugacy between circle maps with a break

On the smoothness of the conjugacy between circle maps with a break On the smoothness of the conjugacy between circle maps with a break Konstantin Khanin and Saša Kocić 2 Department of Mathematics, University of Toronto, Toronto, ON, Canada M5S 2E4 2 Department of Mathematics,

More information

The Capacity of Finite Abelian Group Codes Over Symmetric Memoryless Channels Giacomo Como and Fabio Fagnani

The Capacity of Finite Abelian Group Codes Over Symmetric Memoryless Channels Giacomo Como and Fabio Fagnani IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 5, MAY 2009 2037 The Capacity of Finite Abelian Group Codes Over Symmetric Memoryless Channels Giacomo Como and Fabio Fagnani Abstract The capacity

More information

2. Two binary operations (addition, denoted + and multiplication, denoted

2. Two binary operations (addition, denoted + and multiplication, denoted Chapter 2 The Structure of R The purpose of this chapter is to explain to the reader why the set of real numbers is so special. By the end of this chapter, the reader should understand the difference between

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012 Discrete Geometry Austin Mohr April 26, 2012 Problem 1 Theorem 1 (Linear Programming Duality). Suppose x, y, b, c R n and A R n n, Ax b, x 0, A T y c, and y 0. If x maximizes c T x and y minimizes b T

More information

PETERS TOWNSHIP HIGH SCHOOL

PETERS TOWNSHIP HIGH SCHOOL PETERS TOWNSHIP HIGH SCHOOL COURSE SYLLABUS: ALG EBRA 2 HONORS Course Overview and Essential Skills This course is an in-depth study of the language, concepts, and techniques of Algebra that will prepare

More information

1 Sequences and Summation

1 Sequences and Summation 1 Sequences and Summation A sequence is a function whose domain is either all the integers between two given integers or all the integers greater than or equal to a given integer. For example, a m, a m+1,...,

More information

Dublin City Schools Mathematics Graded Course of Study Algebra I Philosophy

Dublin City Schools Mathematics Graded Course of Study Algebra I Philosophy Philosophy The Dublin City Schools Mathematics Program is designed to set clear and consistent expectations in order to help support children with the development of mathematical understanding. We believe

More information

Advanced topic: Space complexity

Advanced topic: Space complexity Advanced topic: Space complexity CSCI 3130 Formal Languages and Automata Theory Siu On CHAN Chinese University of Hong Kong Fall 2016 1/28 Review: time complexity We have looked at how long it takes to

More information

CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010

CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010 CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010 Computational complexity studies the amount of resources necessary to perform given computations.

More information

arxiv: v1 [math.co] 17 Dec 2007

arxiv: v1 [math.co] 17 Dec 2007 arxiv:07.79v [math.co] 7 Dec 007 The copies of any permutation pattern are asymptotically normal Milós Bóna Department of Mathematics University of Florida Gainesville FL 36-805 bona@math.ufl.edu Abstract

More information

The Mandelbrot Set and the Farey Tree

The Mandelbrot Set and the Farey Tree The Mandelbrot Set and the Farey Tree Emily Allaway May 206 Contents Introduction 2 2 The Mandelbrot Set 2 2. Definitions of the Mandelbrot Set................ 2 2.2 The First Folk Theorem.....................

More information

Counting independent sets up to the tree threshold

Counting independent sets up to the tree threshold Counting independent sets up to the tree threshold Dror Weitz March 16, 2006 Abstract We consider the problem of approximately counting/sampling weighted independent sets of a graph G with activity λ,

More information

COUNTING NUMERICAL SEMIGROUPS BY GENUS AND SOME CASES OF A QUESTION OF WILF

COUNTING NUMERICAL SEMIGROUPS BY GENUS AND SOME CASES OF A QUESTION OF WILF COUNTING NUMERICAL SEMIGROUPS BY GENUS AND SOME CASES OF A QUESTION OF WILF NATHAN KAPLAN Abstract. The genus of a numerical semigroup is the size of its complement. In this paper we will prove some results

More information

Digital Image Processing Lectures 25 & 26

Digital Image Processing Lectures 25 & 26 Lectures 25 & 26, Professor Department of Electrical and Computer Engineering Colorado State University Spring 2015 Area 4: Image Encoding and Compression Goal: To exploit the redundancies in the image

More information

Common Core State Standards with California Additions 1 Standards Map. Algebra I

Common Core State Standards with California Additions 1 Standards Map. Algebra I Common Core State s with California Additions 1 s Map Algebra I *Indicates a modeling standard linking mathematics to everyday life, work, and decision-making N-RN 1. N-RN 2. Publisher Language 2 Primary

More information

Asymptotically optimal induced universal graphs

Asymptotically optimal induced universal graphs Asymptotically optimal induced universal graphs Noga Alon Abstract We prove that the minimum number of vertices of a graph that contains every graph on vertices as an induced subgraph is (1+o(1))2 ( 1)/2.

More information

Algebra 2 Khan Academy Video Correlations By SpringBoard Activity

Algebra 2 Khan Academy Video Correlations By SpringBoard Activity SB Activity Activity 1 Creating Equations 1-1 Learning Targets: Create an equation in one variable from a real-world context. Solve an equation in one variable. 1-2 Learning Targets: Create equations in

More information

A function is actually a simple concept; if it were not, history would have replaced it with a simpler one by now! Here is the definition:

A function is actually a simple concept; if it were not, history would have replaced it with a simpler one by now! Here is the definition: 1.2 Functions and Their Properties A function is actually a simple concept; if it were not, history would have replaced it with a simpler one by now! Here is the definition: Definition: Function, Domain,

More information

Algebra 2 Khan Academy Video Correlations By SpringBoard Activity

Algebra 2 Khan Academy Video Correlations By SpringBoard Activity SB Activity Activity 1 Creating Equations 1-1 Learning Targets: Create an equation in one variable from a real-world context. Solve an equation in one variable. 1-2 Learning Targets: Create equations in

More information

Regularity and optimization

Regularity and optimization Regularity and optimization Bruno Gaujal INRIA SDA2, 2007 B. Gaujal (INRIA ) Regularity and Optimization SDA2, 2007 1 / 39 Majorization x = (x 1,, x n ) R n, x is a permutation of x with decreasing coordinates.

More information

Worst-Case Violation of Sampled Convex Programs for Optimization with Uncertainty

Worst-Case Violation of Sampled Convex Programs for Optimization with Uncertainty Worst-Case Violation of Sampled Convex Programs for Optimization with Uncertainty Takafumi Kanamori and Akiko Takeda Abstract. Uncertain programs have been developed to deal with optimization problems

More information

No EFFICIENT LINE SEARCHING FOR CONVEX FUNCTIONS. By E. den Boef, D. den Hertog. May 2004 ISSN

No EFFICIENT LINE SEARCHING FOR CONVEX FUNCTIONS. By E. den Boef, D. den Hertog. May 2004 ISSN No. 4 5 EFFICIENT LINE SEARCHING FOR CONVEX FUNCTIONS y E. den oef, D. den Hertog May 4 ISSN 94-785 Efficient Line Searching for Convex Functions Edgar den oef Dick den Hertog 3 Philips Research Laboratories,

More information

Codes for Partially Stuck-at Memory Cells

Codes for Partially Stuck-at Memory Cells 1 Codes for Partially Stuck-at Memory Cells Antonia Wachter-Zeh and Eitan Yaakobi Department of Computer Science Technion Israel Institute of Technology, Haifa, Israel Email: {antonia, yaakobi@cs.technion.ac.il

More information

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower

More information

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 MA008 p.1/37 MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 Dr. Markus Hagenbuchner markus@uow.edu.au. MA008 p.2/37 Exercise 1 (from LN 2) Asymptotic Notation When constants appear in exponents

More information

CSE 206A: Lattice Algorithms and Applications Spring Basic Algorithms. Instructor: Daniele Micciancio

CSE 206A: Lattice Algorithms and Applications Spring Basic Algorithms. Instructor: Daniele Micciancio CSE 206A: Lattice Algorithms and Applications Spring 2014 Basic Algorithms Instructor: Daniele Micciancio UCSD CSE We have already seen an algorithm to compute the Gram-Schmidt orthogonalization of a lattice

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

Today s class. Constrained optimization Linear programming. Prof. Jinbo Bi CSE, UConn. Numerical Methods, Fall 2011 Lecture 12

Today s class. Constrained optimization Linear programming. Prof. Jinbo Bi CSE, UConn. Numerical Methods, Fall 2011 Lecture 12 Today s class Constrained optimization Linear programming 1 Midterm Exam 1 Count: 26 Average: 73.2 Median: 72.5 Maximum: 100.0 Minimum: 45.0 Standard Deviation: 17.13 Numerical Methods Fall 2011 2 Optimization

More information

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE 5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Hyperplane-Based Vector Quantization for Distributed Estimation in Wireless Sensor Networks Jun Fang, Member, IEEE, and Hongbin

More information

Limiting behaviour of large Frobenius numbers. V.I. Arnold

Limiting behaviour of large Frobenius numbers. V.I. Arnold Limiting behaviour of large Frobenius numbers by J. Bourgain and Ya. G. Sinai 2 Dedicated to V.I. Arnold on the occasion of his 70 th birthday School of Mathematics, Institute for Advanced Study, Princeton,

More information

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS Michael A. Lexa and Don H. Johnson Rice University Department of Electrical and Computer Engineering Houston, TX 775-892 amlexa@rice.edu,

More information

The Multi-Arm Bandit Framework

The Multi-Arm Bandit Framework The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94

More information

Computing Minimal Polynomial of Matrices over Algebraic Extension Fields

Computing Minimal Polynomial of Matrices over Algebraic Extension Fields Bull. Math. Soc. Sci. Math. Roumanie Tome 56(104) No. 2, 2013, 217 228 Computing Minimal Polynomial of Matrices over Algebraic Extension Fields by Amir Hashemi and Benyamin M.-Alizadeh Abstract In this

More information

Standard forms for writing numbers

Standard forms for writing numbers Standard forms for writing numbers In order to relate the abstract mathematical descriptions of familiar number systems to the everyday descriptions of numbers by decimal expansions and similar means,

More information

EGYPTIAN FRACTIONS WITH EACH DENOMINATOR HAVING THREE DISTINCT PRIME DIVISORS

EGYPTIAN FRACTIONS WITH EACH DENOMINATOR HAVING THREE DISTINCT PRIME DIVISORS #A5 INTEGERS 5 (205) EGYPTIAN FRACTIONS WITH EACH DENOMINATOR HAVING THREE DISTINCT PRIME DIVISORS Steve Butler Department of Mathematics, Iowa State University, Ames, Iowa butler@iastate.edu Paul Erdős

More information

Space Complexity. The space complexity of a program is how much memory it uses.

Space Complexity. The space complexity of a program is how much memory it uses. Space Complexity The space complexity of a program is how much memory it uses. Measuring Space When we compute the space used by a TM, we do not count the input (think of input as readonly). We say that

More information

STIT Tessellations via Martingales

STIT Tessellations via Martingales STIT Tessellations via Martingales Christoph Thäle Department of Mathematics, Osnabrück University (Germany) Stochastic Geometry Days, Lille, 30.3.-1.4.2011 Christoph Thäle (Uni Osnabrück) STIT Tessellations

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics and Engineering Lecture notes for PDEs Sergei V. Shabanov Department of Mathematics, University of Florida, Gainesville, FL 32611 USA CHAPTER 1 The integration theory

More information

Algorithms, Design and Analysis. Order of growth. Table 2.1. Big-oh. Asymptotic growth rate. Types of formulas for basic operation count

Algorithms, Design and Analysis. Order of growth. Table 2.1. Big-oh. Asymptotic growth rate. Types of formulas for basic operation count Types of formulas for basic operation count Exact formula e.g., C(n) = n(n-1)/2 Algorithms, Design and Analysis Big-Oh analysis, Brute Force, Divide and conquer intro Formula indicating order of growth

More information