Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks
Liu Yang, Steve Hanneke, Jaime Carbonell
December 2012
CMU-ML-12-112
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract

We study the optimal rates of convergence for estimating a prior distribution over a VC class from a sequence of independent data sets respectively labeled by independent target functions sampled from the prior. We specifically derive upper and lower bounds on the optimal rates under a smoothness condition on the correct prior, with the number of samples per data set equal to the VC dimension. These results have implications for the improvements achievable via transfer learning.

This research is supported in part by NSF grant IIS and a Google Core AI grant.
Keywords: Minimax Rates, Transfer Learning, VC Dimension, Bayesian Learning
1 Introduction

In the transfer learning setting, we are presented with a sequence of learning problems, each with some respective target concept we are tasked with learning. The key question in transfer learning is how to leverage our access to past learning problems in order to improve performance on learning problems we will be presented with in the future. Among the several proposed models for transfer learning, one particularly appealing model supposes the learning problems are independent and identically distributed, with unknown distribution, and the advantage of transfer learning then comes from the ability to estimate this shared distribution based on the data from past learning problems [Baxter, 1997, Yang et al., 2011].

For instance, when customizing a speech recognition system to a particular speaker's voice, we might expect the first few people would need to speak many words or phrases in order for the system to accurately identify the nuances. However, after performing this for many different people, if the software has access to those past training sessions when customizing itself to a new user, it should have identified important properties of the speech patterns, such as the common patterns within each of the major dialects or accents, and other such information about the distribution of speech patterns within the user population. It should then be able to leverage this information to reduce the number of words or phrases the next user needs to speak in order to train the system, for instance by first trying to identify the individual's dialect, then presenting phrases that differentiate common subpatterns within that dialect, and so forth.

In analyzing the benefits of transfer learning in such a setting, one important question to ask is how quickly we can estimate the distribution from which the learning problems are sampled.
In recent work, [Yang et al., 2011] have shown that under mild conditions on the family of possible distributions, if the target concepts reside in a known VC class, then it is possible to estimate this distribution to arbitrary precision using only a bounded number of training samples per task: specifically, a number of samples equal to the VC dimension. However, that work left open the question of quantifying the rate of convergence. This rate of convergence can have a direct impact on how much benefit we gain from transfer learning when we are faced with only a finite sequence of learning problems. As such, it is certainly desirable to derive tight characterizations of this rate of convergence.

The present work continues that of [Yang et al., 2011], bounding the rate of convergence for estimating this distribution, under a smoothness condition on the distribution. We derive a generic upper bound, which holds regardless of the VC class the target concepts reside in. The proof of this result builds on the earlier work of [Yang et al., 2011], but requires several interesting innovations to make the rate of convergence explicit, and to dramatically improve the upper bounds on certain quantities compared to the analogous bounds implicit in the original proofs. We further derive a nontrivial lower bound that holds for certain constructed scenarios, which illustrates a lower limit on how good a general upper bound we might hope for in results expressed only in terms of the number of tasks, the smoothness conditions, and the VC dimension.
2 The Setting

Let $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ be a Borel space [Schervish, 1995] (where $\mathcal{X}$ is called the instance space), and let $\mathcal{D}$ be a distribution on $\mathcal{X}$ (called the data distribution). Let $\mathbb{C}$ be a VC class of measurable classifiers $h : \mathcal{X} \to \{-1,+1\}$ (called the concept space), and denote by $d$ the VC dimension of $\mathbb{C}$ [Vapnik, 1982]. We suppose $\mathbb{C}$ is equipped with its Borel $\sigma$-algebra $\mathcal{B}$ induced by the pseudo-metric $\rho(h,g) = \mathcal{D}(\{x \in \mathcal{X} : h(x) \neq g(x)\})$. Though our results can be formulated for general $\mathcal{D}$ (with somewhat more complicated theorem statements), to simplify the statement of results we suppose $\rho$ is actually a metric, which would follow from appropriate topological conditions on $\mathbb{C}$ relative to $\mathcal{D}$.

For any two probability measures $\mu_1, \mu_2$ on a measurable space $(\Omega, \mathcal{F})$, define the total variation distance

$$\|\mu_1 - \mu_2\| = \sup_{A \in \mathcal{F}} |\mu_1(A) - \mu_2(A)|.$$

Let $\Pi_\Theta = \{\pi_\theta : \theta \in \Theta\}$ be a family of probability measures on $\mathbb{C}$ (called priors), where $\Theta$ is an arbitrary index set (called the parameter space). We additionally suppose there exists a probability measure $\pi_0$ on $\mathbb{C}$ (called the reference measure) such that every $\pi_\theta$ is absolutely continuous with respect to $\pi_0$, and therefore has a density function $f_\theta$ given by the Radon-Nikodym derivative $\frac{d\pi_\theta}{d\pi_0}$ [Schervish, 1995].

We consider the following type of estimation problem. There is a collection of $\mathbb{C}$-valued random variables $\{h^*_{t\theta} : t \in \mathbb{N}, \theta \in \Theta\}$, where for any fixed $\theta \in \Theta$ the $\{h^*_{t\theta}\}_{t=1}^{\infty}$ variables are i.i.d. with distribution $\pi_\theta$. For each $\theta \in \Theta$, there is a sequence $Z_t(\theta) = \{(X_{t1}, Y_{t1}(\theta)), (X_{t2}, Y_{t2}(\theta)), \ldots\}$, where $\{X_{ti}\}_{t,i \in \mathbb{N}}$ are i.i.d. $\mathcal{D}$, and for each $t, i \in \mathbb{N}$, $Y_{ti}(\theta) = h^*_{t\theta}(X_{ti})$. We additionally denote by $Z_{tk}(\theta) = \{(X_{t1}, Y_{t1}(\theta)), \ldots, (X_{tk}, Y_{tk}(\theta))\}$ the first $k$ elements of $Z_t(\theta)$, for any $k \in \mathbb{N}$, and similarly $\mathbb{X}_{tk} = \{X_{t1}, \ldots, X_{tk}\}$ and $\mathbb{Y}_{tk}(\theta) = \{Y_{t1}(\theta), \ldots, Y_{tk}(\theta)\}$. Following the terminology used in the transfer learning literature, we refer to the collection of variables associated with each $t$ collectively as the $t$-th task.
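As an illustration only (it is not part of the original report), the sampling model of this section can be simulated for a toy concept space. The sketch below assumes a finite instance space $\{0, \ldots, 9\}$ with uniform data distribution, the class of threshold classifiers (VC dimension 1), and a uniform prior; all function names are hypothetical.

```python
import random

def make_threshold_class(n_points):
    """Concepts h_j(x) = +1 if x >= j else -1, for j = 0..n_points (VC dimension 1)."""
    return [tuple(1 if x >= j else -1 for x in range(n_points))
            for j in range(n_points + 1)]

def sample_task(concepts, prior, n_points, k, rng):
    """Draw one task: a target h* ~ prior, then k points X ~ uniform D labeled by h*."""
    target = rng.choices(concepts, weights=prior, k=1)[0]
    xs = [rng.randrange(n_points) for _ in range(k)]
    return [(x, target[x]) for x in xs]

def total_variation(p, q):
    """Total variation distance between discrete distributions: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(0)
n_points, d = 10, 1                      # toy instance space size and VC dimension
concepts = make_threshold_class(n_points)
prior = [1.0 / len(concepts)] * len(concepts)   # assumed uniform prior
# Five tasks, each observed through only k = d labeled samples.
tasks = [sample_task(concepts, prior, n_points, d, rng) for _ in range(5)]
```

Each element of `tasks` plays the role of one $Z_{td}(\theta)$; an estimator in the sense of this section would see only these short labeled samples, never the target concepts themselves.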
We will be concerned with sequences of estimators $\hat\theta_{T\theta} = \hat\theta_T(Z_{1k}(\theta), \ldots, Z_{Tk}(\theta))$, for $T \in \mathbb{N}$, which are based on only a bounded number $k$ of samples per task, among the first $T$ tasks. Our main results specifically study the case of $k = d$. For any such estimator, we measure the risk as $E\left[\|\pi_{\hat\theta_{T\theta}} - \pi_\theta\|\right]$, and will be particularly interested in upper-bounding the worst-case risk $\sup_{\theta \in \Theta} E\left[\|\pi_{\hat\theta_{T\theta}} - \pi_\theta\|\right]$ as a function of $T$, and lower-bounding the minimum possible value of this worst-case risk over all possible $\hat\theta_T$ estimators (called the minimax risk).

In previous work, [Yang et al., 2011] showed that, if $\Pi_\Theta$ is a totally bounded family, then even with only $d$ samples per task, the minimax risk (as a function of the number of tasks $T$) converges to zero. In fact, that work also includes a proof that this is not necessarily the case in general for any number of samples less than $d$. However, the actual rates of convergence were not explicitly derived in that work, and indeed the upper bounds on the rates of convergence implicit in that analysis may often have fairly complicated dependences on $\mathbb{C}$, $\Pi_\Theta$, and $\mathcal{D}$, and furthermore often provide only very slow rates of convergence.

To derive explicit bounds on the rates of convergence, in the present work we specifically focus on families of smooth densities. The motivation for involving a notion of smoothness in characterizing rates of convergence is clear if we consider the extreme case in which $\Pi_\Theta$ contains two priors $\pi_1$ and $\pi_2$, with $\pi_1(\{h\}) = \pi_2(\{g\}) = 1$, where $\rho(h,g)$ is a very small but nonzero value; in this case, if we have only a small number of samples per task, we would require many tasks (on the order of $1/\rho(h,g)$) to observe any data points carrying any information that would distinguish between these two priors (namely, points $x$ with $h(x) \neq g(x)$); yet $\|\pi_1 - \pi_2\| = 1$, so that we have a slow rate of convergence (at least initially). A total boundedness condition on $\Pi_\Theta$ would limit the number of such pairs present in $\Pi_\Theta$, so that for instance we cannot have arbitrarily close $h$ and $g$, but less extreme variants of this can lead to slow asymptotic rates of convergence as well.

Specifically, in the present work we consider the following notion of smoothness. For $L \in (0,\infty)$ and $\alpha \in (0,1]$, a function $f : \mathbb{C} \to \mathbb{R}$ is $(L,\alpha)$-Hölder smooth if

$$\forall h, g \in \mathbb{C}, \quad |f(h) - f(g)| \leq L \rho(h,g)^{\alpha}.$$

3 An Upper Bound

We now have the following theorem, holding for an arbitrary VC class $\mathbb{C}$ and data distribution $\mathcal{D}$; it is the main result of this work.

Theorem 1. For $\Pi_\Theta$ any class of priors on $\mathbb{C}$ having $(L,\alpha)$-Hölder smooth densities $\{f_\theta : \theta \in \Theta\}$, for any $T \in \mathbb{N}$, there exists an estimator $\hat\theta_{T\theta_\star} = \hat\theta_T(Z_{1d}(\theta_\star), \ldots, Z_{Td}(\theta_\star))$ such that

$$\sup_{\theta_\star \in \Theta} E\,\|\pi_{\hat\theta_T} - \pi_{\theta_\star}\| = \tilde{O}\left( L\, T^{-\frac{\alpha^2}{2(d+2\alpha)(\alpha+2(d+1))}} \right).$$

Proof. By the standard PAC analysis [Vapnik, 1982, Blumer et al., 1989], for any $\gamma > 0$, with probability greater than $1 - \gamma$, a sample of $k = O((d/\gamma)\log(1/\gamma))$ random points will partition $\mathbb{C}$ into regions of width less than $\gamma$. For brevity, we omit the $t$ subscript on quantities such as $Z_{tk}(\theta)$ throughout the following analysis, since the claims hold for any arbitrary value of $t$.

For any $\theta \in \Theta$, let $\tilde\pi_\theta$ denote a (conditional on $X_1, \ldots, X_k$) distribution defined as follows. Let $\tilde f_\theta$ denote the (conditional on $X_1, \ldots, X_k$) density function of $\tilde\pi_\theta$ with respect to $\pi_0$, and for any $g \in \mathbb{C}$, let

$$\tilde f_\theta(g) = \frac{\pi_\theta(\{h \in \mathbb{C} : \forall i \leq k,\ h(X_i) = g(X_i)\})}{\pi_0(\{h \in \mathbb{C} : \forall i \leq k,\ h(X_i) = g(X_i)\})}$$

(or $0$ if $\pi_0(\{h \in \mathbb{C} : \forall i \leq k,\ h(X_i) = g(X_i)\}) = 0$).
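In other words, $\tilde\pi_\theta$ has the same probability mass as $\pi_\theta$ for each of the equivalence classes induced by $X_1, \ldots, X_k$, but conditioned on the equivalence class, simply has a constant-density distribution over that equivalence class. Note that, by the smoothness condition, with probability greater than $1 - \gamma$, we have everywhere

$$|f_\theta(h) - \tilde f_\theta(h)| < L\gamma^{\alpha}.$$

So for any $\theta, \theta' \in \Theta$, with probability greater than $1 - \gamma$,

$$\|\pi_\theta - \pi_{\theta'}\| = (1/2) \int |f_\theta - f_{\theta'}| \, d\pi_0 < L\gamma^{\alpha} + (1/2) \int |\tilde f_\theta - \tilde f_{\theta'}| \, d\pi_0.$$

Furthermore, since the regions that define $\tilde f_\theta$ and $\tilde f_{\theta'}$ are the same (namely, the partition induced by $X_1, \ldots, X_k$), we have

$$(1/2) \int |\tilde f_\theta - \tilde f_{\theta'}| \, d\pi_0 = \frac{1}{2} \sum_{y_1, \ldots, y_k \in \{-1,+1\}} \left| \pi_\theta(\{h \in \mathbb{C} : \forall i \leq k,\ h(X_i) = y_i\}) - \pi_{\theta'}(\{h \in \mathbb{C} : \forall i \leq k,\ h(X_i) = y_i\}) \right| = \|P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k} - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}\|.$$

Thus, we have that with probability at least $1 - \gamma$,

$$\|\pi_\theta - \pi_{\theta'}\| < L\gamma^{\alpha} + \|P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k} - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}\|.$$

Following an argument analogous to the inductive argument of [Yang et al., 2011], suppose $I \subseteq \{1, \ldots, k\}$, fix $\bar x_I \in \mathcal{X}^{|I|}$ and $\bar y_I \in \{-1,+1\}^{|I|}$. Then the $\tilde y_I \in \{-1,+1\}^{|I|}$ for which no $h \in \mathbb{C}$ has $h(\bar x_I) = \tilde y_I$ and for which $\|\bar y_I - \tilde y_I\|_1$ is minimal, has $(1/2)\|\bar y_I - \tilde y_I\|_1 \leq d + 1$, and for any $i \in I$ with $\bar y_i \neq \tilde y_i$, letting $\bar y'_j = \bar y_j$ for $j \in I \setminus \{i\}$ and $\bar y'_i = \tilde y_i$, we have

$$P_{\mathbb{Y}_I(\theta)|\mathbb{X}_I}(\bar y_I | \bar x_I) = P_{\mathbb{Y}_{I \setminus \{i\}}(\theta)|\mathbb{X}_{I \setminus \{i\}}}(\bar y_{I \setminus \{i\}} | \bar x_{I \setminus \{i\}}) - P_{\mathbb{Y}_I(\theta)|\mathbb{X}_I}(\bar y'_I | \bar x_I),$$

and similarly for $\theta'$, so that

$$\left| P_{\mathbb{Y}_I(\theta)|\mathbb{X}_I}(\bar y_I | \bar x_I) - P_{\mathbb{Y}_I(\theta')|\mathbb{X}_I}(\bar y_I | \bar x_I) \right| \leq \left| P_{\mathbb{Y}_{I \setminus \{i\}}(\theta)|\mathbb{X}_{I \setminus \{i\}}}(\bar y_{I \setminus \{i\}} | \bar x_{I \setminus \{i\}}) - P_{\mathbb{Y}_{I \setminus \{i\}}(\theta')|\mathbb{X}_{I \setminus \{i\}}}(\bar y_{I \setminus \{i\}} | \bar x_{I \setminus \{i\}}) \right| + \left| P_{\mathbb{Y}_I(\theta)|\mathbb{X}_I}(\bar y'_I | \bar x_I) - P_{\mathbb{Y}_I(\theta')|\mathbb{X}_I}(\bar y'_I | \bar x_I) \right|.$$

Now consider that these two terms inductively define a binary tree. Every time the tree branches left once, it arrives at a difference of probabilities for a set $I$ of one less element than that of its parent. Every time the tree branches right once, it arrives at a difference of probabilities for a $\bar y_I$ one closer to an unrealized $\tilde y_I$ than that of its parent. Say we stop branching the tree upon reaching a set $I$ and a $\bar y_I$ such that either $\bar y_I$ is an unrealized labeling, or $|I| = d$. Thus, we can bound the original (root node) difference of probabilities by the sum of the differences of probabilities for the leaf nodes with $|I| = d$. Any path in the tree can branch left at most $k - d$ times (total) before reaching a set $I$ with only $d$ elements, and can branch right at most $d + 1$ times in a row before reaching a $\bar y_I$ such that both probabilities are zero, so that the difference is zero. So the depth of any leaf node with $|I| = d$ is at most $(k - d)d$. Furthermore, at any level of the tree, from left to right the nodes have strictly decreasing $|I|$ values, so that the maximum width of the tree is at most $k - d$. So the total number of leaf nodes with $|I| = d$ is at most $(k - d)^2 d$.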
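Thus, for any $\bar y \in \{-1,+1\}^k$ and $\bar x \in \mathcal{X}^k$,

$$\left| P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k}(\bar y | \bar x) - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}(\bar y | \bar x) \right| \leq (k - d)^2 d \cdot \max_{\bar y_d \in \{-1,+1\}^d} \max_{D \in \{1, \ldots, k\}^d} \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_d}(\bar y_d | \bar x_D) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_d}(\bar y_d | \bar x_D) \right|.$$

Since

$$\|P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k} - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}\| = (1/2) \sum_{\bar y_k \in \{-1,+1\}^k} \left| P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k}(\bar y_k) - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}(\bar y_k) \right|,$$

and by Sauer's Lemma this is at most

$$(ek)^d \max_{\bar y_k \in \{-1,+1\}^k} \left| P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k}(\bar y_k) - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}(\bar y_k) \right|,$$

we have that

$$\|P_{\mathbb{Y}_k(\theta)|\mathbb{X}_k} - P_{\mathbb{Y}_k(\theta')|\mathbb{X}_k}\| \leq (ek)^d k^2 d \max_{\bar y_d \in \{-1,+1\}^d} \max_{D \in \{1, \ldots, k\}^d} \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_D}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_D}(\bar y_d) \right|.$$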
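Thus, we have that

$$\|\pi_\theta - \pi_{\theta'}\| = E\,\|\pi_\theta - \pi_{\theta'}\| < \gamma + L\gamma^{\alpha} + (ek)^d k^2 d \; E\left[ \max_{\bar y_d \in \{-1,+1\}^d} \max_{D \in \{1, \ldots, k\}^d} \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_D}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_D}(\bar y_d) \right| \right].$$

Note that

$$E\left[ \max_{\bar y_d \in \{-1,+1\}^d} \max_{D \in \{1, \ldots, k\}^d} \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_D}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_D}(\bar y_d) \right| \right] \leq \sum_{\bar y_d \in \{-1,+1\}^d} \sum_{D \in \{1, \ldots, k\}^d} E\left[ \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_D}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_D}(\bar y_d) \right| \right] \leq (2k)^d \max_{\bar y_d \in \{-1,+1\}^d} \max_{D \in \{1, \ldots, k\}^d} E\left[ \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_D}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_D}(\bar y_d) \right| \right],$$

and by exchangeability, this last line equals

$$(2k)^d \max_{\bar y_d \in \{-1,+1\}^d} E\left[ \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_d}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_d}(\bar y_d) \right| \right].$$

[Yang et al., 2011] showed that

$$E\left[ \left| P_{\mathbb{Y}_d(\theta)|\mathbb{X}_d}(\bar y_d) - P_{\mathbb{Y}_d(\theta')|\mathbb{X}_d}(\bar y_d) \right| \right] \leq 4 \sqrt{\|P_{Z_d(\theta)} - P_{Z_d(\theta')}\|},$$

so that in total we have

$$\|\pi_\theta - \pi_{\theta'}\| < (L + 1)\gamma^{\alpha} + 4(2ek)^{2d+2} \sqrt{\|P_{Z_d(\theta)} - P_{Z_d(\theta')}\|}.$$

Plugging in the value of $k = c(d/\gamma)\log(1/\gamma)$, this is

$$(L + 1)\gamma^{\alpha} + 4\left( 2ec\,\frac{d}{\gamma} \log\left(\frac{1}{\gamma}\right) \right)^{2d+2} \sqrt{\|P_{Z_d(\theta)} - P_{Z_d(\theta')}\|}.$$

So the only remaining question is the rate of convergence of our estimate of $P_{Z_d(\theta_\star)}$. If $N(\varepsilon)$ is the $\varepsilon$-covering number of $\{P_{Z_d(\theta)} : \theta \in \Theta\}$, then taking $\hat\theta_{T\theta_\star}$ as the minimum distance skeleton estimate of [Yatracos, 1985, Devroye & Lugosi, 2001] achieves expected total variation distance $\varepsilon$ from $P_{Z_d(\theta_\star)}$, for some $T = O((1/\varepsilon^2) \log N(\varepsilon/4))$. We can partition $\mathbb{C}$ into $O((L/\varepsilon)^{d/\alpha})$ cells of diameter $O((\varepsilon/L)^{1/\alpha})$, and set a constant density value within each cell, on an $O(\varepsilon)$-grid of density values, and every prior with $(L,\alpha)$-Hölder smooth density will have density within $\varepsilon$ of some density so-constructed; there are then at most $(1/\varepsilon)^{O((L/\varepsilon)^{d/\alpha})}$ such densities, so this bounds the covering numbers of $\Pi_\Theta$. Furthermore, the covering number of $\Pi_\Theta$ upper bounds $N(\varepsilon)$ [Yang et al., 2011], so that $N(\varepsilon) \leq (1/\varepsilon)^{O((L/\varepsilon)^{d/\alpha})}$. Solving $T = O(\varepsilon^{-2} (L/\varepsilon)^{d/\alpha} \log(1/\varepsilon))$ for $\varepsilon$, we have

$$\varepsilon = O\left( L \left( \frac{\log(TL)}{T} \right)^{\frac{\alpha}{d+2\alpha}} \right).$$

So this bounds the rate of convergence for $E\,\|P_{Z_d(\hat\theta_T)} - P_{Z_d(\theta_\star)}\|$, for $\hat\theta_T$ the minimum distance skeleton estimate. Plugging this rate into the bound on the priors, combined with Jensen's inequality, we have

$$E\,\|\pi_{\hat\theta_T} - \pi_{\theta_\star}\| < (L + 1)\gamma^{\alpha} + 4\left( 2ec\,\frac{d}{\gamma} \log\left(\frac{1}{\gamma}\right) \right)^{2d+2} \times O\left( L \left( \frac{\log(TL)}{T} \right)^{\frac{\alpha}{2d+4\alpha}} \right).$$

This holds for any $\gamma > 0$, so minimizing this expression over $\gamma > 0$ yields a bound on the rate. For instance, with $\gamma = \tilde{O}\left( T^{-\frac{\alpha}{2(d+2\alpha)(\alpha+2(d+1))}} \right)$, we have

$$E\,\|\pi_{\hat\theta_T} - \pi_{\theta_\star}\| = \tilde{O}\left( L\, T^{-\frac{\alpha^2}{2(d+2\alpha)(\alpha+2(d+1))}} \right).$$

4 A Minimax Lower Bound

One natural question is whether Theorem 1 can generally be improved. While we expect this to be true for some fixed VC classes (e.g., those of finite size), and in any case we expect that some of the constant factors in the exponent may be improvable, it is not at this time clear whether the general form of $T^{-\Theta(\alpha^2/(d+\alpha)^2)}$ is sometimes optimal. One way to investigate this question is to construct specific spaces $\mathbb{C}$ and distributions $\mathcal{D}$ for which a lower bound can be obtained. In particular, we are generally interested in exhibiting lower bounds that are worse than those that apply to the usual problem of density estimation based on direct access to the $h^*_{t\theta_\star}$ values (see Theorem 3 below). Here we present a lower bound that is interesting for this reason. However, although larger than the optimal rate for methods with direct access to the target concepts, it is still far from matching the upper bound above, so that the question of tightness remains open. Specifically, we have the following result.

Theorem 2. For any integer $d \geq 1$, any $L > 0$, $\alpha \in (0,1]$, there is a value $C(d, L, \alpha) \in (0,\infty)$ such that, for any $T \in \mathbb{N}$, there exists an instance space $\mathcal{X}$, a concept space $\mathbb{C}$ of VC dimension $d$, a distribution $\mathcal{D}$ over $\mathcal{X}$, and a distribution $\pi_0$ over $\mathbb{C}$ such that, for $\Pi_\Theta$ a set of distributions over $\mathbb{C}$ with $(L,\alpha)$-Hölder smooth density functions with respect to $\pi_0$, any estimator $\hat\theta_T = \hat\theta_T(Z_{1d}(\theta_\star), \ldots, Z_{Td}(\theta_\star))$ has

$$\sup_{\theta_\star \in \Theta} E\left[ \|\pi_{\hat\theta_T} - \pi_{\theta_\star}\| \right] \geq C(d, L, \alpha)\, T^{-\frac{\alpha}{2(d+\alpha)}}.$$

Proof. (Sketch) We proceed by a reduction from the task of determining the bias of a coin from among two given possibilities.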
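Specifically, fix any $\gamma \in (0, 1/2)$, $n \in \mathbb{N}$, and let $B_1(p), \ldots, B_n(p)$ be i.i.d. Bernoulli($p$) random variables, for each $p \in [0,1]$; then it is known that, for any (possibly nondeterministic) decision rule $\hat p_n : \{0,1\}^n \to \{(1+\gamma)/2, (1-\gamma)/2\}$,

$$\frac{1}{2} \sum_{p \in \{(1+\gamma)/2,\, (1-\gamma)/2\}} P\left( \hat p_n(B_1(p), \ldots, B_n(p)) \neq p \right) \geq (1/32) \exp\left\{ -128\gamma^2 n / 3 \right\}. \qquad (1)$$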
This easily follows from the results of [Wald, 1945, Bar-Yossef, 2003], combined with a result of [Poland & Hutter, 2006] bounding the KL divergence.

To use this result, we construct a learning problem as follows. Fix some $m \in \mathbb{N}$ with $m \geq d$, let $\mathcal{X} = \{1, \ldots, m\}$, and let $\mathbb{C}$ be the space of all classifiers $h : \mathcal{X} \to \{-1,+1\}$ such that $|\{x \in \mathcal{X} : h(x) = +1\}| \leq d$. Clearly the VC dimension of $\mathbb{C}$ is $d$. Define the distribution $\mathcal{D}$ as uniform over $\mathcal{X}$. Finally, we specify a family of $(L,\alpha)$-Hölder smooth priors, parameterized by $\Theta = \{-1,+1\}^{\binom{m}{d}}$, as follows. Let $\gamma_m = (L/2)(1/m)^{\alpha}$. First, enumerate the $\binom{m}{d}$ distinct $d$-sized subsets of $\{1, \ldots, m\}$ as $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_{\binom{m}{d}}$. Define the reference distribution $\pi_0$ by the property that, for any $h \in \mathbb{C}$, letting $q = |\{x : h(x) = +1\}|$,

$$\pi_0(\{h\}) = \left(\tfrac{1}{2}\right)^d \binom{m-q}{d-q} \Big/ \binom{m}{d}.$$

For any $b = (b_1, \ldots, b_{\binom{m}{d}}) \in \{-1,+1\}^{\binom{m}{d}}$, define the prior $\pi_b$ as the distribution of a random variable $h_b$ specified by the following generative model. Let $i^* \sim \mathrm{Uniform}(\{1, \ldots, \binom{m}{d}\})$, let $C_b(i^*) \sim \mathrm{Bernoulli}((1 + \gamma_m b_{i^*})/2)$; finally, $h_b \sim \mathrm{Uniform}(\{h \in \mathbb{C} : \{x : h(x) = +1\} \subseteq \mathcal{X}_{i^*},\ \mathrm{Parity}(|\{x : h(x) = +1\}|) = C_b(i^*)\})$, where $\mathrm{Parity}(n)$ is $1$ if $n$ is odd, or $0$ if $n$ is even. We will refer to the variables in this generative model below. For any $h \in \mathbb{C}$, letting $H = \{x : h(x) = +1\}$ and $q = |H|$, we can equivalently express

$$\pi_b(\{h\}) = \left(\tfrac{1}{2}\right)^d \binom{m}{d}^{-1} \sum_{i=1}^{\binom{m}{d}} \mathbb{1}[H \subseteq \mathcal{X}_i]\, (1 + \gamma_m b_i)^{\mathrm{Parity}(q)} (1 - \gamma_m b_i)^{1 - \mathrm{Parity}(q)}.$$

From this explicit representation, it is clear that, letting $f_b = \frac{d\pi_b}{d\pi_0}$, we have $f_b(h) \in [1 - \gamma_m, 1 + \gamma_m]$ for all $h \in \mathbb{C}$. The fact that $f_b$ is Hölder smooth follows from this, since every distinct $h, g \in \mathbb{C}$ have $\mathcal{D}(\{x : h(x) \neq g(x)\}) \geq 1/m = (2\gamma_m/L)^{1/\alpha}$.

Next we set up the reduction as follows. For any estimator $\hat\pi_T = \hat\pi_T(Z_{1d}(\theta_\star), \ldots, Z_{Td}(\theta_\star))$, and each $i \in \{1, \ldots, \binom{m}{d}\}$, let $h_i$ be the classifier with $\{x : h_i(x) = +1\} = \mathcal{X}_i$; also, if $\hat\pi_T(\{h_i\}) > (\tfrac{1}{2})^d / \binom{m}{d}$, let $\hat b_i = 2\,\mathrm{Parity}(d) - 1$, and otherwise $\hat b_i = 1 - 2\,\mathrm{Parity}(d)$. We use these $\hat b_i$ values to estimate the original $b_i$ values.
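The generative model for $h_b$ described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the report: the function names are hypothetical, the parameter values in the usage lines are arbitrary, and the subsets are enumerated by brute force, so it is only practical for small $m$ and $d$.

```python
import itertools
import random

def parity(n):
    """Return 1 if n is odd and 0 if n is even, as in the text."""
    return n % 2

def sample_concept(m, d, b, gamma_m, rng):
    """Draw (i, c, H) from the generative model; H is the set labeled +1 by h_b."""
    # Enumerate the d-sized subsets X_1, ..., X_{C(m,d)} of {1, ..., m}.
    subsets = list(itertools.combinations(range(1, m + 1), d))
    i = rng.randrange(len(subsets))              # i* ~ Uniform over subset indices
    p = (1 + gamma_m * b[i]) / 2
    c = 1 if rng.random() < p else 0             # C_b(i*) ~ Bernoulli(p)
    # All subsets of X_i whose size has parity c (there are 2^(d-1) of them).
    candidates = [h for r in range(d + 1) if parity(r) == c
                  for h in itertools.combinations(subsets[i], r)]
    return i, c, set(rng.choice(candidates))     # uniform choice among candidates

# Arbitrary illustrative parameters (assumptions, not values from the report).
rng = random.Random(1)
m, d, L, alpha = 5, 2, 1.0, 1.0
gamma_m = (L / 2) * (1 / m) ** alpha
n_subsets = len(list(itertools.combinations(range(1, m + 1), d)))
b = [rng.choice([-1, 1]) for _ in range(n_subsets)]
i, c, H = sample_concept(m, d, b, gamma_m, rng)
```

The only place the parameter $b_i$ enters is the bias of the parity bit `c`, which is why the reduction below treats each $b_i$ as the unknown bias of a coin.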
Specifically, let $\hat p_i = (1 + \gamma_m \hat b_i)/2$ and $p_i = (1 + \gamma_m b_i)/2$, where $b = \theta_\star$. Then

$$\|\hat\pi_T - \pi_{\theta_\star}\| \geq (1/2) \sum_{i=1}^{\binom{m}{d}} \left| \hat\pi_T(\{h_i\}) - \pi_{\theta_\star}(\{h_i\}) \right| \geq (1/2) \sum_{i=1}^{\binom{m}{d}} \frac{\gamma_m}{2^d \binom{m}{d}} \, |\hat b_i - b_i| / 2 = (1/2) \sum_{i=1}^{\binom{m}{d}} \frac{1}{2^d \binom{m}{d}} \, |\hat p_i - p_i|.$$

Thus, we have reduced from the problem of deciding the biases of these $\binom{m}{d}$ independent Bernoulli random variables. To complete the proof, it suffices to lower bound the expectation of the right side for an arbitrary estimator.

Toward this end, we in fact study an even easier problem. Specifically, consider an estimator $\hat q_i = \hat q_i(Z_{1d}(\theta_\star), \ldots, Z_{Td}(\theta_\star), i^*_1, \ldots, i^*_T)$, where $i^*_t$ is the $i^*$ random variable in the generative model that defines $h^*_{t\theta_\star}$; that is, $i^*_t \sim \mathrm{Uniform}(\{1, \ldots, \binom{m}{d}\})$, $C_t \sim \mathrm{Bernoulli}((1 + \gamma_m b_{i^*_t})/2)$, and $h^*_{t\theta_\star} \sim \mathrm{Uniform}(\{h \in \mathbb{C} : \{x : h(x) = +1\} \subseteq \mathcal{X}_{i^*_t},\ \mathrm{Parity}(|\{x : h(x) = +1\}|) = C_t\})$, where the $i^*_t$ are independent across $t$, as are the $C_t$ and $h^*_{t\theta_\star}$. Clearly the $\hat p_i$ from above can be viewed as an estimator of this type, which simply ignores the knowledge of $i^*_t$. The knowledge of these $i^*_t$ variables simplifies the analysis, since given $\{i^*_t : t \leq T\}$, the data can be partitioned into $\binom{m}{d}$ disjoint sets, $\{\{Z_{td}(\theta_\star) : i^*_t = i\} : i = 1, \ldots, \binom{m}{d}\}$, and we can use only the set $\{Z_{td}(\theta_\star) : i^*_t = i\}$ to estimate $p_i$. Furthermore, we can use only the subset of these for which $\mathbb{X}_{td} = \mathcal{X}_i$, since otherwise we have zero information about the value of $\mathrm{Parity}(|\{x : h^*_{t\theta_\star}(x) = +1\}|)$. That is, given $i^*_t = i$, any $Z_{td}(\theta_\star)$ is conditionally independent from every $b_j$ for $j \neq i$, and is even conditionally independent from $b_i$ when $\mathbb{X}_{td}$ is not completely contained in $\mathcal{X}_i$; specifically, in this case, regardless of $b_i$, the conditional distribution of $\mathbb{Y}_{td}(\theta_\star)$ given $i^*_t = i$ and given $\mathbb{X}_{td}$ is a product distribution, which deterministically assigns label $-1$ to those $Y_{tk}(\theta_\star)$ with $X_{tk} \notin \mathcal{X}_i$, and gives uniform random values to the subset of $\mathbb{Y}_{td}(\theta_\star)$ with their respective $X_{tk} \in \mathcal{X}_i$.

Finally, letting $r_t = \mathrm{Parity}(|\{k \leq d : Y_{tk}(\theta_\star) = +1\}|)$, we note that given $i^*_t = i$, $\mathbb{X}_{td} = \mathcal{X}_i$, and the value $r_t$, $b_i$ is conditionally independent from $Z_{td}(\theta_\star)$. Thus, the set of values $C_{iT}(\theta_\star) = \{r_t : i^*_t = i,\ \mathbb{X}_{td} = \mathcal{X}_i\}$ is a sufficient statistic for $b_i$ (hence for $p_i$). Recall that, when $i^*_t = i$ and $\mathbb{X}_{td} = \mathcal{X}_i$, the value of $r_t$ is equal to $C_t$, a Bernoulli($p_i$) random variable. Thus, we neither lose nor gain anything (in terms of risk) by restricting ourselves to estimators $\hat q_i$ of the type $\hat q_i = \hat q_i(Z_{1d}(\theta_\star), \ldots, Z_{Td}(\theta_\star), i^*_1, \ldots, i^*_T) = \hat q'_i(C_{iT}(\theta_\star))$, for some $\hat q'_i$ [Schervish, 1995]: that is, estimators that are a function of the $N_{iT}(\theta_\star) = |C_{iT}(\theta_\star)|$ Bernoulli($p_i$) random variables, which we should note are conditionally i.i.d. given $N_{iT}(\theta_\star)$. Thus, by (1), for any $n \leq T$,

$$\frac{1}{2} \sum_{b_i \in \{-1,+1\}} E\left[ |\hat q_i - p_i| \,\Big|\, N_{iT}(\theta_\star) = n \right] = \frac{1}{2} \sum_{b_i \in \{-1,+1\}} \gamma_m\, P\left( \hat q_i \neq p_i \,\Big|\, N_{iT}(\theta_\star) = n \right) \geq (\gamma_m/32) \exp\left\{ -128\gamma_m^2 N_i / 3 \right\}.$$
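Also note that, for each $i$,

$$E[N_i] = \frac{d!\,(1/m)^d}{\binom{m}{d}}\, T \leq (d/m)^{2d}\, T = d^{2d} (2\gamma_m/L)^{2d/\alpha}\, T,$$

so that Jensen's inequality, linearity of expectation, and the law of total expectation imply

$$\frac{1}{2} \sum_{b_i \in \{-1,+1\}} E\left[ |\hat q_i - p_i| \right] \geq (\gamma_m/32) \exp\left\{ -43 (2/L)^{2d/\alpha} d^{2d} \gamma_m^{2+2d/\alpha}\, T \right\}.$$

Thus, by linearity of the expectation,

$$\left(\frac{1}{2}\right)^{\binom{m}{d}} \sum_{b \in \{-1,+1\}^{\binom{m}{d}}} E\left[ \sum_{i=1}^{\binom{m}{d}} \frac{1}{2^d \binom{m}{d}} |\hat q_i - p_i| \right] = \sum_{i=1}^{\binom{m}{d}} \frac{1}{2^d \binom{m}{d}} \cdot \frac{1}{2} \sum_{b_i \in \{-1,+1\}} E\left[ |\hat q_i - p_i| \right] \geq \left( \gamma_m / (32 \cdot 2^d) \right) \exp\left\{ -43 (2/L)^{2d/\alpha} d^{2d} \gamma_m^{2+2d/\alpha}\, T \right\}.$$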
In particular, taking

$$m = \left\lceil (L/2)^{1/\alpha} \left( 43 (2/L)^{2d/\alpha} d^{2d}\, T \right)^{\frac{1}{2(d+\alpha)}} \right\rceil,$$

we have

$$\gamma_m = \Theta\left( \left( 43 (2/L)^{2d/\alpha} d^{2d}\, T \right)^{-\frac{\alpha}{2(d+\alpha)}} \right),$$

so that

$$\left(\frac{1}{2}\right)^{\binom{m}{d}} \sum_{b \in \{-1,+1\}^{\binom{m}{d}}} E\left[ \sum_{i=1}^{\binom{m}{d}} \frac{1}{2^d \binom{m}{d}} |\hat q_i - p_i| \right] = \Omega\left( 2^{-d} \left( 43 (2/L)^{2d/\alpha} d^{2d}\, T \right)^{-\frac{\alpha}{2(d+\alpha)}} \right).$$

In particular, this implies there exists some $b$ for which

$$E\left[ \sum_{i=1}^{\binom{m}{d}} \frac{1}{2^d \binom{m}{d}} |\hat q_i - p_i| \right] = \Omega\left( 2^{-d} \left( 43 (2/L)^{2d/\alpha} d^{2d}\, T \right)^{-\frac{\alpha}{2(d+\alpha)}} \right).$$

Applying this lower bound to the estimator $\hat p_i$ defined above yields the result.

In the extreme case of allowing arbitrary dependence on the data samples, we merely recover the known results lower bounding the risk of density estimation from i.i.d. samples from a smooth density, as indicated by the following result.

Theorem 3. For any integer $d \geq 1$, there exists an instance space $\mathcal{X}$, a concept space $\mathbb{C}$ of VC dimension $d$, a distribution $\mathcal{D}$ over $\mathcal{X}$, and a distribution $\pi_0$ over $\mathbb{C}$ such that, for $\Pi_\Theta$ the set of distributions over $\mathbb{C}$ with $(L,\alpha)$-Hölder smooth density functions with respect to $\pi_0$, any sequence of estimators, $\hat\theta_T = \hat\theta_T(Z_1(\theta_\star), \ldots, Z_T(\theta_\star))$ ($T = 1, 2, \ldots$), has

$$\sup_{\theta_\star \in \Theta} E\left[ \|\pi_{\hat\theta_T} - \pi_{\theta_\star}\| \right] = \Omega\left( T^{-\frac{\alpha}{d+2\alpha}} \right).$$

The proof is a simple reduction from the problem of estimating $\pi_{\theta_\star}$ based on direct access to $h^*_{1\theta_\star}, \ldots, h^*_{T\theta_\star}$, which is essentially equivalent to the standard model of density estimation, and indeed the lower bound in Theorem 3 is a well-known result for density estimation from $T$ i.i.d. samples from a Hölder smooth density in a $d$-dimensional space [see e.g., Devroye & Lugosi, 2001].

5 Future Directions

There are several interesting questions that remain open at this time. Can either the lower bound or upper bound be improved in general? If, instead of $d$ samples per task, we instead use $m \geq d$ samples, how does the minimax risk vary with $m$? Related to this, what is the optimal value of $m$ to optimize the rate of convergence as a function of $mT$, the total number of samples? More generally, if an estimator is permitted to use $N$ total samples, taken from however many tasks it wishes, what is the optimal rate of convergence as a function of $N$?
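To get a feel for the gap between the bounds, the exponents of $T$ can be compared numerically. The sketch below assumes the rates are $T^{-\alpha^2/(2(d+2\alpha)(\alpha+2(d+1)))}$ for the upper bound of Theorem 1, $T^{-\alpha/(2(d+\alpha))}$ for the lower bound of Theorem 2, and $T^{-\alpha/(d+2\alpha)}$ for the direct-access rate of Theorem 3; constants and logarithmic factors are ignored, and the function names are hypothetical.

```python
from fractions import Fraction

def upper_exponent(d, alpha):
    """Assumed Theorem 1 exponent: alpha^2 / (2 (d + 2 alpha)(alpha + 2 (d + 1)))."""
    return alpha**2 / (2 * (d + 2 * alpha) * (alpha + 2 * (d + 1)))

def lower_exponent(d, alpha):
    """Assumed Theorem 2 exponent: alpha / (2 (d + alpha))."""
    return alpha / (2 * (d + alpha))

def density_exponent(d, alpha):
    """Assumed Theorem 3 exponent: alpha / (d + 2 alpha)."""
    return alpha / (d + 2 * alpha)

# Exact arithmetic for d = 1, alpha = 1.
d, alpha = Fraction(1), Fraction(1)
print(upper_exponent(d, alpha), lower_exponent(d, alpha), density_exponent(d, alpha))
# prints: 1/30 1/4 1/3
```

A smaller exponent means a slower rate, so under these assumptions the guaranteed rate ($T^{-1/30}$ here) is far slower than the lower bound ($T^{-1/4}$), which in turn is slower than the direct-access density estimation rate ($T^{-1/3}$).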
References

Bar-Yossef, Z. (2003). Sampling lower bounds via information theory. Proceedings of the 35th Annual ACM Symposium on the Theory of Computing.

Baxter, J. (1997). A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28.

Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36.

Devroye, L., & Lugosi, G. (2001). Combinatorial methods in density estimation. New York, NY, USA: Springer.

Poland, J., & Hutter, M. (2006). MDL convergence speed for Bernoulli sequences. Statistics and Computing, 16.

Schervish, M. J. (1995). Theory of statistics. New York, NY, USA: Springer.

Vapnik, V. (1982). Estimation of dependencies based on empirical data. Springer-Verlag, New York.

Wald, A. (1945). Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16.

Yang, L., Hanneke, S., & Carbonell, J. (2011). Identifiability of priors from bounded sample sizes with applications to transfer learning. 24th Annual Conference on Learning Theory.

Yatracos, Y. G. (1985). Rates of convergence of minimum distance estimators and Kolmogorov's entropy. The Annals of Statistics, 13.
More informationVC Dimension and Sauer s Lemma
CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationBipartite subgraphs and the smallest eigenvalue
Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.
More informationA Smoothed Boosting Algorithm Using Probabilistic Output Codes
A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu
More informationSupplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data
Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse
More informationFoundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)
More informationFairness via priority scheduling
Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation
More informationSupplement to: Subsampling Methods for Persistent Homology
Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution
More informationExperimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis
City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna
More informationPolygonal Designs: Existence and Construction
Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G
More informationSolutions of some selected problems of Homework 4
Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next
More informationUpper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition
Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and
More informationIntelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes
More informationImproved Guarantees for Agnostic Learning of Disjunctions
Iproved Guarantees for Agnostic Learning of Disjunctions Pranjal Awasthi Carnegie Mellon University pawasthi@cs.cu.edu Avri Blu Carnegie Mellon University avri@cs.cu.edu Or Sheffet Carnegie Mellon University
More informationUsing EM To Estimate A Probablity Density With A Mixture Of Gaussians
Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points
More informationTesting Properties of Collections of Distributions
Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the
More informationLower Bounds for Quantized Matrix Completion
Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &
More information1 Identical Parallel Machines
FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j
More informationA := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.
59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far
More informationEstimating Parameters for a Gaussian pdf
Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3
More informationSupport Vector Machines. Goals for the lecture
Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationThe Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost
More informationNew Bounds for Learning Intervals with Implications for Semi-Supervised Learning
JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University
More informationQuantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search
Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths
More informationPrediction by random-walk perturbation
Prediction by rando-walk perturbation Luc Devroye School of Coputer Science McGill University Gábor Lugosi ICREA and Departent of Econoics Universitat Popeu Fabra lucdevroye@gail.co gabor.lugosi@gail.co
More informationThe proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).
A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with
More informationEstimating Entropy and Entropy Norm on Data Streams
Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer
More informationChaotic Coupled Map Lattices
Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each
More informationlecture 36: Linear Multistep Mehods: Zero Stability
95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,
More informationRademacher Complexity Margin Bounds for Learning with a Large Number of Classes
Radeacher Coplexity Margin Bounds for Learning with a Large Nuber of Classes Vitaly Kuznetsov Courant Institute of Matheatical Sciences, 25 Mercer street, New York, NY, 002 Mehryar Mohri Courant Institute
More informationLecture October 23. Scribes: Ruixin Qiang and Alana Shine
CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single
More informationCombining Classifiers
Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,
More informationPAC-Bayes Analysis Of Maximum Entropy Learning
PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E
More informationOn the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs
On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing
More informationThe Frequent Paucity of Trivial Strings
The Frequent Paucity of Trivial Strings Jack H. Lutz Departent of Coputer Science Iowa State University Aes, IA 50011, USA lutz@cs.iastate.edu Abstract A 1976 theore of Chaitin can be used to show that
More informationOn the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation
journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation
More informationAlgorithms for parallel processor scheduling with distinct due windows and unit-time jobs
BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and
More informationOn Conditions for Linearity of Optimal Estimation
On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationCurious Bounds for Floor Function Sums
1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International
More informationAlgorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation
Algorithic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation Michael Kearns AT&T Labs Research Murray Hill, New Jersey kearns@research.att.co Dana Ron MIT Cabridge, MA danar@theory.lcs.it.edu
More informationEstimating Average-Case Learning Curves Using Bayesian, Statistical Physics and VC Dimension Methods
Estiating Average-Case Learning Curves Using Bayesian, Statistical Physics and VC Diension Methods David Haussler University of California Santa Cruz, California Manfred Opper Institut fur Theoretische
More informationFeature Extraction Techniques
Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that
More informationA Note on the Applied Use of MDL Approximations
A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention
More informationA note on the multiplication of sparse matrices
Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani
More informationMath 262A Lecture Notes - Nechiporuk s Theorem
Math 6A Lecture Notes - Nechiporuk s Theore Lecturer: Sa Buss Scribe: Stefan Schneider October, 013 Nechiporuk [1] gives a ethod to derive lower bounds on forula size over the full binary basis B The lower
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I Contents 1. Preliinaries 2. The ain result 3. The Rieann integral 4. The integral of a nonnegative
More informationLecture 21. Interior Point Methods Setup and Algorithm
Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and
More informationRandomized Recovery for Boolean Compressed Sensing
Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch
More informationChapter 6 1-D Continuous Groups
Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:
More informationASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical
IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul
More informationDistributed Subgradient Methods for Multi-agent Optimization
1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions
More informationCOS 424: Interacting with Data. Written Exercises
COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well
More information1 Proof of learning bounds
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a
More informationOptimal Jamming Over Additive Noise: Vector Source-Channel Case
Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper
More informationTight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions
Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805
More informationTight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography
Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it
More informationInteractive Markov Models of Evolutionary Algorithms
Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationMULTIPLAYER ROCK-PAPER-SCISSORS
MULTIPLAYER ROCK-PAPER-SCISSORS CHARLOTTE ATEN Contents 1. Introduction 1 2. RPS Magas 3 3. Ites as a Function of Players and Vice Versa 5 4. Algebraic Properties of RPS Magas 6 References 6 1. Introduction
More informationMetric Entropy of Convex Hulls
Metric Entropy of Convex Hulls Fuchang Gao University of Idaho Abstract Let T be a precopact subset of a Hilbert space. The etric entropy of the convex hull of T is estiated in ters of the etric entropy
More informationNew Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors
New Slack-Monotonic Schedulability Analysis of Real-Tie Tasks on Multiprocessors Risat Mahud Pathan and Jan Jonsson Chalers University of Technology SE-41 96, Göteborg, Sweden {risat, janjo}@chalers.se
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial
More informationSharp Time Data Tradeoffs for Linear Inverse Problems
Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used
More informationFinite fields. and we ve used it in various examples and homework problems. In these notes I will introduce more finite fields
Finite fields I talked in class about the field with two eleents F 2 = {, } and we ve used it in various eaples and hoework probles. In these notes I will introduce ore finite fields F p = {,,...,p } for
More informationTEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES
TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,
More information