Computational and Statistical Learning Theory
Problem sets 5 and 6

Due: November th. Please send your solutions to learning-submissions@ttic.edu.

Notations/Definitions

Recall the definition of the sample-based Rademacher complexity (a small Monte Carlo illustration of this quantity appears at the end of this section):

    R_S(F) := E_{ε ∼ {±1}^m} [ sup_{f∈F} (1/m) Σ_{i=1}^m ε_i f(x_i) ]

Definition 1. Given a sample S = {x_1, ..., x_m} and any α > 0, a set V ⊂ R^m is said to be an α-cover (in ℓ_p) of the function class F on the sample S if

    ∀ f ∈ F, ∃ v ∈ V  such that  ( (1/m) Σ_{i=1}^m |f(x_i) − v_i|^p )^{1/p} ≤ α

Specifically, for p = ∞ the quantity ( (1/m) Σ_{i=1}^m |f(x_i) − v_i|^p )^{1/p} is replaced by max_{i∈[m]} |f(x_i) − v_i|. Also define

    N_p(F, α, S) := min{ |V| : V is an α-cover (in ℓ_p) of F on the sample S }

and

    N_p(F, α, m) := sup_{x_1,...,x_m} N_p(F, α, {x_1, ..., x_m})

Definition 2. A function class F is said to α-shatter a sample S = {x_1, ..., x_m} if there exists a sequence of thresholds θ_1, ..., θ_m ∈ R such that

    ∀ ξ ∈ {±1}^m, ∃ f ∈ F  such that  ∀ i ∈ [m], ξ_i ( f(x_i) − θ_i ) ≥ α

The fat-shattering dimension fat_α(F) is the size of the largest sample that F α-shatters.
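The following is not part of the assignment: for a finite function class, the sample-based Rademacher complexity above can be estimated directly by Monte Carlo over the sign vectors ε. The sketch below is a minimal illustration under that finiteness assumption; the function names and the example class are invented here, and NumPy is assumed.

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, rng=None):
    """Monte Carlo estimate of R_S(F) for a finite class.

    `values` is a (|F|, m) array whose rows are (f(x_1), ..., f(x_m)),
    one row per function in F evaluated on the fixed sample S.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, m = values.shape
    total = 0.0
    for _ in range(n_draws):
        eps = rng.choice([-1.0, 1.0], size=m)        # a draw of the Rademacher signs
        total += np.max(values @ eps) / m            # sup_f (1/m) sum_i eps_i f(x_i)
    return total / n_draws

# Example: 50 bounded-norm linear functions x -> <w, x> on a sample of size m = 100.
rng = np.random.default_rng(0)
m, d = 100, 5
X = rng.normal(size=(m, d))
W = rng.normal(size=(50, d))
W /= np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1.0)   # force ||w||_2 <= 1
print(empirical_rademacher(W @ X.T, rng=rng))
```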
Problems

1. VC Lemma for Real-valued Function Classes: We shall prove that for any function class F (assume the functions in F are bounded by 1) and any scale α > 0, the ℓ_∞ covering number at scale α can be bounded using the fat-shattering dimension at that scale, by proving a statement analogous to the VC lemma. We shall proceed by first extending the statement to finite-valued (specifically {0, ..., k}-valued) function classes, and then using this to prove the final bound

    N_∞(F, α, n) ≤ Σ_{i=1}^{fat_{α/2}(F)} (n choose i) (2/α)^i

(a) Let F_k ⊆ {0, ..., k}^X be a function class with fat_1(F_k) = d. Show that

    N_∞(F_k, 1/2, n) ≤ Σ_{i=1}^{d} (n choose i) k^i

Show the above statement using induction on n + d (very similar to the first problem on Assignment 2). Hint: In the Assignment 2 problem, where we used H^+_S and H^−_S, use instead, for all i ∈ {0, ..., k}, F_i = {f ∈ F : f(x_1) = i} (note that this is a simple multilabel extension, and for k = 1, F_0 and F_1 are identical to H^+ and H^−). Use the notion of α-shattering instead of shattering from the VC case, and use the 1/2-cover instead of the growth function.

(b) Using the idea of α-discretizing the output of the function class F, we shall conclude the required statement (a small numerical illustration of this discretization appears right after this problem). Do the following:

    i. Create a {0, ..., k}-valued class G where k is of order 1/α. Show that covering G at scale 1/2 implies we can cover F at scale α, and hence conclude that we can bound N_∞(F, α, n) in terms of the covering number of G at scale 1/2.

    ii. Show that fat_1(G) ≤ fat_{α/2}(F).

    iii. Combine this with the bound on N_∞(G, 1/2, n) from the previous sub-problem and conclude that

        N_∞(F, α, n) ≤ Σ_{i=1}^{fat_{α/2}(F)} (n choose i) (2/α)^i
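The sketch below (not part of the assignment) makes two of the objects in Problem 1 concrete: the α-discretization of a [−1, 1]-valued class into a {0, ..., k}-valued one, and an ℓ_∞ cover on a sample. The greedy routine only produces some cover, so its size upper-bounds N_∞(F, α, S) without being the minimum; all names and the toy class are invented here, and NumPy is assumed.

```python
import numpy as np

def discretize(values, alpha):
    """Round values in [-1, 1] to levels {0, ..., k} with k of order 1/alpha,
    so each entry moves by at most 1/k <= alpha/2 (the idea behind part (b)i)."""
    k = int(np.ceil(2.0 / alpha))
    return np.clip(np.round((values + 1.0) * k / 2.0), 0, k).astype(int)

def greedy_linf_cover(values, alpha):
    """Greedily pick rows of `values` so every row is within alpha of a picked row
    in l_infinity distance on the sample; the result is *a* cover, so its size
    upper-bounds N_inf(F, alpha, S) but need not equal it."""
    remaining = list(range(values.shape[0]))
    cover = []
    while remaining:
        i = remaining.pop(0)
        cover.append(i)
        remaining = [j for j in remaining
                     if np.max(np.abs(values[j] - values[i])) > alpha]
    return cover

# Tiny example: 200 bounded-norm linear functions evaluated on 30 unit-norm points.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)); X /= np.linalg.norm(X, axis=1, keepdims=True)
W = rng.normal(size=(200, 4)); W /= np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1.0)
vals = W @ X.T                                   # rows are (f(x_1), ..., f(x_m)), all in [-1, 1]
print(len(greedy_linf_cover(vals, alpha=0.3)), "elements in a greedy 0.3-cover")
print(discretize(vals, alpha=0.3)[0])            # the first function, alpha-discretized
```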
2. Dudley vs. Pollard's Bounds: In class we saw that the Rademacher complexity of a function class F (assume the functions in F are bounded by 1) can be bounded in terms of covering numbers using Pollard's bound, the Dudley integral bound, and a slightly modified (refined) version of the Dudley integral bound, as follows:

    R_S(F) ≤ inf_{α>0} { α + √( 2 log N_∞(F, α, S) / m ) }                           (Pollard)

    R_S(F) ≤ 10 ∫_0^1 √( log N_2(F, τ, S) / m ) dτ                                   (Dudley)

    R_S(F) ≤ inf_{α>0} { 4α + 10 ∫_α^1 √( log N_2(F, τ, S) / m ) dτ }                (Refined Dudley)

In this problem, using some examples, we shall compare these bounds.

(a) Class with finite VC-subgraph dimension: Assume that the VC-subgraph dimension of the function class F is bounded by D. In this case the result of Problem 1 can be used to bound the covering number of F in terms of D. Use this bound on the covering number and compare Pollard's bound with the refined Dudley integral bound by writing down the bounds implied by each.

(b) Linear class with bounded norm: Linear classes in high-dimensional spaces are probably among the most important and most used function classes in machine learning. Consider the specific example where X = {x : ‖x‖_2 ≤ 1} and F = {x ↦ ⟨w, x⟩ : ‖w‖_2 ≤ 1}. In class we saw that for any ε > 0, fat_ε(F) ≤ 1/ε². Using this with the result of Problem 1 we have that

    N_2(F, ε, m) ≤ N_∞(F, ε, m) ≤ ( 2em/ε )^{ 4/ε² }

Use the above bound on the covering number and write down the bound on the Rademacher complexity implied by Pollard's bound. Then write down the bound on the Rademacher complexity implied by the refined version of the Dudley integral bound.
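Not part of the assignment: the comparison asked for in part (b) can also be eyeballed numerically by plugging the covering-number bound above into each expression and minimizing over α on a grid. The constants and the exact forms of the bounds are the ones reconstructed above, so the numbers are purely illustrative; with these constants the integral bound only overtakes the single-scale bound at fairly large m, even though its asymptotic rate in m is better. NumPy is assumed and all names are invented here.

```python
import numpy as np

def log_cover(tau, m):
    # log N_2(F, tau, S) <= (4 / tau^2) * log(2 e m / tau), the bound quoted in 2(b) above
    return (4.0 / tau**2) * np.log(2.0 * np.e * m / tau)

def pollard_bound(m):
    # inf_alpha { alpha + sqrt(2 log N(alpha) / m) }, alpha searched on a log grid
    alphas = np.logspace(-6, 0, 400)
    return np.min(alphas + np.sqrt(2.0 * log_cover(alphas, m) / m))

def refined_dudley_bound(m):
    # inf_alpha { 4 alpha + 10 * integral_alpha^1 sqrt(log N(tau) / m) dtau }, trapezoid rule
    best = np.inf
    for a in np.logspace(-6, 0, 60):
        taus = np.logspace(np.log10(a), 0.0, 2000)
        y = np.sqrt(log_cover(taus, m) / m)
        integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(taus))
        best = min(best, 4.0 * a + 10.0 * integral)
    return best

for m in (10**4, 10**6, 10**8, 10**10):
    print(f"m = 1e{int(np.log10(m)):<2d}  Pollard ~ {pollard_bound(m):.4f}"
          f"   refined Dudley ~ {refined_dudley_bound(m):.4f}")
```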
3. Data-Dependent Bound: Recall the Rademacher complexity bound we proved in class for function classes F bounded by 1: for any δ > 0, with probability at least 1 − δ,

    sup_{f∈F} ( E_D[f(x)] − Ê_S[f(x)] ) ≤ 2 E_{S∼D^m}[ R_S(F) ] + √( 2 log(1/δ) / m )

Note that we don't know the distribution D. One way we used the above bound was by providing upper bounds on R_S(F) that hold for every sample of size m, and using such a bound in place of E_{S∼D^m}[R_S(F)]. But ideally we would like to get tighter bounds when the distribution we are faced with is nicer. The aim of this problem is to do exactly that. Prove that for any δ > 0, with probability at least 1 − δ over the draw of the sample S ∼ D^m,

    sup_{f∈F} ( E_D[f(x)] − Ê_S[f(x)] ) ≤ 2 R_S(F) + K √( log(2/δ) / m )

(provide an explicit value for the constant K above). Notice that in this bound the expected Rademacher complexity is replaced by the sample-based one, which can be computed from the training sample. Hint: Use McDiarmid's inequality on the expected Rademacher complexity.

4. Learnability and Fat-Shattering Dimension: Recall the setting of the stochastic optimization problem, where the objective function is a mapping r : H × Z → R. A sample S = {z_1, ..., z_m} drawn i.i.d. from an unknown distribution D is provided to the learner, and the aim of the learner is to output ĥ ∈ H, based on the sample, with low expected objective E_{z∼D}[r(ĥ, z)].

(a) Consider the stochastic optimization problem with r bounded by a, i.e. |r(h, z)| < a < ∞ for all h ∈ H and z ∈ Z. If the function class F := {z ↦ r(h, z) : h ∈ H} has finite fat_α for all α > 0, show that the problem is learnable.

(b) Conclude that for a supervised learning problem with a bounded hypothesis class H (i.e. ∀ x ∈ X, |h(x)| < a) and a loss φ : Ŷ × Y → R that is L-Lipschitz (in its first argument), if H has finite fat_α for all α > 0, then the problem is learnable.

(c) Show a stochastic optimization problem that is learnable even though it has infinite fat_α for all α ≤ 0.1 (or any other constant of your choice). Explicitly write down the hypothesis class and the learning rule which learns the class, argue that the problem is learnable, and explain why fat_α is infinite. Hint: You can make the learning rule that succeeds even be ERM.

(d) Prove that for a supervised learning problem with the absolute loss φ(ŷ, y) = |ŷ − y|, if fat_α is infinite for some α > 0, then the problem is not learnable. Hint: as in the binary case, for every m construct a distribution concentrated on a set of points that can be fat-shattered.

5. Massart's Lemma: In this problem we want to prove Massart's lemma, i.e. that for a finite class F ⊂ R^m the following inequality holds (a quick numerical sanity check of this inequality appears after this problem):

    R(F) ≤ a √( 2 log |F| ) / m                                                        (1)

where a = sup_{f∈F} ‖f‖_2.

(a) Use Jensen's inequality and the union bound to prove that for any λ > 0,

    e^{λ m R(F)} ≤ Σ_{f∈F} E_{ε∼{±1}^m} [ e^{λ Σ_{i=1}^m ε_i f_i} ]                     (2)

(b) Find an upper bound for the expected value in the above inequality using the fact that e^x + e^{−x} ≤ 2 e^{x²/2}.

(c) Show that for any λ > 0,

    R(F) ≤ (1/m) ( log|F| / λ + λ a² / 2 )                                              (3)

(d) Find the λ that minimizes the above bound.
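A quick sanity check of inequality (1), outside the assignment: for a small finite set of vectors F ⊂ R^m, a Monte Carlo estimate of R(F) should sit below a√(2 log|F|)/m. NumPy is assumed; the class and all names are invented here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_funcs = 200, 64
F = rng.normal(size=(n_funcs, m))            # each row is a vector f in R^m

# Monte Carlo estimate of R(F) = E_eps[ sup_f (1/m) sum_i eps_i f_i ]
n_draws = 5000
eps = rng.choice([-1.0, 1.0], size=(n_draws, m))
rademacher = np.mean(np.max(eps @ F.T, axis=1)) / m

a = np.max(np.linalg.norm(F, axis=1))        # a = sup_f ||f||_2
massart = a * np.sqrt(2.0 * np.log(n_funcs)) / m

print(f"Monte Carlo R(F) ~ {rademacher:.4f}   Massart bound = {massart:.4f}")
```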
6. Refined Dudley Integral Bound on the Rademacher Complexity: Given a sample S = (x_1, ..., x_m), we shall prove that for any function class F of functions f : X → R,

    R_S(F) ≤ inf_{α>0} { 4α + 10 ∫_α^{sup_{f∈F} √(Ê[f²])} √( log N_2(F, τ, S) / m ) dτ }        (4)

where Ê[f²] = (1/m) Σ_{i=1}^m f²(x_i).

(a) Let β_0 = sup_{f∈F} √(Ê[f²]), and for any j ∈ Z_+ let β_j = β_0 / 2^j and let C_j be a β_j-cover (in ℓ_2) of the function class F on the sample S. For each f ∈ F we pick an f̂_j ∈ C_j such that √( Ê[(f − f̂_j)²] ) ≤ β_j. The main idea of the proof is that for any N > 0 we can express f by chaining as

    f = ( f − f̂_N ) + Σ_{j=1}^N ( f̂_j − f̂_{j−1} )                                              (5)

where f̂_0 = 0. Show that, under the conditions stated above, for any N > 0 the following inequality holds:

    R_S(F) ≤ β_N + Σ_{j=1}^N R_S(G_j)                                                           (6)

where G_j = { f̂_j − f̂_{j−1} : f ∈ F }.

(b) Show that |G_j| ≤ N_2(F, β_j, S).

(c) Use Massart's lemma to establish the following upper bound on the Rademacher complexity of the class G_j:

    R_S(G_j) ≤ 5 β_j √( log N_2(F, β_j, S) / m )                                                (7)

(d) Show that the following inequality holds:

    R_S(F) ≤ β_N + 10 Σ_{j=1}^N ( β_j − β_{j+1} ) √( log N_2(F, β_j, S) / m )                   (8)

(e) Prove the following bound for any j > 0 (a numerical illustration of this step appears after this problem):

    ( β_j − β_{j+1} ) √( log N_2(F, β_j, S) / m ) ≤ ∫_{β_{j+1}}^{β_j} √( log N_2(F, τ, S) / m ) dτ        (9)

Using it, upper bound the sum in (8) by a single integral.

(f) Show that for any α > 0 we can choose N such that

    R_S(F) ≤ 4α + 10 ∫_α^{sup_{f∈F} √(Ê[f²])} √( log N_2(F, τ, S) / m ) dτ                      (10)

Conclude that the Dudley integral bound (4) holds.
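Not part of the assignment: step (e) says the discrete chaining sum in (8) is dominated by a single integral, and this is easy to see numerically for any non-increasing covering-number profile. The sketch below reuses the linear-class profile quoted in Problem 2(b); the profile and all names are choices made here, and NumPy is assumed.

```python
import numpy as np

# Illustration of step (e): the discrete chaining sum is dominated by the single integral.
m = 10_000

def log_N(tau):
    # the (non-increasing) covering-number profile from problem 2(b); any such profile works
    return (4.0 / tau**2) * np.log(2.0 * np.e * m / tau)

beta0, N = 1.0, 12
betas = beta0 / 2.0 ** np.arange(N + 2)              # beta_0, beta_1, ..., beta_{N+1}

chain_sum = sum((betas[j] - betas[j + 1]) * np.sqrt(log_N(betas[j]) / m)
                for j in range(1, N + 1))

taus = np.logspace(np.log10(betas[N + 1]), np.log10(beta0), 4000)
y = np.sqrt(log_N(taus) / m)
integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(taus))

print(f"sum over scales ~ {chain_sum:.4f}   integral from beta_(N+1) to beta_0 ~ {integral:.4f}")
```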
7. A General Bound for All Norms: In class we studied the hypothesis class H_B = {w : ‖w‖ ≤ B} and the corresponding rule ERM_B, and showed that for every B, w.h.p. over S, for all w ∈ H_B we have

    L(w) ≤ L̂(w) + O( √( (B² + log(1/δ)) / m ) )                                                 (11)

We then used this to get a bound on L_{0,1} in terms of L̂_γ, for each γ separately. We will now change the order of quantifiers and get a bound that holds, w.h.p., for all γ concurrently, with only a very mild (double logarithmic) additional term.

(a) Prove that, for H ⊆ [−a, a]^X and any distribution D over (X, Y), w.h.p. over S ∼ D^m, for all h ∈ H and all γ > 0:

    L_{0,1}(h) ≤ L̂_γ(h) + O( R_m(H)/γ + √( ( log log(a/γ) + log(1/δ) ) / m ) )                  (12)

Rewrite the above bound with proper constants and without big-O notation.

(b) In particular, linear separators with geometric margin γ correspond to using H = {w : ‖w‖ ≤ 1}, in which case show that we have, w.h.p. over S ∼ D^m, for all w and all γ > 0:

    L_{0,1}(w) ≤ L̂_γ(w) + O( √( ( 1/γ² + log(1/δ) ) / m ) )                                     (13)

Rewrite the above bound with proper constants and without big-O notation.
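Outside the assignment: the point of a bound that holds for all γ simultaneously is that one may pick the margin after looking at the data. For a fixed unit-norm linear predictor, the empirical margin loss L̂_γ(w) = (1/m) Σ_i 1[y_i ⟨w, x_i⟩ < γ] grows with γ while the complexity term shrinks like 1/(γ√m) (using the standard fact that the Rademacher complexity of the unit-norm linear class on unit-norm inputs is at most 1/√m). The sketch below tabulates this trade-off on synthetic data; all names and data are invented here, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 500, 20
w_true = rng.normal(size=d); w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(m, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ w_true + 0.1 * rng.normal(size=m))

w = w_true                                  # pretend this is the learned unit-norm predictor
margins = y * (X @ w)

for gamma in (0.02, 0.05, 0.1, 0.2, 0.4):
    margin_loss = np.mean(margins < gamma)                 # empirical margin loss L_hat_gamma
    complexity = 1.0 / (gamma * np.sqrt(m))                # ~ R_m(H) / gamma for the unit ball
    print(f"gamma = {gamma:4.2f}   L_hat_gamma = {margin_loss:.3f}"
          f"   1/(gamma*sqrt(m)) = {complexity:.3f}")
```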
Challenge Problems

1. We saw that for any distribution D, the expected Rademacher complexity provides an upper bound on the maximal deviation between mean and empirical average, uniformly over the function class; specifically, we saw that

    E_{S∼D^m} [ sup_{f∈F} ( E_D[f(x)] − Ê_S[f(x)] ) ] ≤ 2 E_{S∼D^m}[ R_S(F) ]

Prove the (almost) converse, that

    (1/2) E_{S∼D^m}[ R_S(F) ] ≤ E_{S∼D^m} [ sup_{f∈F} ( E_D[f(x)] − Ê_S[f(x)] ) ]

This basically establishes that the Rademacher complexity tightly bounds the uniform maximal deviation for every distribution.

2. The worst-case Rademacher complexity is defined as

    R_m(F) = sup_{S={x_1,...,x_m}} R_S(F)

(i.e. the supremum over samples of size m).

(a) Prove that for any function class F and any τ > R_m(F), we have

    fat_τ(F) ≤ 4 m R_m(F)² / τ²

(b) Combine the above with the refined version of the Dudley integral bound to prove that

    inf_{α>0} { 4α + 10 ∫_α^1 √( log N_2(F, τ, m) / m ) dτ } ≤ O( log^{3/2} m ) · R_m(F)

This shows that the refined Dudley integral bound is tight to within log factors of the Rademacher complexity. Thus we have established that, in the worst case, all of the complexity measures for a function class (Rademacher complexity, covering numbers, and the fat-shattering dimension) tightly govern the rate of uniform maximal deviation for the function class, all to within log factors.

3. Bounded Difference Inequality, Stability and Generalization: Recall that a function G : X^m → R is said to satisfy the bounded difference property if for all i ∈ [m] and all x_1, ..., x_m, x'_i ∈ X,

    | G(x_1, ..., x_i, ..., x_m) − G(x_1, ..., x_{i−1}, x'_i, x_{i+1}, ..., x_m) | ≤ c

for some c ≥ 0. In this case McDiarmid's inequality gives us that for any δ > 0, with probability at least 1 − δ,

    G(x_1, ..., x_m) ≤ E[ G(x_1, ..., x_m) ] + √( m c² log(1/δ) / 2 )

The bounded difference property turns out to be quite useful for analyzing learning algorithms directly (instead of looking at the uniform deviation over a function class).

A proper learning algorithm A : X^m → F is said to be uniformly β-stable if for all i ∈ [m] and any x_1, ..., x_m, x'_i ∈ X,

    sup_{x∈X} | A(x_1, ..., x_i, ..., x_m)(x) − A(x_1, ..., x_{i−1}, x'_i, x_{i+1}, ..., x_m)(x) | ≤ β

Assuming the functions in F are bounded by 1, we shall prove that a uniformly β-stable learning algorithm generalizes well (its expected loss is close to its empirical loss). Specifically, we shall prove that for any δ > 0, with probability at least 1 − δ,

    L(A(S)) − L̂(A(S)) ≤ β + 2(mβ + 1) √( log(1/δ) / (2m) )

where L(A(S)) = E_x[A(S)(x)] and L̂(A(S)) = (1/m) Σ_{i=1}^m A(S)(x_i).

(a) First show that E_S[ L(A(S)) − L̂(A(S)) ] ≤ β. Hint: Use renaming of variables to first show that for any i ∈ [m],

    E_S[ L(A(S)) ] = E_{S, x'_i} [ A(x_1, ..., x_{i−1}, x'_i, x_{i+1}, ..., x_m)(x_i) ]

(b) Show that the function G(S) = L(A(S)) − L̂(A(S)) satisfies the bounded difference property with c ≤ 2β + 2/m. Conclude the required statement using McDiarmid's inequality.

(c) Consider the stochastic convex optimization problem where each sample point is z = (x, y), with y real-valued and x from the unit ball of some Hilbert space; the hypotheses are weight vectors w from the same Hilbert space, with objective

    r(w, (x, y)) = | ⟨w, x⟩ − y | + λ ‖w‖²

Show that the ERM algorithm is stable for this problem, and thus provide a generalization bound for this algorithm.
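Not part of the assignment: a quick empirical look at the stability in part (c). The sketch fits the regularized objective by plain subgradient descent on a sample S and on a copy of S with one point replaced, and reports sup over the unit ball of |⟨w − w', x⟩|, which equals ‖w − w'‖. For regularized ERM of this kind one expects this gap to shrink roughly like 1/(λm); the optimizer, step sizes and data below are arbitrary choices made here, and NumPy is assumed.

```python
import numpy as np

def erm(X, y, lam, steps=5000):
    """Crude stand-in for the exact regularized ERM: subgradient descent on
    (1/m) * sum_i |<w, x_i> - y_i|  +  lam * ||w||^2."""
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, steps + 1):
        sub = np.sign(X @ w - y)                     # subgradient of the absolute loss terms
        grad = (X.T @ sub) / m + 2.0 * lam * w
        w -= grad / (lam * t + 10.0)                 # damped 1/(lam * t) style step size
    return w

rng = np.random.default_rng(3)
m, d = 200, 10
X = rng.normal(size=(m, d)); X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)

X2, y2 = X.copy(), y.copy()
x_new = rng.normal(size=d); x_new /= max(np.linalg.norm(x_new), 1.0)
X2[0], y2[0] = x_new, 0.0                            # replace the first sample point

for lam in (0.01, 0.1, 1.0):
    w, w2 = erm(X, y, lam), erm(X2, y2, lam)
    print(f"lambda = {lam:4.2f}   ||w - w'|| = {np.linalg.norm(w - w2):.4f}"
          f"   2/(lambda*m) = {2.0/(lam*m):.4f}")
```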
4. ℓ_1 Neural Networks: A k-layer 1-norm neural network is given by the function class F_k, which is defined recursively as follows:

    F_1 = { x ↦ Σ_{j=1}^d w_j x_j : ‖w‖_1 ≤ B }

and further, for each 2 ≤ i ≤ k,

    F_i = { x ↦ Σ_{j=1}^{d_{i−1}} w_j σ(f_j(x)) : ∀ j ∈ [d_{i−1}], f_j ∈ F_{i−1}, ‖w‖_1 ≤ B }
where d_i is the number of nodes in the i-th layer of the network. The function σ : R → [−1, 1] is called the squash function and is generally a smooth, monotonic, non-decreasing function (a typical example is the tanh function). Assume that the input space is X = [0, 1]^d and that σ is L-Lipschitz. Prove that

    R_S(F_k) ≤ (2B)^k L^{k−1} √( 2 log d / m )

Notice that the d_i's do not appear in the above bound, indicating that the number of nodes in the intermediate layers does not affect this upper bound on the Rademacher complexity. Hint: prove a bound on the Rademacher complexity of F_i recursively, in terms of the Rademacher complexity of F_{i−1}.
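To make the recursive definition above concrete (outside the assignment), here is a forward pass through a small k-layer 1-norm network with tanh as the squash function, where every unit's weight vector is rescaled so that ‖w‖_1 ≤ B. The layer widths, B, and the input are arbitrary choices made here, and NumPy is assumed.

```python
import numpy as np

def one_norm_network(x, layer_weights, B=1.0, squash=np.tanh):
    """Evaluate an element of the recursively defined class F_k: layer 1 computes
    x -> <w, x>, and each later layer computes sum_j w_j * squash(f_j(x)) over the
    previous layer's units, with every weight vector rescaled so that ||w||_1 <= B."""
    h = np.asarray(x, dtype=float)
    for i, W in enumerate(layer_weights):            # W has shape (d_i, d_{i-1})
        scale = np.maximum(np.abs(W).sum(axis=1, keepdims=True) / B, 1.0)
        W = W / scale                                # now each row has l1-norm at most B
        h = W @ (h if i == 0 else squash(h))         # raw inputs feed layer 1 directly
    return h

rng = np.random.default_rng(4)
d, widths, B = 8, [16, 16, 1], 2.0                   # a 3-layer network with a single output
weights = [rng.normal(size=(widths[0], d))] + [
    rng.normal(size=(widths[i], widths[i - 1])) for i in range(1, len(widths))]
x = rng.uniform(0.0, 1.0, size=d)                    # an input from X = [0, 1]^d
print(one_norm_network(x, weights, B=B))
```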