Understanding Machine Learning Solution Manual
Written by Alon Gonen
Edited by Dana Rubinstein
November 17
(The solutions to Chapters 13 and 14 were written by Shai Shalev-Shwartz.)

2 Gentle Start

1. Given $S = ((x_i, y_i))_{i=1}^m$, define the multivariate polynomial
$$p_S(x) = -\prod_{i \in [m] : y_i = 1} \|x - x_i\|^2 .$$
Then, for every $i$ such that $y_i = 1$ we have $p_S(x_i) = 0$, while for every other $x$ we have $p_S(x) < 0$.
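This construction is easy to check numerically. The sketch below is an illustration only (the data points and function names are made up, not part of the exercise): it builds $p_S$ from the positive examples and verifies that it vanishes exactly on them and is strictly negative elsewhere, so thresholding $p_S$ at $0$ reproduces the labels.

```python
import numpy as np

def p_S(x, positives):
    """Evaluate p_S(x) = -prod_{i: y_i=1} ||x - x_i||^2 at a query point x."""
    x = np.asarray(x, dtype=float)
    return -np.prod([np.sum((x - np.asarray(p)) ** 2) for p in positives])

# Hypothetical 2-D training set: two positive and two negative instances.
positives = [(0.0, 1.0), (2.0, -1.0)]
negatives = [(1.0, 1.0), (-3.0, 0.5)]

for x in positives:
    assert p_S(x, positives) == 0.0   # p_S is zero on every positive example
for x in negatives:
    assert p_S(x, positives) < 0.0    # and strictly negative everywhere else

# The induced classifier labels x positive iff p_S(x) >= 0.
h = lambda x: int(p_S(x, positives) >= 0)
print([h(x) for x in positives + negatives])  # [1, 1, 0, 0]
```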
2. By the linearity of expectation,
$$\mathbb{E}_{S|_x \sim D^m}[L_S(h)] = \mathbb{E}_{S|_x \sim D^m}\left[\frac{1}{m}\sum_{i=1}^{m} \mathbb{1}_{[h(x_i) \neq f(x_i)]}\right] = \frac{1}{m}\sum_{i=1}^{m} \mathbb{E}_{x_i \sim D}\left[\mathbb{1}_{[h(x_i) \neq f(x_i)]}\right] = \frac{1}{m}\sum_{i=1}^{m} \mathbb{P}_{x_i \sim D}[h(x_i) \neq f(x_i)] = \frac{1}{m}\sum_{i=1}^{m} L_{(D,f)}(h) = L_{(D,f)}(h).$$

3. (a) First, observe that by definition, $A$ labels positively all the positive instances in the training set. Second, as we assume realizability, and since the tightest rectangle enclosing all positive examples is returned, all the negative instances are labeled correctly by $A$ as well. We conclude that $A$ is an ERM.

(b) Fix some distribution $D$ over $X$, and define $R^\star$ as in the hint. Let $f$ be the hypothesis associated with $R^\star$. Given a training set $S$, denote by $R(S)$ the rectangle returned by the proposed algorithm and by $A(S)$ the corresponding hypothesis. The definition of the algorithm $A$ implies that $R(S) \subseteq R^\star$ for every $S$. Thus,
$$L_{(D,f)}(R(S)) = D(R^\star \setminus R(S)).$$
Fix some $\epsilon \in (0,1)$. Define $R_1, R_2, R_3$ and $R_4$ as in the hint. For each $i \in [4]$, define the event
$$F_i = \{S|_x : S|_x \cap R_i = \emptyset\}.$$
Applying the union bound, we obtain
$$D^m\left(\{S : L_{(D,f)}(A(S)) > \epsilon\}\right) \le D^m\left(\bigcup_{i=1}^{4} F_i\right) \le \sum_{i=1}^{4} D^m(F_i).$$
Thus, it suffices to ensure that $D^m(F_i) \le \delta/4$ for every $i$. Fix some $i \in [4]$. The probability that a sample is in $F_i$ is the probability that none of the instances falls in $R_i$, which is exactly $(1-\epsilon/4)^m$. Therefore,
$$D^m(F_i) = (1-\epsilon/4)^m \le \exp(-\epsilon m/4),$$
and hence,
$$D^m\left(\{S : L_{(D,f)}(A(S)) > \epsilon\}\right) \le 4\exp(-\epsilon m/4).$$
Plugging in the assumption on $m$, we conclude the proof.

(c) The hypothesis class of axis-aligned rectangles in $\mathbb{R}^d$ is defined as follows. Given real numbers $a_1 \le b_1, a_2 \le b_2, \ldots, a_d \le b_d$, define the classifier $h_{(a_1,b_1,\ldots,a_d,b_d)}$ by
$$h_{(a_1,b_1,\ldots,a_d,b_d)}(x_1,\ldots,x_d) = \begin{cases} 1 & \text{if } a_i \le x_i \le b_i \text{ for all } i \in [d] \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
The class of all axis-aligned rectangles in $\mathbb{R}^d$ is defined as
$$\mathcal{H}^d_{rec} = \{h_{(a_1,b_1,\ldots,a_d,b_d)} : \forall i \in [d],\ a_i \le b_i\}.$$
It can be seen that the same algorithm proposed above is an ERM for this case as well. The sample complexity is analyzed similarly; the only difference is that instead of 4 strips we have $2d$ strips (2 strips for each dimension). Thus, it suffices to draw a training set of size $\left\lceil \frac{2d \log(2d/\delta)}{\epsilon} \right\rceil$.

(d) For each dimension, the algorithm has to find the minimal and the maximal values among the positive instances in the training sequence. Therefore, its runtime is $O(md)$. Since we have shown that the required value of $m$ is at most $\left\lceil \frac{2d \log(2d/\delta)}{\epsilon} \right\rceil$, it follows that the runtime of the algorithm is indeed polynomial in $d$, $1/\epsilon$, and $\log(1/\delta)$.
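The ERM of parts (a), (c) and (d) is just a per-coordinate min/max over the positive examples. A minimal sketch (the array layout, names, and synthetic data are assumptions for illustration, not part of the solution):

```python
import numpy as np

def fit_tightest_rectangle(X, y):
    """ERM for axis-aligned rectangles: the smallest box containing all positive points.

    X: (m, d) array of instances, y: (m,) array of 0/1 labels.
    Runs in O(m d) time, matching part (d).
    """
    pos = X[y == 1]
    if len(pos) == 0:                          # no positives: return an empty rectangle
        a = np.full(X.shape[1], np.inf)
        b = np.full(X.shape[1], -np.inf)
    else:
        a, b = pos.min(axis=0), pos.max(axis=0)
    return a, b

def predict(a, b, X):
    """h_{(a,b)}(x) = 1 iff a_i <= x_i <= b_i for every coordinate i."""
    return np.all((X >= a) & (X <= b), axis=1).astype(int)

# Tiny synthetic, realizable example: the target rectangle is [0, 1]^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(200, 2))
y = np.all((X >= 0) & (X <= 1), axis=1).astype(int)
a, b = fit_tightest_rectangle(X, y)
print("empirical error:", np.mean(predict(a, b, X) != y))  # 0.0, as an ERM should achieve
```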
3 A Formal Learning Model

1. The proofs follow (almost) immediately from the definition. We will show that the sample complexity is monotonically decreasing in the accuracy parameter $\epsilon$; the proof that it is monotonically decreasing in the confidence parameter $\delta$ is analogous. Denote by $D$ an unknown distribution over $X$, and let $f \in H$ be the target hypothesis. Denote by $A$ an algorithm which learns $H$ with sample complexity $m_H(\cdot,\cdot)$. Fix some $\delta \in (0,1)$, and suppose that $0 < \epsilon_1 \le \epsilon_2$. We need to show that $m_1 \stackrel{\mathrm{def}}{=} m_H(\epsilon_1,\delta) \ge m_H(\epsilon_2,\delta) \stackrel{\mathrm{def}}{=} m_2$. Given an i.i.d. training sequence of size $m \ge m_1$, we have that with probability at least $1-\delta$, $A$ returns a hypothesis $h$ such that $L_{(D,f)}(h) \le \epsilon_1 \le \epsilon_2$. By the minimality of $m_2$, we conclude that $m_2 \le m_1$.

2. (a) We propose the following algorithm: if a positive instance $x_+$ appears in $S$, return the (true) hypothesis $h_{x_+}$; if $S$ doesn't contain any positive instance, return the all-negative hypothesis $h^-$. It is clear that this algorithm is an ERM.

(b) Let $\epsilon \in (0,1)$, and fix the distribution $D$ over $X$. If the true hypothesis is the all-negative hypothesis $h^-$, then our algorithm returns a perfect hypothesis.
Assume now that there exists a unique positive instance $x_+$. It is clear that if $x_+$ appears in the training sequence $S$, our algorithm returns a perfect hypothesis. Furthermore, if $D[\{x_+\}] \le \epsilon$ then in any case the returned hypothesis has generalization error at most $\epsilon$ (with probability 1). Thus, it only remains to bound the probability of the event in which $D[\{x_+\}] > \epsilon$ but $x_+$ doesn't appear in $S$. Denote this event by $F$. Then
$$\mathbb{P}_{S|_x \sim D^m}[F] \le (1-\epsilon)^m \le e^{-\epsilon m}.$$
Hence, $\mathcal{H}_{\mathrm{Singleton}}$ is PAC learnable, and its sample complexity is bounded by
$$m_H(\epsilon,\delta) \le \left\lceil \frac{\log(1/\delta)}{\epsilon} \right\rceil.$$

3. Consider the ERM algorithm $A$ which, given a training sequence $S = ((x_i, y_i))_{i=1}^m$, returns the hypothesis $\hat{h}$ corresponding to the tightest circle which contains all the positive instances. Denote the radius of this hypothesis by $\hat{r}$. Assume realizability and let $h^\star$ be a circle with zero generalization error; denote its radius by $r^\star$. Let $\epsilon, \delta \in (0,1)$. Let $r \le r^\star$ be a scalar such that $D_X(\{x : r \le \|x\| \le r^\star\}) = \epsilon$, and define $E = \{x \in \mathbb{R}^2 : r \le \|x\| \le r^\star\}$. The probability (over drawing $S$) that $L_D(h_S) \ge \epsilon$ is bounded above by the probability that no point in $S$ belongs to $E$. The probability of this event is bounded above by $(1-\epsilon)^m \le e^{-\epsilon m}$. The desired bound on the sample complexity follows by requiring that $e^{-\epsilon m} \le \delta$.
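For the circle class of Exercise 3, the ERM reduces to taking the largest norm among the positive examples. A small sketch, assuming (as the norms in the solution suggest) that hypotheses are origin-centered discs $h_r(x) = \mathbb{1}[\|x\| \le r]$; the data below are made up:

```python
import numpy as np

def fit_tightest_circle(X, y):
    """ERM for origin-centered discs: radius = largest norm among positive instances."""
    pos = X[y == 1]
    return 0.0 if len(pos) == 0 else float(np.linalg.norm(pos, axis=1).max())

def predict(r, X):
    return (np.linalg.norm(X, axis=1) <= r).astype(int)

# Hypothetical realizable data: the target disc has radius 1.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 2))
y = (np.linalg.norm(X, axis=1) <= 1.0).astype(int)
r_hat = fit_tightest_circle(X, y)
print(r_hat <= 1.0, np.mean(predict(r_hat, X) != y))  # True 0.0 on the training data
```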
4. We first observe that $H$ is finite. Let us calculate its size exactly. Each hypothesis, besides the all-negative hypothesis, is determined by deciding, for each variable $x_i$, whether $x_i$, $\bar{x}_i$, or neither of them appears in the corresponding conjunction. Thus, $|H| = 3^d + 1$. We conclude that $H$ is PAC learnable and its sample complexity can be bounded by
$$m_H(\epsilon,\delta) \le \left\lceil \frac{d\log 3 + \log(1/\delta)}{\epsilon} \right\rceil.$$
Let us describe our learning algorithm. We define $h_0 = x_1 \wedge \bar{x}_1 \wedge \cdots \wedge x_d \wedge \bar{x}_d$; observe that $h_0$ is the always-minus hypothesis. Let $((a^1, y_1), \ldots, (a^m, y_m))$ be an i.i.d. training sequence of size $m$. Since we cannot extract any information from negative examples, our algorithm ignores them. Starting from $h = h_0$, for each positive example $a$ we remove from $h$ all the literals that are missing in $a$: if $a_i = 1$ we remove $\bar{x}_i$ from $h$, and if $a_i = 0$ we remove $x_i$ from $h$. Finally, our algorithm returns $h$. By construction and realizability, $h$ labels positively all the positive examples among $a^1, \ldots, a^m$. For the same reasons, the set of literals in $h$ contains the set of literals in the target hypothesis; thus, $h$ classifies correctly the negative elements among $a^1, \ldots, a^m$. This implies that $h$ is an ERM. Since the algorithm takes linear time (in terms of the dimension $d$) to process each example, the running time is bounded by $O(md)$.
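The update rule above is straightforward to implement. A sketch of the conjunction learner on Boolean vectors, under the hypothetical encoding that a hypothesis is the set of surviving literals, with $(i, \text{True})$ standing for $x_i$ and $(i, \text{False})$ for $\bar{x}_i$ (data and names are illustrative only):

```python
def learn_conjunction(examples, d):
    """ERM for Boolean conjunctions over d variables.

    examples: list of (a, y) pairs, a a 0/1 tuple of length d, y in {0, 1}.
    Starts from h_0 (all 2d literals) and shrinks it on every positive example.
    """
    h = {(i, v) for i in range(d) for v in (True, False)}
    for a, y in examples:
        if y == 1:                                   # negative examples are ignored
            # keep only literals satisfied by a: x_i if a_i = 1, not-x_i if a_i = 0
            h = {(i, v) for (i, v) in h if (a[i] == 1) == v}
    return h

def predict(h, a):
    return int(all((a[i] == 1) == v for (i, v) in h))

# Hypothetical target over d = 4 variables: x_1 AND not-x_3 (0-indexed: x_0 and not x_2).
data = [((1, 0, 0, 1), 1), ((1, 1, 0, 0), 1), ((0, 1, 0, 1), 0), ((1, 1, 1, 1), 0)]
h = learn_conjunction(data, d=4)
print(sorted(h), [predict(h, a) for a, _ in data])   # training error is zero
```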
5. Fix some $h \in H$ with $L_{(\bar{D},f)}(h) > \epsilon$, where $\bar{D} = \frac{1}{m}\sum_{i=1}^{m} D_i$. By definition,
$$\frac{1}{m}\sum_{i=1}^{m}\mathbb{P}_{x \sim D_i}[h(x) = f(x)] = \mathbb{P}_{x \sim \bar{D}}[h(x) = f(x)] < 1 - \epsilon.$$
We now bound the probability that $h$ is consistent with $S$ (i.e., that $L_S(h) = 0$) as follows:
$$\mathbb{P}_S[L_S(h) = 0] = \prod_{i=1}^{m}\mathbb{P}_{x \sim D_i}[h(x) = f(x)] \le \left(\frac{1}{m}\sum_{i=1}^{m}\mathbb{P}_{x \sim D_i}[h(x) = f(x)]\right)^{m} < (1-\epsilon)^m \le e^{-\epsilon m}.$$
The first inequality is the geometric-arithmetic mean inequality. Applying the union bound, we conclude that the probability that there exists some $h \in H$ with $L_{(\bar{D},f)}(h) > \epsilon$ which is consistent with $S$ is at most $|H|\exp(-\epsilon m)$.

6. Suppose that $H$ is agnostic PAC learnable, and let $A$ be a learning algorithm that learns $H$ with sample complexity $m_H(\cdot,\cdot)$. We show that $H$ is PAC learnable using $A$. Let $D$ and $f$ be an (unknown) distribution over $X$ and the target function, respectively. We may assume w.l.o.g. that $D$ is a joint distribution over $X \times \{0,1\}$, where the conditional probability of $y$ given $x$ is determined deterministically by $f$. Since we assume realizability, we have $\inf_{h \in H} L_D(h) = 0$. Let $\epsilon, \delta \in (0,1)$. Then, for every positive integer $m \ge m_H(\epsilon, \delta)$, if we equip $A$ with a training set $S$ consisting of $m$ i.i.d. instances which are labeled by $f$, then with probability at least $1-\delta$ (over the choice of $S|_x$), it returns a hypothesis $h$ with
$$L_D(h) \le \inf_{h' \in H} L_D(h') + \epsilon = 0 + \epsilon = \epsilon.$$

7. Let $x \in X$, and let $\alpha_x$ be the conditional probability of a positive label given $x$. We have
$$\mathbb{P}[f_D(X) \neq Y \mid X = x] = \mathbb{1}_{[\alpha_x \ge 1/2]}\,\mathbb{P}[Y = 0 \mid X = x] + \mathbb{1}_{[\alpha_x < 1/2]}\,\mathbb{P}[Y = 1 \mid X = x] = \mathbb{1}_{[\alpha_x \ge 1/2]}(1 - \alpha_x) + \mathbb{1}_{[\alpha_x < 1/2]}\,\alpha_x = \min\{\alpha_x, 1 - \alpha_x\}.$$
Let $g$ be a classifier from $X$ to $\{0,1\}$ (as we shall see, $g$ might be non-deterministic). We have
$$\begin{aligned}
\mathbb{P}[g(X) \neq Y \mid X = x] &= \mathbb{P}[g(X) = 0 \mid X = x]\,\mathbb{P}[Y = 1 \mid X = x] + \mathbb{P}[g(X) = 1 \mid X = x]\,\mathbb{P}[Y = 0 \mid X = x] \\
&= \mathbb{P}[g(X) = 0 \mid X = x]\,\alpha_x + \mathbb{P}[g(X) = 1 \mid X = x]\,(1 - \alpha_x) \\
&\ge \mathbb{P}[g(X) = 0 \mid X = x]\,\min\{\alpha_x, 1 - \alpha_x\} + \mathbb{P}[g(X) = 1 \mid X = x]\,\min\{\alpha_x, 1 - \alpha_x\} \\
&= \min\{\alpha_x, 1 - \alpha_x\}.
\end{aligned}$$
The statement now follows from the fact that the above holds for every $x \in X$. More formally, by the law of total expectation,
$$L_D(f_D) = \mathbb{E}_{(x,y) \sim D}\left[\mathbb{1}_{[f_D(x) \neq y]}\right] = \mathbb{E}_{x \sim D_X}\left[\mathbb{E}_{y \sim D_{Y|x}}\left[\mathbb{1}_{[f_D(x) \neq y]} \mid X = x\right]\right] = \mathbb{E}_{x \sim D_X}\left[\min\{\alpha_x, 1 - \alpha_x\}\right] \le \mathbb{E}_{x \sim D_X}\left[\mathbb{E}_{y \sim D_{Y|x}}\left[\mathbb{1}_{[g(x) \neq y]} \mid X = x\right]\right] = L_D(g).$$
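The optimality claim of Exercise 7 can also be sanity-checked numerically. The toy computation below (all values made up) evaluates the exact error $\mathbb{P}[h(X) \neq Y]$ of every deterministic rule on a five-point domain and confirms that thresholding $\alpha_x$ at $1/2$ minimizes it:

```python
import numpy as np

# Toy domain of 5 points with known conditionals alpha_x = P[Y = 1 | X = x].
alpha = np.array([0.1, 0.4, 0.5, 0.8, 0.95])
p_x = np.full(5, 0.2)                        # uniform marginal over the domain

bayes = (alpha >= 0.5).astype(int)           # f_D(x) = 1[alpha_x >= 1/2]

def true_error(h):
    # P[h(X) != Y] = sum_x p(x) * (alpha_x if h(x) = 0 else 1 - alpha_x)
    return float(np.sum(p_x * np.where(h == 0, alpha, 1 - alpha)))

# The Bayes error equals E_x[min(alpha_x, 1 - alpha_x)] ...
print(np.isclose(true_error(bayes), np.sum(p_x * np.minimum(alpha, 1 - alpha))))  # True
# ... and no other deterministic rule does better.
others = [np.array(list(np.binary_repr(k, 5)), dtype=int) for k in range(32)]
print(all(true_error(bayes) <= true_error(h) for h in others))                    # True
```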
8. (a) This was proved in the previous exercise.

(b) We proved in the previous exercise that for every distribution $D$, the Bayes optimal predictor $f_D$ is optimal w.r.t. $D$.

(c) Choose any distribution $D$. Then $A$ is not better than $f_D$ w.r.t. $D$.

9. (a) Suppose that $H$ is PAC learnable in the one-oracle model. Let $A$ be an algorithm which learns $H$, and denote by $m_H$ the function that determines its sample complexity. We prove that $H$ is PAC learnable also in the two-oracle model. Let $D$ be a distribution over $X \times \{0,1\}$. Note that drawing points from the negative and positive oracles with equal probability is equivalent to obtaining i.i.d. examples from a distribution $\bar{D}$ which gives equal probability to positive and negative examples. Formally, for every subset $E \subseteq X$ we have
$$\bar{D}[E] = \frac{1}{2} D^+[E] + \frac{1}{2} D^-[E].$$
Thus, $\bar{D}[\{x : f(x) = 1\}] = \bar{D}[\{x : f(x) = 0\}] = \frac{1}{2}$. If we give $A$ access to a training set which is drawn i.i.d. according to $\bar{D}$ with size $m \ge m_H(\epsilon/2, \delta)$, then with probability at least $1-\delta$, $A$ returns $h$ with
$$\begin{aligned}
\epsilon/2 &\ge L_{(\bar{D},f)}(h) = \mathbb{P}_{x \sim \bar{D}}[h(x) \neq f(x)] \\
&= \mathbb{P}_{x \sim \bar{D}}[f(x) = 1, h(x) = 0] + \mathbb{P}_{x \sim \bar{D}}[f(x) = 0, h(x) = 1] \\
&= \mathbb{P}_{x \sim \bar{D}}[f(x) = 1]\,\mathbb{P}_{x \sim \bar{D}}[h(x) = 0 \mid f(x) = 1] + \mathbb{P}_{x \sim \bar{D}}[f(x) = 0]\,\mathbb{P}_{x \sim \bar{D}}[h(x) = 1 \mid f(x) = 0] \\
&= \frac{1}{2}\,\mathbb{P}_{x \sim D^+}[h(x) = 0] + \frac{1}{2}\,\mathbb{P}_{x \sim D^-}[h(x) = 1] \\
&= \frac{1}{2} L_{(D^+,f)}(h) + \frac{1}{2} L_{(D^-,f)}(h).
\end{aligned}$$
This implies that with probability at least $1-\delta$, both $L_{(D^+,f)}(h) \le \epsilon$ and $L_{(D^-,f)}(h) \le \epsilon$.
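The reduction in Exercise 9(a) only needs a wrapper that turns the two oracles into a single example oracle for $\bar{D}$: flip a fair coin and query the positive or negative oracle accordingly. A minimal sketch of that wrapper (the oracle callables and the toy distributions are hypothetical stand-ins, not part of the solution):

```python
import random

def make_balanced_oracle(sample_positive, sample_negative, f):
    """Simulate an example oracle for D-bar from the two one-class oracles.

    sample_positive() / sample_negative() draw x from D+ / D-; f is the target labeling.
    Each call returns (x, f(x)), with positive and negative draws equally likely.
    """
    def oracle():
        x = sample_positive() if random.random() < 0.5 else sample_negative()
        return x, f(x)
    return oracle

# Toy usage: D+ uniform on [2, 3), D- uniform on [0, 1), target f(x) = 1[x >= 2].
oracle = make_balanced_oracle(lambda: 2 + random.random(),
                              lambda: random.random(),
                              lambda x: int(x >= 2))
S = [oracle() for _ in range(6)]   # i.i.d. sample from D-bar, fed to the one-oracle learner A
print(S)
```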