Vapnik-Chervonenkis theory
|
|
- Emmeline Ross
- 5 years ago
- Views:
Transcription
1 Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown dstrbuton D on X { 1,+1}. Gven a tranng set S = ((x 1,y 1 ),(x 2,y 2 ),...,(x m,y m )) drawn from D, our learnng algorthm tres to fnd a hypothess ĥ: X { 1,+1} that wll predct well on future (x,y) examples drawn from D n the sense that the true error E(ĥ) = E (x,y) D I ĥ(x) y ] (1) wll not be too bg. Of course we cannot measure E(ĥ), because we don t know D. What we have nstead s the emprcal error measured on the sample S: E S (ĥ) = 1 m m I ĥ(x ) y ]. (2) =1 Ths s otherwse known as the tranng error, and typcally t s an overoptmstc estmate of the true error, smply because the way most learnng algorthms work s to mplctly or explctly drve down just ths quantty. Some amount of overfttng s then unavodable. The job of generalzaton bounds s to relate these two quanttes, so that we can report thngs lke my algorthm found the hypothess ĥ, on the tranng data t has error E S, and on future data the error s not lkely to be more than ǫ worse. In other words, E(ĥ) ES(ĥ) < ǫ. No matter how we set ǫ, however, a really really bad tranng set (n the sense of a very unrepresentatve sample) can always mslead us even more, so explct bounds of ths form are generally not obtanable. The P part of PAC stands for amng for guarantees of ths form only n the probablstc sense,.e., fndng (ǫ,δ) pars, where both are small postve real numbers, such that P S E(ĥ) E S ǫ ] 1 δ. (3) Here P S stands for probablty over choce of tranng set. 1
2 Concentraton The fundamental dea behnd all generalzaton bounds s that although t s possble that the same quantty, n our case, the error, wll turn out to be very dfferent on two dfferent samples from the same dstrbuton, ths s not lkely to happen. In general, as the sample sze grows, emprcal quanttes tend to concentrate more and more around ther mean, n our case, the true error. The smplest nequalty capturng ths fact, and one that s key to our development, s Hoeffdng s nequalty, whch states that f Z 1,Z 2,...,Z m are ndependent draws of a Bernoull random varable wth parameter p and γ > 0, then 1 P m m ] Z > p + γ e 2mγ2. =1 Ths fts our problem ncely, because for any h C f we take Z = Ih(x ) y ], Hoeffdng s nequalty says somethng about the probablty of devatons from the true error p = E(h) of the hypothess h. Pluggng n the hypothess ĥ returned by algorthm and blndly applyng Hoeffdng s nequalty gves P E(ĥ) E S(ĥ) > ǫ ] e 2mǫ2, so settng the rght hand sde equal to 1 δ, the PAC bound P E(ĥ) ES(ĥ) > ǫ ] < δ (4) s satsfed when ln (1/δ) ǫ > 2m. Unfortunately, ths smple argument s NOT CORRECT. What goes wrong s that gven the fact that our algorthm returned ĥ, the (x,y ) examples n the sample are not IID samples from D, and consequently nether are the Z s IID samples from Bernoull(E(ĥ)). One way to put ths couplng between the sample and the hypothess n relef s to note that our algorthm s really just a functon A: S ĥ. Ths makes t clear that the emprcal error s a functon of S n two ways: through the hypothess A(S) and the ndvdual tranng examples (x, y) S. Clearly, we can t hold one constant and regard E S as a statstc based on IID draws of the other. 2
3 Unform convergence Overcomng the problem of the couplng between S and ĥ s a major hurdle n provng generalzaton bounds. The way to proceed s to nstead of focusng on any one partcular h, to focus on all of them smulatenously. In partcular, f we can fnd an (ǫ,δ) par such that or equvalently, P E(h) E S (h) ǫ h C ] 1 δ, P ĥ C such that E(h) E S(h) > ǫ ] < δ, then that (ǫ,δ) par wll certanly satsfy the PAC bound (3). At frst sght ths seems lke a terrble overkll, snce C mght nclude some crazy rregular functons that mght never be chosen by any reasonable algorthm, but these functons mght make our bound very loose. On the other hand, t s worth notng that at least amongst reasonable functons the ĥ chosen by our learnng algorthm s actually lkely to be towards the top of the lst n terms of the magntude of E(h) E S (h), smply because learnng algorthms by ther very nature tend to drve down E S (h). So boundng E(h) E S (h) for all h C mght not be such a crazy thng to do after all. How best to do t s not obvous, though. If C has only a fnte number of hypotheses, the smplstc method s to use the unon bound: P ĥ C such that E(h) E S(h) > ǫ ] h C P E(h) E S (h) > ǫ ] where on the rght hand sde now we are allowed to use the Hoeffdng bound P E(h) E S (h) > ǫ ] e 2mǫ2, leadng to C e 2mγ2 δ and therefore ln C + ln(1/δ) ǫ >. 2m The unon bound s clearly very loose though, and the explct appearance of the number of hypotheses n C s also worryng: what f two hypotheses are almost dentcal? Should we stll count them as separate? Clearly, there must be some more approprate way of quantfyng the rchness of a concept space than just countng the number of hypotheses n C. VC theory s an attempt to do just ths. 3
4 Symmetrzaton Concentraton nequaltes don t just tell us that on a sngle sample S, E S (h) can t devate too much from ts mean E(h), they also mply that for a par of ndependent samples S 1 and S 2, E S1 (h) can t be very far from E S2 (h). The key dea behnd Vapnk and Chervonenks poneerng work was to use explot ths fact by a process called symmetrzaton to reduce everythng to just lookng at fnte samples. We start wth the followng smple applcaton of Hoeffdng s nequalty. Proposton 1 Let S and S be two ndependent samples of sze 2m drawn from a dstrbuton D on X { 1,+1} and let E and E S be defned as n (1) and (2). Then for any h C, P E(h) E S (h) > ǫ ] 2 P E S (h) E S (h) > ǫ/2 ]. Ths result also readly generalzes to the unform case. Proposton 2 Let S and S be two ndependent samples of sze 2m drawn from a dstrbuton D on X { 1,+1} and let E and E S be defned as n (1) and (2). Then for any h C, ] P sup E(h) E S (h)] > ǫ 2 P sup ES (h) E S (h) ] ] > ǫ/2. h C h C Now let us defne S = S S and ask ourselves: what s the probablty that the errors ncurred by h on S are dstrbuted n such a way that mǫ/2 more of them fall n S than n S? For the sake of smplcty here we only consder the case that exactly 0 errors fall n S and k = mǫ/2 fall n S. Proposton 3 Consder 2m balls of whch exactly k balls are black. If we randomly splt the balls nto two sets of sze m, then the probablty P k,0 that all the black balls end up n the frst set s at most 1/2 k Proof. ( ) 2m P k,0 = / = k k m(m 1)(m 2)...(m k) (2m)(2m 1)...(2m k) < 1/2 k. For the general case of u and u + k balls n the two sets, a smlar combnatoral nequalty holds. The real sgnfcance of symmetrzaton s that t allows us to quantfy the complexty of C n terms of just the jont sample S nstead of ts behavor on the entre nput space. In partcular, n boundng sup h ES (h) E S (h) ], two hypothess h and h only need to be counted as dstnct f they dffer on S: how they behave over the rest of X s mmateral. To be somewhat more explct, we defne the restrcton of h to S as h S : S { 1,+1} h S (x) = h(x), 4
5 and the correspondng restrcted concept class as C S = { h S h C }. Whle C S s of course a property of S, t s also a characterstc of the entre concept class C n the sense that t s often possble to bound ts sze ndependently of S. The maxmal rate at whch C S grows wth the sze of m, Π C (n) = max U X n C U s called the growth functon. Usng the growth functon and Proposton 3, we can now gve the fnte sample verson of the unon bound: P sup ES (h) E S (h) ] > ǫ/2 h C ] Π C (2m) 2 mǫ/2. The VC dmenson s solely a devce for computng Π C (n). The VC dmenson The concept class C s sad to shatter a set S X f C can realze all possble labelngs of S,.e., f C S = 2 S. The Vapnk-Chervonenks dmenson of C s the sze of the largest subset of S that C can shatter, VC(C) = max S { S X and C S = 2 n }. The followng famous result (called the Sauer-Shelah lemma) tells us how to bound the growth functon n terms of the VC dmenson. Proposton 4 (Sauer-Shelah lemma) Let C be a concept class of VC-dmenson d, and let Π(m) be the correspondng growth functon. Then for m d, Π(m) = 2 m ; and for m > d, Π(m) ( em d ) d. (5) Proof. The m d case s just a restatement of the defnnton of VC dmenson. For m > d we use nducton on m. For m = 2, t s trval to show that (5) holds. Assumng that t holds for m and any d, we now show that t also holds for m+1 and any d. Let S be any subset of X of sze m+1, and fxng some x S, let us wrte t as S = S \x {x}, where S \x = m. Now for any h C S \x, consder ts two possble extensons h + : S { 1,+1} wth h + (x ) = h(x ) for x S and h + (x) = 1 h : S { 1,+1} wth h (x ) = h(x ) for x S and h (x) = 1. Ether both of these hypotheses are n C S or only one of them s. Let U be the subset of C S \x of hypotheses for whch both h + and h are n C S, and let U S = h U {h+,h }. Then we have C S = C S \x + U. 5
6 By the nductve hypothess C S \x d ) =1. As for the second term, consder that f U shatters any set V, then U S wll shatter V {x}, so VC(U) VC(U S ) 1 d 1, so by the nductve hypothess U d 1 ) =1. Therefore +1 C S d 1 + = + 1, snce d ) =1 s just the number of ways of choosng up to d objects from m + 1, and the sum corresponds to decomposng these choces accordng to whether a partcular object has been chosen or not. Fnally, =1 < d Puttng t all together ) d ( ) ( ) m d ) d m ( )( ) m d < = m d m ) ( d 1 + d m ) d exp(d) d m) d 6
Learning Theory: Lecture Notes
Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationTHE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens
THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of
More information1 Definition of Rademacher Complexity
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #9 Scrbe: Josh Chen March 5, 2013 We ve spent the past few classes provng bounds on the generalzaton error of PAClearnng algorths for the
More informationEigenvalues of Random Graphs
Spectral Graph Theory Lecture 2 Egenvalues of Random Graphs Danel A. Spelman November 4, 202 2. Introducton In ths lecture, we consder a random graph on n vertces n whch each edge s chosen to be n the
More informationComputational and Statistical Learning theory Assignment 4
Coputatonal and Statstcal Learnng theory Assgnent 4 Due: March 2nd Eal solutons to : karthk at ttc dot edu Notatons/Defntons Recall the defnton of saple based Radeacher coplexty : [ ] R S F) := E ɛ {±}
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora
prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable
More informationCOS 511: Theoretical Machine Learning
COS 5: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #0 Scrbe: José Sões Ferrera March 06, 203 In the last lecture the concept of Radeacher coplexty was ntroduced, wth the goal of showng that
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More information1 The Mistake Bound Model
5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationLecture 2: Prelude to the big shrink
Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson
More informationCase A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.
THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty
More information= z 20 z n. (k 20) + 4 z k = 4
Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5
More informationLecture Space-Bounded Derandomization
Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationEdge Isoperimetric Inequalities
November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary
More informationSupplementary material: Margin based PU Learning. Matrix Concentration Inequalities
Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationExcess Error, Approximation Error, and Estimation Error
E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More informationNote on EM-training of IBM-model 1
Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are
More informationGoodness of fit and Wilks theorem
DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),
More informationLecture 10: May 6, 2013
TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationDimensionality Reduction Notes 1
Dmensonalty Reducton Notes 1 Jelan Nelson mnlek@seas.harvard.edu August 10, 2015 1 Prelmnares Here we collect some notaton and basc lemmas used throughout ths note. Throughout, for a random varable X,
More informationCS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning
CS9 Problem Set #3 Solutons CS 9, Publc Course Problem Set #3 Solutons: Learnng Theory and Unsupervsed Learnng. Unform convergence and Model Selecton In ths problem, we wll prove a bound on the error of
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationCSCE 790S Background Results
CSCE 790S Background Results Stephen A. Fenner September 8, 011 Abstract These results are background to the course CSCE 790S/CSCE 790B, Quantum Computaton and Informaton (Sprng 007 and Fall 011). Each
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More information11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13]
Algorthms Lecture 11: Tal Inequaltes [Fa 13] If you hold a cat by the tal you learn thngs you cannot learn any other way. Mark Twan 11 Tal Inequaltes The smple recursve structure of skp lsts made t relatvely
More informationand problem sheet 2
-8 and 5-5 problem sheet Solutons to the followng seven exercses and optonal bonus problem are to be submtted through gradescope by :0PM on Wednesday th September 08. There are also some practce problems,
More information18.1 Introduction and Recap
CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng
More informationP exp(tx) = 1 + t 2k M 2k. k N
1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.
More informationE Tail Inequalities. E.1 Markov s Inequality. Non-Lecture E: Tail Inequalities
Algorthms Non-Lecture E: Tal Inequaltes If you hold a cat by the tal you learn thngs you cannot learn any other way. Mar Twan E Tal Inequaltes The smple recursve structure of sp lsts made t relatvely easy
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationLinear Regression Analysis: Terminology and Notation
ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationLecture 4 Hypothesis Testing
Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 15 Scribe: Jieming Mao April 1, 2013
COS 511: heoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 15 Scrbe: Jemng Mao Aprl 1, 013 1 Bref revew 1.1 Learnng wth expert advce Last tme, we started to talk about learnng wth expert advce.
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More information20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness.
20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The frst dea s connectedness. Essentally, we want to say that a space cannot be decomposed
More informationConvergence of random processes
DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large
More informationFirst day August 1, Problems and Solutions
FOURTH INTERNATIONAL COMPETITION FOR UNIVERSITY STUDENTS IN MATHEMATICS July 30 August 4, 997, Plovdv, BULGARIA Frst day August, 997 Problems and Solutons Problem. Let {ε n } n= be a sequence of postve
More informationNotes on Frequency Estimation in Data Streams
Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More information), it produces a response (output function g (x)
Lnear Systems Revew Notes adapted from notes by Mchael Braun Typcally n electrcal engneerng, one s concerned wth functons of tme, such as a voltage waveform System descrpton s therefore defned n the domans
More informationOn complexity and randomness of Markov-chain prediction
On complexty and randomness of Markov-chan predcton Joel Ratsaby Department of Electrcal and Electroncs Engneerng Arel Unversty Arel, ISRAEL Emal: ratsaby@arelacl Abstract Let {X t : t Z} be a sequence
More informationSplit alignment. Martin C. Frith April 13, 2012
Splt algnment Martn C. Frth Aprl 13, 2012 1 Introducton Ths document s about algnng a query sequence to a genome, allowng dfferent parts of the query to match dfferent parts of the genome. Here are some
More informationAffine transformations and convexity
Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/
More informationComplete subgraphs in multipartite graphs
Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G
More informationCanonical transformations
Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,
More information1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands
Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of
More informationMin Cut, Fast Cut, Polynomial Identities
Randomzed Algorthms, Summer 016 Mn Cut, Fast Cut, Polynomal Identtes Instructor: Thomas Kesselhem and Kurt Mehlhorn 1 Mn Cuts n Graphs Lecture (5 pages) Throughout ths secton, G = (V, E) s a mult-graph.
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationSome basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C
Some basc nequaltes Defnton. Let V be a vector space over the complex numbers. An nner product s gven by a functon, V V C (x, y) x, y satsfyng the followng propertes (for all x V, y V and c C) (1) x +
More informationANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)
Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of
More informationWeek 2. This week, we covered operations on sets and cardinality.
Week 2 Ths week, we covered operatons on sets and cardnalty. Defnton 0.1 (Correspondence). A correspondence between two sets A and B s a set S contaned n A B = {(a, b) a A, b B}. A correspondence from
More informationa b a In case b 0, a being divisible by b is the same as to say that
Secton 6.2 Dvsblty among the ntegers An nteger a ε s dvsble by b ε f there s an nteger c ε such that a = bc. Note that s dvsble by any nteger b, snce = b. On the other hand, a s dvsble by only f a = :
More informationFoundations of Arithmetic
Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an
More informationIntroduction to Algorithms
Introducton to Algorthms 6.046J/8.40J Lecture 7 Prof. Potr Indyk Data Structures Role of data structures: Encapsulate data Support certan operatons (e.g., INSERT, DELETE, SEARCH) Our focus: effcency of
More informationp 1 c 2 + p 2 c 2 + p 3 c p m c 2
Where to put a faclty? Gven locatons p 1,..., p m n R n of m houses, want to choose a locaton c n R n for the fre staton. Want c to be as close as possble to all the house. We know how to measure dstance
More informatione - c o m p a n i o n
OPERATIONS RESEARCH http://dxdoorg/0287/opre007ec e - c o m p a n o n ONLY AVAILABLE IN ELECTRONIC FORM 202 INFORMS Electronc Companon Generalzed Quantty Competton for Multple Products and Loss of Effcency
More informationA be a probability space. A random vector
Statstcs 1: Probablty Theory II 8 1 JOINT AND MARGINAL DISTRIBUTIONS In Probablty Theory I we formulate the concept of a (real) random varable and descrbe the probablstc behavor of ths random varable by
More informationExercise Solutions to Real Analysis
xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there
More informationCSE 252C: Computer Vision III
CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel
More informationCS286r Assign One. Answer Key
CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More informationAdditional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty
Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,
More informationMaximizing the number of nonnegative subsets
Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum
More informationSalmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2
Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to
More informationLinear Feature Engineering 11
Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19
More information/ n ) are compared. The logic is: if the two
STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence
More information} Often, when learning, we deal with uncertainty:
Uncertanty and Learnng } Often, when learnng, we deal wth uncertanty: } Incomplete data sets, wth mssng nformaton } Nosy data sets, wth unrelable nformaton } Stochastcty: causes and effects related non-determnstcally
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More informationLecture 4: September 12
36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been
More information2.3 Nilpotent endomorphisms
s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms
More informationChapter 11: Simple Linear Regression and Correlation
Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests
More informationOnline Appendix. t=1 (p t w)q t. Then the first order condition shows that
Artcle forthcomng to ; manuscrpt no (Please, provde the manuscrpt number!) 1 Onlne Appendx Appendx E: Proofs Proof of Proposton 1 Frst we derve the equlbrum when the manufacturer does not vertcally ntegrate
More informationSpectral Graph Theory and its Applications September 16, Lecture 5
Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph
More informationAnalysis of Discrete Time Queues (Section 4.6)
Analyss of Dscrete Tme Queues (Secton 4.6) Copyrght 2002, Sanjay K. Bose Tme axs dvded nto slots slot slot boundares Arrvals can only occur at slot boundares Servce to a job can only start at a slot boundary
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More information