6.867 Machine learning, lecture 13 (Jaakkola)
Lecture topics: boosting, margin, and gradient descent; complexity of classifiers, generalization

Boosting

Last time we arrived at a boosting algorithm for sequentially creating an ensemble of base classifiers. Our base classifiers were decision stumps, i.e., simple linear classifiers relying on a single component of the input vector. The stump classifiers can be written as h(x; θ) = sign( s (x_k − θ_0) ), where θ = {s, k, θ_0} and s ∈ {−1, 1} specifies the label to predict on the positive side of x_k = θ_0. Figure 1 below shows a possible decision boundary from a stump when the input vectors are only two dimensional.

Figure 1: A possible decision boundary from a trained decision stump. The stump in the figure depends only on the vertical x_2-axis.

The boosting algorithm combines the stumps (as base learners) into an ensemble that, after m rounds of boosting, takes the following form

  h_m(x) = Σ_{j=1}^m α̂_j h(x; θ̂_j)    (1)

where α̂_j ≥ 0 but they do not necessarily sum to one. We can always normalize the ensemble by Σ_{j=1}^m α̂_j after the fact. The simple AdaBoost algorithm can be written in the following modular form:
(0) Set W_0(t) = 1/n for t = 1, ..., n.

(1) At stage m, find a base learner h(x; θ̂_m) that approximately minimizes

  − Σ_{t=1}^n W_{m−1}(t) y_t h(x_t; θ) = 2ε_m − 1    (2)

where ε_m is the weighted classification error (zero-one loss) on the training examples, weighted by the normalized weights W_{m−1}(t).

(2) Given θ̂_m, set

  α̂_m = 0.5 log( (1 − ε̂_m) / ε̂_m )    (3)

where ε̂_m is the weighted error corresponding to θ̂_m chosen in step (1). For binary {−1, 1} base learners, α̂_m exactly minimizes the weighted training loss (loss of the ensemble):

  J(α_m, θ̂_m) = Σ_{t=1}^n W_{m−1}(t) exp( −y_t α_m h(x_t; θ̂_m) )    (4)

In cases where the base learners are not binary (e.g., return values in the interval [−1, 1]), we would have to minimize Eq.(4) directly.

(3) Update the weights on the training examples based on the new base learner:

  W_m(t) = (1/c_m) W_{m−1}(t) exp( −y_t α̂_m h(x_t; θ̂_m) )    (5)

where c_m is the normalization constant that ensures the weights W_m(t) sum to one after the update. The new weights can again be interpreted as normalized losses for the new ensemble h_m(x_t) = h_{m−1}(x_t) + α̂_m h(x_t; θ̂_m).

Let's try to understand the boosting algorithm from several different perspectives. First of all, there are several different types of errors (errors here refer to zero-one classification losses, not the surrogate exponential losses). We can talk about the weighted error of the base learner, introduced at the m-th boosting iteration, relative to the weights W_{m−1}(t) on the training examples. This is the weighted training error ε̂_m in the algorithm. We can also measure the weighted error of the same base classifier relative to the updated weights,
i.e., relative to W_m(t). In other words, we can measure how well the current base learner would do at the next iteration. Finally, in terms of the ensemble, we have the unweighted training error and the corresponding generalization (test) error, as a function of boosting iterations. We will discuss each of these in turn.

Weighted error. The weighted error ε̂_m achieved by a new base learner h(x; θ̂_m) relative to W_{m−1}(t) tends to increase with m, i.e., with each boosting iteration (though not monotonically). Figure 2 below shows this weighted error ε̂_m as a function of boosting iterations. The reason for this is that, since the weights concentrate on examples that are difficult to classify correctly, subsequent base learners face harder classification tasks.

Figure 2: Weighted error ε̂_m as a function of m.

Weighted error relative to updated weights. We claim that the weighted error of the base learner h(x; θ̂_m) relative to the updated weights W_m(t) is exactly 0.5. This means that the base learner introduced at the m-th boosting iteration will be useless (at chance level) for the next boosting iteration. So the boosting algorithm would never introduce the same base learner twice in a row. The same learner might, however, reappear later on (relative to a different set of weights). One reason for this is that we don't go back and update the α̂_j's for base learners already introduced into the ensemble. So the only way to change the previously set coefficients is to reintroduce the base learners.

Let's now see that the claim is indeed true. We can equivalently show that the weighted agreement relative to W_m(t) is exactly zero:

  Σ_{t=1}^n W_m(t) y_t h(x_t; θ̂_m) = 0    (6)
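Before verifying this analytically, both the modular algorithm above and this claim can be checked numerically. The following is a minimal NumPy sketch, not code from the lecture; the helper names (stump_predict, fit_stump, adaboost) are invented for illustration, and the stump search simply tries thresholds just below each observed coordinate value.

```python
import numpy as np

def stump_predict(X, s, k, theta0):
    """Evaluate h(x; theta) = sign(s (x_k - theta0)) on each row of X."""
    p = np.sign(s * (X[:, k] - theta0))
    p[p == 0] = 1.0                      # break ties toward +1
    return p

def fit_stump(X, y, W):
    """Step (1): pick (s, k, theta0) approximately minimizing the weighted error."""
    best = None
    for k in range(X.shape[1]):
        for theta0 in np.unique(X[:, k]) - 1e-9:
            for s in (-1.0, 1.0):
                eps = np.sum(W[stump_predict(X, s, k, theta0) != y])
                if best is None or eps < best[0]:
                    best = (eps, s, k, theta0)
    return best

def adaboost(X, y, rounds):
    """The modular AdaBoost loop; assumes each round finds 0 < eps < 1/2."""
    n = len(y)
    W = np.full(n, 1.0 / n)              # step (0): uniform weights
    ensemble = []
    for _ in range(rounds):
        eps, s, k, theta0 = fit_stump(X, y, W)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # step (2), Eq.(3)
        h = stump_predict(X, s, k, theta0)
        W = W * np.exp(-y * alpha * h)            # step (3), Eq.(5), unnormalized
        W = W / W.sum()                           # divide by c_m
        # Eq.(6): the just-added learner is at chance under the new weights.
        assert abs(np.sum(W * y * h)) < 1e-9
        ensemble.append((alpha, s, k, theta0))
    return ensemble

def ensemble_predict(ensemble, X):
    f = sum(a * stump_predict(X, s, k, t) for a, s, k, t in ensemble)
    return np.sign(f)
```

For instance, on the one-dimensional points 0, 1, 2 with labels +1, −1, +1, no single stump is consistent with all three labels, yet three rounds of boosting already reach zero training error, and the in-loop assertion confirms Eq.(6) at every round.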
Consider the optimization problem for α_m after we have already found θ̂_m:

  J(α_m, θ̂_m) = Σ_{t=1}^n W_{m−1}(t) exp( −y_t α_m h(x_t; θ̂_m) )    (7)

The derivative of J(α_m, θ̂_m) with respect to α_m has to be zero at the optimal value α̂_m, so that

  d J(α_m, θ̂_m) / d α_m |_{α_m = α̂_m} = − Σ_{t=1}^n W_{m−1}(t) exp( −y_t α̂_m h(x_t; θ̂_m) ) y_t h(x_t; θ̂_m)    (8)
  = − c_m Σ_{t=1}^n W_m(t) y_t h(x_t; θ̂_m) = 0    (9)

where we have used Eq.(5) to move from W_{m−1}(t) to W_m(t). So the result is an optimality condition for α_m.

Ensemble training error. The training error of the ensemble does not necessarily decrease monotonically with each boosting iteration. The exponential loss of the ensemble does, however, decrease monotonically. This should be evident since it is exactly the loss we are sequentially minimizing by adding the base learners. We can also quantify, based on the weighted error achieved by each base learner, how much the exponential loss decreases after each iteration. We will need this to relate the training loss to the classification error. In fact, the factor by which the training loss decreases after iteration m is exactly c_m, the normalization constant for the updated weights (we have to normalize the weights precisely because the exponential loss over the training examples decreases). Note also that c_m is exactly J(α̂_m, θ̂_m). Now,

  J(α̂_m, θ̂_m) = Σ_{t=1}^n W_{m−1}(t) exp( −y_t α̂_m h(x_t; θ̂_m) )    (10)
  = Σ_{t: y_t = h(x_t; θ̂_m)} W_{m−1}(t) exp( −α̂_m ) + Σ_{t: y_t ≠ h(x_t; θ̂_m)} W_{m−1}(t) exp( α̂_m )    (11)
  = (1 − ε̂_m) exp( −α̂_m ) + ε̂_m exp( α̂_m )    (12)
  = (1 − ε̂_m) √( ε̂_m / (1 − ε̂_m) ) + ε̂_m √( (1 − ε̂_m) / ε̂_m )    (13)
  = 2 √( ε̂_m (1 − ε̂_m) )    (14)

Note that this is always less than one for any ε̂_m < 1/2. The training loss of the ensemble after m boosting iterations is exactly the product of these terms (renormalizations). In
other words,

  (1/n) Σ_{t=1}^n exp( −y_t h_m(x_t) ) = Π_{k=1}^m 2 √( ε̂_k (1 − ε̂_k) )    (15)

This, and the observation that

  step(−z) ≤ exp(−z)    (16)

for all z, where the step function step(z) = 1 if z > 0 and zero otherwise, suffices for our purposes. A simple upper bound on the training error of the ensemble, err(h_m), follows from

  err(h_m) = (1/n) Σ_{t=1}^n step( −y_t h_m(x_t) )    (17)
  ≤ (1/n) Σ_{t=1}^n exp( −y_t h_m(x_t) )    (18)
  = Π_{k=1}^m 2 √( ε̂_k (1 − ε̂_k) )    (19)

Thus the exponential loss over the training examples is an upper bound on the training error, and this upper bound goes down monotonically with m provided that the base learners are learning something at each iteration (their weighted errors are less than half). Figure 3 shows the training error as well as the upper bound as a function of the boosting iterations.

Ensemble test error. We have so far discussed only training errors. The goal, of course, is to generalize well. What can we say about the generalization error of an ensemble generated by the boosting algorithm? We have repeatedly tied the generalization error to some notion of margin. The same is true here. Consider Figure 5 below. It shows a typical plot of the ensemble training error and the corresponding generalization error as a function of boosting iterations. Two things are notable in the plot. First, the generalization error seems to decrease (slightly) even after the ensemble has reached zero training error. Why should this be? The second surprising thing seems to be the fact that the generalization error does not increase even after a large number of boosting iterations. In other words, the boosting algorithm appears to be somewhat resistant to overfitting. Let's try to explain these two (related) observations.

The votes {α̂_j} generated by the boosting algorithm won't sum to one. We will therefore renormalize the ensemble

  h_m(x) = ( α̂_1 h(x; θ̂_1) + ... + α̂_m h(x; θ̂_m) ) / ( α̂_1 + ... + α̂_m )    (20)
so that h_m(x) ∈ [−1, 1]. As a result, we can define a voting margin for training examples as margin(t) = y_t h_m(x_t). The margin is positive if the example is classified correctly by the ensemble. It represents the degree to which the base classifiers agree with the correct classification decision (a negative value indicates disagreement). Note that margin(t) ∈ [−1, 1]. It is a very different type of margin (voting margin) than the geometric margin we have discussed in the context of linear classifiers.

Figure 3: The training error of the ensemble as well as the corresponding exponential loss (upper bound) as a function of the boosting iterations.

Now, in addition to the training error err(h_m), we can define a margin error err(h_m; ρ) that is the fraction of example margins that are at or below the threshold ρ. Clearly, err(h_m) = err(h_m; 0). We now claim that the boosting algorithm, even after err(h_m; 0) = 0, will decrease err(h_m; ρ) for larger values of ρ > 0. Figures 4a-b provide an empirical illustration that this is indeed happening. This is perhaps easy to understand as a consequence of the fact that the exponential loss, exp(−margin(t)), decreases as a function of the margin, even after the margin is positive.

The second issue to explain is the apparent resistance to overfitting. One reason is that the complexity of the ensemble does not increase very quickly as a function of the number of base learners. We will make this statement more precise later on. Moreover, the boosting iterations modify the ensemble in sensible ways (increasing the margin) even after the training error is zero. We can also relate the margin, or the margin error err(h_m; ρ), directly to the generalization error. Another reason for the resistance to overfitting is that the sequential procedure for optimizing the exponential loss is not very effective. We would overfit much more quickly if we reoptimized the {α_j}'s jointly rather than through the sequential procedure (see the discussion of boosting as gradient descent below).

Boosting as gradient descent. We can also view the boosting algorithm as a simple
gradient descent procedure (with line search) in the space of discriminant functions.

Figure 4: The margin errors err(h_m; ρ) as a function of ρ when a) m = 10 and b) m = 50.

To understand this we can view each base learner h(x; θ) as a vector based on evaluating it on each of the n training examples:

  h(θ) = [ h(x_1; θ), ..., h(x_n; θ) ]^T    (21)

The ensemble vector h_m, obtained by evaluating h_m(x) at each of the training examples, is a positive combination of the base learner vectors:

  h_m = Σ_{j=1}^m α̂_j h(θ̂_j)    (22)

The exponential loss objective we are trying to minimize is now a function of the ensemble vector h_m and the training labels. Suppose we have h_{m−1}. To minimize the objective, we can select a useful direction, h(θ̂_m), along which the objective seems to decrease. This is exactly how we derived the base learners. We can then find the minimum of the objective by moving in this direction, i.e., evaluating vectors of the form h_{m−1} + α_m h(θ̂_m). This is a line search operation. The minimum is attained at α̂_m, we obtain h_m = h_{m−1} + α̂_m h(θ̂_m), and the procedure can be repeated. Viewing the boosting algorithm as a simple gradient descent procedure also helps us understand why it can overfit if we continue with the boosting iterations.
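The line-search step can be checked numerically. The sketch below uses made-up weights and base-learner outputs: minimizing J(α) = Σ_t W(t) exp(−y_t α h(x_t; θ̂)) over a grid of α values recovers the closed-form step size α̂ = 0.5 log((1 − ε̂)/ε̂) of Eq.(3).

```python
import numpy as np

# Fixed, illustrative quantities: normalized weights W(t) and the products
# y_t * h(x_t; theta_hat), which are +1 on correct examples, -1 on mistakes.
W = np.array([0.1, 0.2, 0.3, 0.4])
yh = np.array([1.0, 1.0, -1.0, 1.0])
eps = np.sum(W[yh < 0])                    # weighted error = 0.3

# Line search: evaluate J(alpha) on a fine grid and take the minimizer.
alphas = np.linspace(0.0, 2.0, 20001)
J = np.array([np.sum(W * np.exp(-a * yh)) for a in alphas])
alpha_search = alphas[np.argmin(J)]

# Closed form from Eq.(3).
alpha_closed = 0.5 * np.log((1.0 - eps) / eps)
print(alpha_search, alpha_closed)          # agree to grid resolution
```

Since J(α) is convex in α for binary base learners, the grid minimizer and the analytic stationary point coincide up to the grid spacing.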
Figure 5: The training error of the ensemble as well as the corresponding generalization error as a function of boosting iterations.

Complexity and generalization

We have approached classification problems using linear classifiers, probabilistic classifiers, as well as ensemble methods. Our goal is to understand what type of performance guarantees we can give for such methods based on finite training data. This is a core theoretical question in machine learning. For this purpose we will need to understand in detail what complexity means in terms of classifiers. A single classifier is never complex or simple; complexity is a property of the set of classifiers or the model. Each model selection criterion we have encountered provided a slightly different definition of model complexity. Our focus here is on performance guarantees that will eventually relate the margin we can attain to the generalization error, especially for linear classifiers (geometric margin) and ensembles (voting margin).

Let's start by motivating the complexity measure we need for this purpose with an example. Consider a simple decision stump classifier restricted to coordinate x_1 of 2-dimensional input vectors x = [x_1, x_2]^T. In other words, we consider stumps of the form

  h(x; θ) = sign( s (x_1 − θ_0) )    (23)

where s ∈ {−1, 1} and θ_0 ∈ R, and call this set of classifiers F_1. Example decision boundaries are displayed in Figure 6.

Suppose we are getting the data points in a sequence and we are interested in seeing when our predictions for future points become constrained by the labeled points we have already
seen. Such constraints pertain to both the data and the set of classifiers F_1.

Figure 6: Possible decision boundaries corresponding to decision stumps that rely only on coordinate x_1. The arrow (normal) to the boundary specifies the positive side.

See Figure 6e. Having seen the labels for the first two points, − and +, all classifiers h ∈ F_1 that are consistent with these two labeled points have to predict + for the next point. Since the labels we have seen force us to classify the new point in only one way, we can claim to have learned something. We can also understand this as a limitation of (the complexity of) our set of classifiers. Figure 6f illustrates an alternative scenario where we can find two classifiers h_1 ∈ F_1 and h_2 ∈ F_1, both consistent with the first two labels in the figure, but making different predictions for the new point. We could therefore classify the new point either way. Recall that this freedom is not available for all label assignments to the first two points. So, the stumps in F_1 can classify any two points (in general position) in all possible ways (Figures 6a-d) but are already partially constrained in how they assign labels to three points (Figure 6e). In more technical terms we say that F_1 shatters (can generate all possible labelings over) two points but not three.

Similarly, for linear classifiers in 2 dimensions, all the eight possible labelings of three points can be obtained with linear classifiers (Figure 7a). Thus linear classifiers in two dimensions shatter three points. However, there are labelings over four points that no linear classifier can produce (Figure 7b).

VC-dimension. As we increase the number of data points, the set of classifiers we are considering may no longer be able to label the points in all possible ways. Such emerging constraints are critical to be able to predict labels for new points. This motivates a key
measure of complexity of the set of classifiers, the Vapnik-Chervonenkis dimension. The VC-dimension is defined as the maximum number of points that a set of classifiers can shatter. So, the VC-dimension of F_1 is two, denoted as d_VC(F_1) = 2, and the VC-dimension of linear classifiers on the plane is three. Note that the definition involves a maximum over the possible points. In other words, we may find particular sets of fewer than d_VC points that the set of classifiers cannot shatter (e.g., linear classifiers with points exactly on a line in 2-d), but there cannot be any set of more than d_VC points that the classifiers can shatter.

Figure 7: Linear classifiers on the plane can shatter three points a) but not four b).

The VC-dimension of the set of linear classifiers in d dimensions is d + 1, i.e., the number of parameters. This is not a useful result for understanding how kernel methods work. For example, the VC-dimension of linear classifiers using the radial basis kernel is infinite. We can incorporate the notion of margin in the VC-dimension, however. This is known as the V_γ dimension. The V_γ dimension of the set of linear classifiers that attain geometric margin γ, when the examples lie within an enclosing sphere of radius R, is bounded by R²/γ². In other words, there are not that many points we can label in all possible ways if any valid labeling has to be achieved with margin γ. This result is independent of the dimension of the classifier, and is exactly the mistake bound for the perceptron algorithm!
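The shattering claims about F_1 can be verified by brute force. The sketch below (function names invented for illustration) enumerates the labelings realizable by single-coordinate stumps and compares them against all ±1 labelings of the given points:

```python
import itertools
import numpy as np

def stump_labels(points, s, theta0):
    """Labels assigned by h(x; theta) = sign(s (x - theta0)) to 1-d points."""
    lab = np.sign(s * (np.asarray(points) - theta0))
    lab[lab == 0] = 1.0          # break ties toward +1
    return tuple(lab)

def shatters(points):
    """True if some single-coordinate stump realizes every +/-1 labeling."""
    # Candidate thresholds: below, between, and above the sorted points;
    # any other threshold induces one of the same labelings.
    pts = sorted(points)
    cands = ([pts[0] - 1.0]
             + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
             + [pts[-1] + 1.0])
    realized = {stump_labels(points, s, t)
                for s in (-1.0, 1.0) for t in cands}
    return all(lab in realized
               for lab in itertools.product((-1.0, 1.0), repeat=len(points)))

print(shatters([0.0, 1.0]))       # all 4 labelings of two points achievable
print(shatters([0.0, 1.0, 2.0]))  # the labeling (+, -, +) is impossible
```

This confirms d_VC(F_1) = 2 for distinct 1-d points: every labeling of two points is realizable, while the alternating labeling of three points is not.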
More informationCS 70 Second Midterm 7 April NAME (1 pt): SID (1 pt): TA (1 pt): Name of Neighbor to your left (1 pt): Name of Neighbor to your right (1 pt):
CS 70 Secod Midter 7 April 2011 NAME (1 pt): SID (1 pt): TA (1 pt): Nae of Neighbor to your left (1 pt): Nae of Neighbor to your right (1 pt): Istructios: This is a closed book, closed calculator, closed
More informationMath 312 Lecture Notes One Dimensional Maps
Math 312 Lecture Notes Oe Dimesioal Maps Warre Weckesser Departmet of Mathematics Colgate Uiversity 21-23 February 25 A Example We begi with the simplest model of populatio growth. Suppose, for example,
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationDisjoint set (Union-Find)
CS124 Lecture 7 Fall 2018 Disjoit set (Uio-Fid) For Kruskal s algorithm for the miimum spaig tree problem, we foud that we eeded a data structure for maitaiig a collectio of disjoit sets. That is, we eed
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationMa 530 Infinite Series I
Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li
More informationMassachusetts Institute of Technology
Massachusetts Istitute of Techology 6.867 Machie Learig, Fall 6 Problem Set : Solutios. (a) (5 poits) From the lecture otes (Eq 4, Lecture 5), the optimal parameter values for liear regressio give the
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machie Learig (Fall 2014) Drs. Sha & Liu {feisha,yaliu.cs}@usc.edu October 14, 2014 Drs. Sha & Liu ({feisha,yaliu.cs}@usc.edu) CSCI567 Machie Learig (Fall 2014) October 14, 2014 1 / 49 Outlie Admiistratio
More informationInverse Matrix. A meaning that matrix B is an inverse of matrix A.
Iverse Matrix Two square matrices A ad B of dimesios are called iverses to oe aother if the followig holds, AB BA I (11) The otio is dual but we ofte write 1 B A meaig that matrix B is a iverse of matrix
More informationMixture models (cont d)
6.867 Machie learig, lecture 5 (Jaakkola) Lecture topics: Differet types of ixture odels (cot d) Estiatig ixtures: the EM algorith Mixture odels (cot d) Basic ixture odel Mixture odels try to capture ad
More informationOrthogonal Functions
Royal Holloway Uiversity of odo Departet of Physics Orthogoal Fuctios Motivatio Aalogy with vectors You are probably failiar with the cocept of orthogoality fro vectors; two vectors are orthogoal whe they
More informationOptimal Estimator for a Sample Set with Response Error. Ed Stanek
Optial Estiator for a Saple Set wit Respose Error Ed Staek Itroductio We develop a optial estiator siilar to te FP estiator wit respose error tat was cosidered i c08ed63doc Te first 6 pages of tis docuet
More informationOn Modeling On Minimum Description Length Modeling. M-closed
O Modelig O Miiu Descriptio Legth Modelig M M-closed M-ope Do you believe that the data geeratig echais really is i your odel class M? 7 73 Miiu Descriptio Legth Priciple o-m-closed predictive iferece
More informationThe Binomial Multi- Section Transformer
4/4/26 The Bioial Multisectio Matchig Trasforer /2 The Bioial Multi- Sectio Trasforer Recall that a ulti-sectio atchig etwork ca be described usig the theory of sall reflectios as: where: ( ω ) = + e +
More informationPattern recognition systems Lab 10 Linear Classifiers and the Perceptron Algorithm
Patter recogitio systems Lab 10 Liear Classifiers ad the Perceptro Algorithm 1. Objectives his lab sessio presets the perceptro learig algorithm for the liear classifier. We will apply gradiet descet ad
More informationU8L1: Sec Equations of Lines in R 2
MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie
More informationIt is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.
MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied
More informationMa 530 Introduction to Power Series
Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 7
Statistical Machie Learig II Sprig 2017, Learig Theory, Lecture 7 1 Itroductio Jea Hoorio jhoorio@purdue.edu So far we have see some techiques for provig geeralizatio for coutably fiite hypothesis classes
More information11 KERNEL METHODS From Feature Combinations to Kernels. φ(x) = 1, 2x 1, 2x 2, 2x 3,..., 2x D, Learning Objectives:
11 KERNEL METHODS May who have had a opportuity of kowig ay ore about atheatics cofuse it with arithetic, ad cosider it a arid sciece. I reality, however, it is a sciece which requires a great aout of
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationBertrand s postulate Chapter 2
Bertrad s postulate Chapter We have see that the sequece of prie ubers, 3, 5, 7,... is ifiite. To see that the size of its gaps is ot bouded, let N := 3 5 p deote the product of all prie ubers that are
More informationINEQUALITIES BJORN POONEN
INEQUALITIES BJORN POONEN 1 The AM-GM iequality The most basic arithmetic mea-geometric mea (AM-GM) iequality states simply that if x ad y are oegative real umbers, the (x + y)/2 xy, with equality if ad
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationThe Random Walk For Dummies
The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli
More information19.1 The dictionary problem
CS125 Lecture 19 Fall 2016 19.1 The dictioary proble Cosider the followig data structural proble, usually called the dictioary proble. We have a set of ites. Each ite is a (key, value pair. Keys are i
More informationMachine Learning. Ilya Narsky, Caltech
Machie Learig Ilya Narsky, Caltech Lecture 4 Multi-class problems. Multi-class versios of Neural Networks, Decisio Trees, Support Vector Machies ad AdaBoost. Reductio of a multi-class problem to a set
More informationRandom Models. Tusheng Zhang. February 14, 2013
Radom Models Tusheg Zhag February 14, 013 1 Radom Walks Let me describe the model. Radom walks are used to describe the motio of a movig particle (object). Suppose that a particle (object) moves alog the
More informationAdmin REGULARIZATION. Schedule. Midterm 9/29/16. Assignment 5. Midterm next week, due Friday (more on this in 1 min)
Admi Assigmet 5! Starter REGULARIZATION David Kauchak CS 158 Fall 2016 Schedule Midterm ext week, due Friday (more o this i 1 mi Assigmet 6 due Friday before fall break Midterm Dowload from course web
More information