Variance Penalizing AdaBoost


Pannagadatta K. Shivaswamy, Department of Computer Science, Cornell University, Ithaca NY
Tony Jebara, Department of Computer Science, Columbia University, New York NY

Abstract

This paper proposes a novel boosting algorithm called VadaBoost which is motivated by recent empirical Bernstein bounds. VadaBoost iteratively minimizes a cost function that balances the sample mean and the sample variance of the exponential loss. Each step of the proposed algorithm minimizes the cost efficiently by providing weighted data to a weak learner rather than requiring a brute force evaluation of all possible weak learners. Thus, the proposed algorithm solves a key limitation of previous empirical Bernstein boosting methods which required brute force enumeration of all possible weak learners. Experimental results confirm that the new algorithm achieves the performance improvements of EBBoost yet goes beyond decision stumps to handle any weak learner. Significant performance gains are obtained over AdaBoost for arbitrary weak learners including decision trees (CART).

1 Introduction

Many machine learning algorithms implement empirical risk minimization or a regularized variant of it. For example, the popular AdaBoost [4] algorithm minimizes exponential loss on the training examples. Similarly, the support vector machine [11] minimizes hinge loss on the training examples. The convexity of these losses is helpful for computational as well as generalization reasons [2]. The goal of most learning problems, however, is not to obtain a function that performs well on training data, but rather to estimate a function (using training data) that performs well on future unseen test data. Therefore, empirical risk minimization on the training set is often performed while regularizing the complexity of the function classes being explored. The rationale behind this regularization approach is that it ensures that the empirical risk converges (uniformly) to the true unknown risk. Various concentration inequalities formalize the rate of convergence in terms of the function class complexity and the number of samples.

A key tool in obtaining such concentration inequalities is Hoeffding's inequality, which relates the empirical mean of a bounded random variable to its true mean. Bernstein's and Bennett's inequalities relate the true mean of a random variable to the empirical mean but also incorporate the true variance of the random variable. If the true variance of a random variable is small, these bounds can be significantly tighter than Hoeffding's bound. Recently, there have been empirical counterparts of Bernstein's inequality [1, 5]; these bounds incorporate the empirical variance of a random variable rather than its true variance. The advantage of these bounds is that the quantities they involve are empirical. Previously, these bounds have been applied in sampling procedures [6] and in multi-armed bandit problems [1]. An alternative to empirical risk minimization, called sample variance penalization [5], has been proposed and is motivated by empirical Bernstein bounds.

A new boosting algorithm is proposed in this paper which implements sample variance penalization. The algorithm minimizes the empirical risk on the training set as well as the empirical variance. The two quantities (the risk and the variance) are traded off through a scalar parameter.

Moreover, the algorithm proposed in this article does not require exhaustive enumeration of the weak learners (unlike an earlier algorithm by [10]).

Assume that a training set $(X_i, y_i)_{i=1}^n$ is provided, where $X_i \in \mathcal{X}$ and $y_i \in \{\pm 1\}$ are drawn independently and identically distributed (iid) from a fixed but unknown distribution $\mathcal{D}$. The goal is to learn a classifier or a function $f : \mathcal{X} \to \{\pm 1\}$ that performs well on test examples drawn from the same distribution $\mathcal{D}$. In the rest of this article, $G : \mathcal{X} \to \{\pm 1\}$ denotes the so-called weak learner. The notation $G^s$ denotes the weak learner in a particular iteration $s$. Further, the two index sets $I_s$ and $J_s$, respectively, denote examples that the weak learner $G^s$ correctly classified and misclassified, i.e., $I_s := \{i \mid G^s(X_i) = y_i\}$ and $J_s := \{j \mid G^s(X_j) \neq y_j\}$.

Algorithm 1 AdaBoost
Require: $(X_i, y_i)_{i=1}^n$ and weak learners $\mathcal{H}$
  Initialize the weights: $w_i \leftarrow 1/n$ for $i = 1, \ldots, n$; initialize $f$ to predict zero on all inputs.
  for $s \leftarrow 1$ to $S$ do
    Estimate a weak learner $G^s(\cdot)$ from training examples weighted by $(w_i)_{i=1}^n$.
    $\alpha_s = \frac{1}{2} \log \left( \sum_{i : G^s(X_i) = y_i} w_i \big/ \sum_{j : G^s(X_j) \neq y_j} w_j \right)$
    if $\alpha_s \leq 0$ then break end if
    $f(\cdot) \leftarrow f(\cdot) + \alpha_s G^s(\cdot)$
    $w_i \leftarrow w_i \exp(-y_i G^s(X_i) \alpha_s)/Z_s$ where $Z_s$ is such that $\sum_{i=1}^n w_i = 1$.
  end for

Algorithm 2 VadaBoost
Require: $(X_i, y_i)_{i=1}^n$, scalar parameter $0 \leq \lambda \leq 1$, and weak learners $\mathcal{H}$
  Initialize the weights: $w_i \leftarrow 1/n$ for $i = 1, \ldots, n$; initialize $f$ to predict zero on all inputs.
  for $s \leftarrow 1$ to $S$ do
    $u_i \leftarrow \lambda n w_i^2 + (1 - \lambda) w_i$
    Estimate a weak learner $G^s(\cdot)$ from training examples weighted by $(u_i)_{i=1}^n$.
    $\alpha_s = \frac{1}{4} \log \left( \sum_{i : G^s(X_i) = y_i} u_i \big/ \sum_{j : G^s(X_j) \neq y_j} u_j \right)$
    if $\alpha_s \leq 0$ then break end if
    $f(\cdot) \leftarrow f(\cdot) + \alpha_s G^s(\cdot)$
    $w_i \leftarrow w_i \exp(-y_i G^s(X_i) \alpha_s)/Z_s$ where $Z_s$ is such that $\sum_{i=1}^n w_i = 1$.
  end for

2 Algorithms

In this section, we briefly discuss AdaBoost [4] and then propose a new algorithm called VadaBoost. The derivation of VadaBoost will be provided in detail in the next section. AdaBoost (Algorithm 1) assigns a weight $w_i$ to each training example. In each step of AdaBoost, a weak learner $G^s(\cdot)$ is obtained on the weighted examples and a weight $\alpha_s$ is assigned to it. Thus, AdaBoost iteratively builds $\sum_{s=1}^S \alpha_s G^s(\cdot)$. If a training example is correctly classified, its weight is exponentially decreased; if it is misclassified, its weight is exponentially increased. The process is repeated until a stopping criterion is met. AdaBoost essentially performs empirical risk minimization:
$$\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^n e^{-y_i f(X_i)},$$
by greedily constructing the function $f(\cdot)$ via $\sum_{s=1}^S \alpha_s G^s(\cdot)$.

Recently an alternative to empirical risk minimization has been proposed. This new criterion, known as sample variance penalization [5], trades off the empirical risk with the empirical variance:
$$\arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^n l(f(X_i), y_i) + \tau \sqrt{\frac{\hat{V}[l(f(X), y)]}{n}}, \qquad (1)$$
where $\tau \geq 0$ explores the trade-off between the two quantities.
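To make Algorithm 2 concrete, the following is a minimal sketch of VadaBoost with decision stumps as the weak learners. It follows the pseudocode above but is an illustrative implementation rather than the authors' code; `fit_stump` is an assumed helper that fits a weighted threshold rule.

```python
import numpy as np

def fit_stump(X, y, u):
    """Weighted decision stump: brute-force search over (feature, threshold, sign)
    minimizing the u-weighted error (O(n^2 d); fine for a sketch)."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sgn in (1.0, -1.0):
                pred = sgn * np.where(X[:, j] <= t, 1.0, -1.0)
                err = u[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (j, t, sgn)
    j, t, sgn = best_params
    return lambda Z: sgn * np.where(Z[:, j] <= t, 1.0, -1.0)

def vadaboost(X, y, lam=0.1, n_rounds=50):
    """Sketch of Algorithm 2; lam is the variance-penalization parameter, 0 <= lam <= 1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # example weights, kept normalized
    ensemble = []                               # list of (alpha_s, G^s)
    eps = 1e-12                                 # guards the log against zero denominators
    for _ in range(n_rounds):
        u = lam * n * w**2 + (1.0 - lam) * w    # weights handed to the weak learner
        G = fit_stump(X, y, u)
        correct = (G(X) == y)
        alpha = 0.25 * np.log((u[correct].sum() + eps) / (u[~correct].sum() + eps))
        if alpha <= 0:                          # weak learner no better than chance: stop
            break
        ensemble.append((alpha, G))
        w *= np.exp(-y * alpha * G(X))
        w /= w.sum()                            # renormalize so that sum_i w_i = 1
    return ensemble

def predict(ensemble, X):
    """Sign of f(X) = sum_s alpha_s G^s(X)."""
    return np.sign(sum(a * G(X) for a, G in ensemble))
```

With $\lambda = 0$ the update gives $u_i = w_i$, so the only difference from Algorithm 1 is the factor $\frac{1}{4}$ (rather than $\frac{1}{2}$) in $\alpha_s$; with $\lambda = 1$ the weak learner is trained on weights proportional to $n w_i^2$.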

The motivation for sample variance penalization comes from the following theorem [5]:

Theorem 1 Let $(X_i, y_i)_{i=1}^n$ be drawn iid from a distribution $\mathcal{D}$. Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to \mathbb{R}$. Then, for a loss $l : \mathbb{R} \times \mathcal{Y} \to [0, 1]$ and for any $\delta > 0$, with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,
$$E[l(f(X), y)] \leq \frac{1}{n} \sum_{i=1}^n l(f(X_i), y_i) + \sqrt{\frac{18 \hat{V}[l(f(X), y)] \ln(M(n)/\delta)}{n}} + \frac{15 \ln(M(n)/\delta)}{n - 1}, \qquad (2)$$
where $M(n)$ is a complexity measure.

From the above uniform convergence result, it can be argued that future loss can be minimized by minimizing the right hand side of the bound on training examples. Since the variance $\hat{V}[l(f(X), y)]$ has a multiplicative factor involving $M(n)$, $\delta$ and $n$, for a given problem it is difficult to specify the relative importance between empirical risk and empirical variance a priori. Hence, sample variance penalization (1) necessarily involves a trade-off parameter $\tau$.

Empirical risk minimization or sample variance penalization on the 0-1 loss is a hard problem; this problem is often circumvented by minimizing a convex upper bound on the 0-1 loss. In this paper, we consider the exponential loss $l(f(X), y) := e^{-y f(X)}$. With the above loss, it was shown by [10] that sample variance penalization is equivalent to minimizing the following cost:
$$\left( \sum_{i=1}^n e^{-y_i f(X_i)} \right)^2 + \lambda \left( n \sum_{i=1}^n e^{-2 y_i f(X_i)} - \left( \sum_{i=1}^n e^{-y_i f(X_i)} \right)^2 \right). \qquad (3)$$
Theorem 1 requires that the loss function be bounded. Even though the exponential loss is unbounded, boosting is typically performed only for a finite number of iterations in most practical applications. Moreover, since weak learners typically perform only slightly better than random guessing, each $\alpha_s$ in AdaBoost (or in VadaBoost) is typically small, thus limiting the range of the function learned. Furthermore, experiments will confirm that sample variance penalization results in a significant empirical performance improvement over empirical risk minimization.

Our proposed algorithm is called VadaBoost (the V in VadaBoost emphasizes the fact that Algorithm 2 penalizes the empirical variance) and is described in Algorithm 2. VadaBoost iteratively performs sample variance penalization, i.e., it minimizes the cost (3) iteratively. Clearly, VadaBoost shares the simplicity and ease of implementation found in AdaBoost.

3 Derivation of VadaBoost

In the $s$th iteration, our objective is to choose a weak learner $G^s$ and a weight $\alpha_s$ such that $\sum_{t=1}^{s} \alpha_t G^t(\cdot)$ reduces the cost (3). Denote by $w_i$ the quantity $e^{-y_i \sum_{t=1}^{s-1} \alpha_t G^t(X_i)}/Z_s$. Given a candidate weak learner $G^s(\cdot)$, the cost (3) for the function $\sum_{t=1}^{s-1} \alpha_t G^t(\cdot) + \alpha G^s(\cdot)$ can be expressed, up to a multiplicative factor, as a function of $\alpha$:
$$V(\alpha; \mathbf{w}, \lambda, I, J) := \Big( \sum_{i \in I} w_i e^{-\alpha} + \sum_{j \in J} w_j e^{\alpha} \Big)^2 + \lambda \Big( n \sum_{i \in I} w_i^2 e^{-2\alpha} + n \sum_{j \in J} w_j^2 e^{2\alpha} - \Big( \sum_{i \in I} w_i e^{-\alpha} + \sum_{j \in J} w_j e^{\alpha} \Big)^2 \Big). \qquad (4)$$
In the quantity above, $I$ and $J$ are the two index sets (of correctly classified and incorrectly classified examples) over $G^s$. Let the vector $\mathbf{w}$ whose $i$th component is $w_i$ denote the current set of weights on the training examples. Here, we have dropped the subscripts/superscripts $s$ for brevity.
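For reference, the two costs just introduced can be evaluated directly. The helper below computes (3) from the margins $y_i f(X_i)$ and (4) from the current normalized weights; the function names are ours, introduced only for illustration.

```python
import numpy as np

def svp_exp_cost(margins, lam):
    """Cost (3) for margins m_i = y_i f(X_i):
    (sum_i e^{-m_i})^2 + lam * (n * sum_i e^{-2 m_i} - (sum_i e^{-m_i})^2)."""
    e = np.exp(-np.asarray(margins, dtype=float))
    n, s = len(e), e.sum()
    return s**2 + lam * (n * (e**2).sum() - s**2)

def V(alpha, w, lam, correct):
    """Cost (4), up to the same multiplicative factor: the value of (3) after the
    step alpha * G^s(.), written in terms of the normalized weights w and the
    boolean mask `correct` indicating G^s(X_i) == y_i."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    scale = np.where(correct, np.exp(-alpha), np.exp(alpha))
    a = (w * scale).sum()            # sum_I w_i e^{-alpha} + sum_J w_j e^{alpha}
    b = (w**2 * scale**2).sum()      # sum_I w_i^2 e^{-2 alpha} + sum_J w_j^2 e^{2 alpha}
    return a**2 + lam * (n * b - a**2)
```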

Lemma 2 The update of $\alpha_s$ in Algorithm 2 minimizes the cost
$$U(\alpha; \mathbf{w}, \lambda, I, J) := \sum_{i \in I} \big( \lambda n w_i^2 + (1 - \lambda) w_i \big) e^{-2\alpha} + \sum_{j \in J} \big( \lambda n w_j^2 + (1 - \lambda) w_j \big) e^{2\alpha}. \qquad (5)$$

Proof By obtaining the second derivative of the above expression (with respect to $\alpha$), it is easy to see that it is convex in $\alpha$. Thus, setting the derivative with respect to $\alpha$ to zero gives the optimal choice of $\alpha$ as shown in Algorithm 2.

Theorem 3 Assume that $0 \leq \lambda \leq 1$ and $\sum_{i=1}^n w_i = 1$ (i.e. normalized weights). Then, $V(\alpha; \mathbf{w}, \lambda, I, J) \leq U(\alpha; \mathbf{w}, \lambda, I, J)$ and $V(0; \mathbf{w}, \lambda, I, J) = U(0; \mathbf{w}, \lambda, I, J)$. That is, $U$ is an upper bound on $V$ and the bound is exact at $\alpha = 0$.

Proof Denoting $1 - \lambda$ by $\bar{\lambda}$, we have:
$$\begin{aligned}
V(\alpha; \mathbf{w}, \lambda, I, J) &= \Big( \sum_{i \in I} w_i e^{-\alpha} + \sum_{j \in J} w_j e^{\alpha} \Big)^2 + \lambda \Big( n \sum_{i \in I} w_i^2 e^{-2\alpha} + n \sum_{j \in J} w_j^2 e^{2\alpha} - \Big( \sum_{i \in I} w_i e^{-\alpha} + \sum_{j \in J} w_j e^{\alpha} \Big)^2 \Big) \\
&= \bar{\lambda} \Big( \sum_{i \in I} w_i e^{-\alpha} + \sum_{j \in J} w_j e^{\alpha} \Big)^2 + \lambda n \Big( \sum_{i \in I} w_i^2 e^{-2\alpha} + \sum_{j \in J} w_j^2 e^{2\alpha} \Big) \\
&= \lambda n \Big( \sum_{i \in I} w_i^2 e^{-2\alpha} + \sum_{j \in J} w_j^2 e^{2\alpha} \Big) + \bar{\lambda} \Big( \big( \textstyle\sum_{i \in I} w_i \big)^2 e^{-2\alpha} + \big( \textstyle\sum_{j \in J} w_j \big)^2 e^{2\alpha} + 2 \textstyle\sum_{i \in I} w_i \textstyle\sum_{j \in J} w_j \Big) \\
&= \lambda n \Big( \sum_{i \in I} w_i^2 e^{-2\alpha} + \sum_{j \in J} w_j^2 e^{2\alpha} \Big) + \bar{\lambda} \Big( \sum_{i \in I} w_i \big( 1 - \textstyle\sum_{j \in J} w_j \big) e^{-2\alpha} + \sum_{j \in J} w_j \big( 1 - \textstyle\sum_{i \in I} w_i \big) e^{2\alpha} \Big) + 2 \bar{\lambda} \sum_{i \in I} w_i \sum_{j \in J} w_j \\
&= \sum_{i \in I} \big( \lambda n w_i^2 + \bar{\lambda} w_i \big) e^{-2\alpha} + \sum_{j \in J} \big( \lambda n w_j^2 + \bar{\lambda} w_j \big) e^{2\alpha} - \bar{\lambda} \sum_{i \in I} w_i \sum_{j \in J} w_j \big( e^{-2\alpha} + e^{2\alpha} - 2 \big) \\
&\leq \sum_{i \in I} \big( \lambda n w_i^2 + \bar{\lambda} w_i \big) e^{-2\alpha} + \sum_{j \in J} \big( \lambda n w_j^2 + \bar{\lambda} w_j \big) e^{2\alpha} = U(\alpha; \mathbf{w}, \lambda, I, J).
\end{aligned}$$
On line two, terms were simply regrouped. On line three, the square term from line two was expanded. On the next line, we used the fact that $\sum_{i \in I} w_i + \sum_{j \in J} w_j = \sum_{i=1}^n w_i = 1$. On the fifth line, we once again regrouped terms; the last term in this expression (which involves $e^{-2\alpha} + e^{2\alpha} - 2$) can be written as $\bar{\lambda} \sum_{i \in I} w_i \sum_{j \in J} w_j (e^{\alpha} - e^{-\alpha})^2$, which is non-negative and enters with a negative sign, giving the inequality. When $\alpha = 0$ this term vanishes. Hence the bound is exact at $\alpha = 0$.

Corollary 4 VadaBoost monotonically decreases the cost (3).

The above corollary follows from:
$$V(\alpha_s; \mathbf{w}, \lambda, I, J) \leq U(\alpha_s; \mathbf{w}, \lambda, I, J) < U(0; \mathbf{w}, \lambda, I, J) = V(0; \mathbf{w}, \lambda, I, J).$$
In the above, the first inequality follows from Theorem 3. The second strict inequality holds because $\alpha_s$ is a minimizer of $U$ from Lemma 2; it is not hard to show that $U(\alpha_s; \mathbf{w}, \lambda, I, J)$ is strictly less than $U(0; \mathbf{w}, \lambda, I, J)$ from the termination criterion of VadaBoost. The third equality again follows from Theorem 3. Finally, we notice that $V(0; \mathbf{w}, \lambda, I, J)$ merely corresponds to the cost (3) at $\sum_{t=1}^{s-1} \alpha_t G^t(\cdot)$. Thus, we have shown that taking a step $\alpha_s$ decreases the cost (3).
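Lemma 2 and Theorem 3 are easy to check numerically under the same assumptions (normalized weights, $0 \leq \lambda \leq 1$). The snippet below is a quick illustrative sanity check on randomly generated weights, not part of the derivation.

```python
import numpy as np

n, lam = 20, 0.5
rng = np.random.default_rng(0)
w = rng.random(n); w /= w.sum()            # normalized weights, sum_i w_i = 1
correct = (np.arange(n) % 3 != 0)          # pretend G^s misclassifies every third example

def V(alpha):                              # cost (4)
    s = np.where(correct, np.exp(-alpha), np.exp(alpha))
    a, b = (w * s).sum(), (w**2 * s**2).sum()
    return a**2 + lam * (n * b - a**2)

def U(alpha):                              # upper bound (5)
    u = lam * n * w**2 + (1 - lam) * w
    return u[correct].sum() * np.exp(-2 * alpha) + u[~correct].sum() * np.exp(2 * alpha)

u = lam * n * w**2 + (1 - lam) * w
alpha_star = 0.25 * np.log(u[correct].sum() / u[~correct].sum())   # Lemma 2

grid = np.linspace(-2.0, 2.0, 401)
Ug = np.array([U(a) for a in grid])
Vg = np.array([V(a) for a in grid])
assert np.all(Ug >= Vg - 1e-9)             # Theorem 3: U upper-bounds V for 0 <= lam <= 1
assert abs(U(0.0) - V(0.0)) < 1e-9         # and the bound is exact at alpha = 0
assert U(alpha_star) <= Ug.min() + 1e-12   # the closed-form alpha_star minimizes U
```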

Figure 1: Typical upper bound $U(\alpha; \mathbf{w}, \lambda, I, J)$ and actual cost function $V(\alpha; \mathbf{w}, \lambda, I, J)$ values under varying $\alpha$. The bound is exact at $\alpha = 0$. The bound gets closer to the actual function value as $\lambda$ grows. The left plot shows the bound for $\lambda = 0$ and the right plot shows it for $\lambda = 0.9$.

We point out that we use a different upper bound in each iteration since $V$ and $U$ are parameterized by the current weights in the VadaBoost algorithm. Also note that our upper bound holds only for $0 \leq \lambda \leq 1$. Although the choice $0 \leq \lambda \leq 1$ seems restrictive, intuitively, it is natural to have a higher penalization on the empirical mean rather than the empirical variance during minimization. Also, a closer look at the empirical Bernstein inequality in [5] shows that the empirical variance term is multiplied by $1/\sqrt{n}$ while the empirical mean is multiplied by one. Thus, for large values of $n$, the weight on the sample variance is small. Furthermore, our experiments suggest that restricting $\lambda$ to this range does not significantly change the results.

4 How good is the upper bound?

First, we observe that our upper bound is exact when $\lambda = 1$. Also, our upper bound is loosest for the case $\lambda = 0$. We visualize the upper bound and the true cost for two settings of $\lambda$ in Figure 1. Since the cost (4) is minimized via an upper bound (5), a natural question is: how good is this approximation? We evaluate the tightness of this upper bound by considering its impact on learning efficiency. As is clear from Figure 1, when $\lambda = 1$, the upper bound is exact and incurs no inefficiency. In the other extreme when $\lambda = 0$, the cost of VadaBoost coincides with AdaBoost and the bound is effectively at its loosest. Even in this extreme case, VadaBoost derived through an upper bound only requires at most twice the number of iterations as AdaBoost to achieve a particular cost. The following theorem shows that our algorithm remains efficient even in this worst-case scenario.

Theorem 5 Let $O_A$ denote the squared cost obtained by AdaBoost after $S$ iterations. For weak learners in any iteration achieving a fixed error rate $\epsilon < 0.5$, VadaBoost with the setting $\lambda = 0$ attains a cost at least as low as $O_A$ in no more than $2S$ iterations.

Proof Denote the weight on example $i$ in the $s$th iteration by $w_i^s$. The weighted error rate of the $s$th classifier is $\epsilon_s = \sum_{j \in J_s} w_j^s$. We have, for both algorithms,
$$w_i^{S+1} = \frac{w_i^S \exp(-y_i \alpha_S G^S(X_i))}{Z_S} = \frac{\exp\big( -y_i \sum_{s=1}^S \alpha_s G^s(X_i) \big)}{n \prod_{s=1}^S Z_s}. \qquad (6)$$
The value of the normalization factor in the case of AdaBoost is
$$Z_s^a = \sum_{j \in J_s} w_j^s e^{\alpha_s} + \sum_{i \in I_s} w_i^s e^{-\alpha_s} = 2\sqrt{\epsilon_s (1 - \epsilon_s)}. \qquad (7)$$
Similarly, the value of the normalization factor for VadaBoost is given by
$$Z_s^v = \sum_{j \in J_s} w_j^s e^{\alpha_s} + \sum_{i \in I_s} w_i^s e^{-\alpha_s} = \big( \epsilon_s (1 - \epsilon_s) \big)^{\frac{1}{4}} \big( \sqrt{\epsilon_s} + \sqrt{1 - \epsilon_s} \big). \qquad (8)$$

The squared cost function of AdaBoost after $S$ steps is given by
$$O_A = \Big( \sum_{i=1}^n \exp\Big( -y_i \sum_{s=1}^S \alpha_s G^s(X_i) \Big) \Big)^2 = \Big( n \prod_{s=1}^S Z_s^a \sum_{i=1}^n w_i^{S+1} \Big)^2 = n^2 \prod_{s=1}^S 4 \epsilon_s (1 - \epsilon_s).$$
We used (6), (7) and the fact that $\sum_{i=1}^n w_i^{S+1} = 1$ to derive the above expression. Similarly, for $\lambda = 0$ the cost of VadaBoost satisfies
$$O_V = \Big( \sum_{i=1}^n \exp\Big( -y_i \sum_{s=1}^S \alpha_s G^s(X_i) \Big) \Big)^2 = \Big( n \prod_{s=1}^S Z_s^v \sum_{i=1}^n w_i^{S+1} \Big)^2 = n^2 \prod_{s=1}^S \big( 2\epsilon_s (1 - \epsilon_s) + \sqrt{\epsilon_s (1 - \epsilon_s)} \big)$$
(the cost which VadaBoost minimizes at $\lambda = 0$ is the squared cost of AdaBoost, so we do not square it again). Now, suppose that $\epsilon_s = \epsilon$ for all $s$. Then, the squared cost achieved by AdaBoost is given by $n^2 (4\epsilon(1-\epsilon))^S$. To achieve the same cost value, VadaBoost, with weak learners with the same error rate, needs at most
$$S \, \frac{\log\big( 4\epsilon(1-\epsilon) \big)}{\log\big( 2\epsilon(1-\epsilon) + \sqrt{\epsilon(1-\epsilon)} \big)}$$
iterations. Within the range of interest for $\epsilon$, the term multiplying $S$ above is at most 2.

Although the above worst-case bound achieves a factor of two, for $\epsilon > 0.4$, VadaBoost requires only about 33% more iterations than AdaBoost. To summarize, even in the worst possible scenario where $\lambda = 0$ (when the variational bound is at its loosest), the VadaBoost algorithm takes no more than double (a small constant factor) the number of iterations of AdaBoost to achieve the same cost.

Algorithm 3 EBBoost
Require: $(X_i, y_i)_{i=1}^n$, scalar parameter $\lambda \geq 0$, and weak learners $\mathcal{H}$
  Initialize the weights: $w_i \leftarrow 1/n$ for $i = 1, \ldots, n$; initialize $f$ to predict zero on all inputs.
  for $s \leftarrow 1$ to $S$ do
    Get a weak learner $G^s(\cdot)$ that minimizes (3) with the following choice of $\alpha_s$:
    $\alpha_s = \frac{1}{4} \log \frac{ (1 - \lambda) \big( \sum_{i \in I_s} w_i \big)^2 + \lambda n \sum_{i \in I_s} w_i^2 }{ (1 - \lambda) \big( \sum_{j \in J_s} w_j \big)^2 + \lambda n \sum_{j \in J_s} w_j^2 }$
    if $\alpha_s < 0$ then break end if
    $f(\cdot) \leftarrow f(\cdot) + \alpha_s G^s(\cdot)$
    $w_i \leftarrow w_i \exp(-y_i G^s(X_i) \alpha_s)/Z_s$ where $Z_s$ is such that $\sum_{i=1}^n w_i = 1$.
  end for

5 A limitation of the EBBoost algorithm

A sample variance penalization algorithm known as EBBoost was previously explored [10]. While this algorithm was simple to implement and showed significant improvements over AdaBoost, it suffers from a severe limitation: it requires enumeration and evaluation of every possible weak learner per iteration. Recall the steps implementing EBBoost in Algorithm 3. An implementation of EBBoost requires exhaustive enumeration of weak learners in search of the one that minimizes cost (3). It is preferable, instead, to find the best weak learner by providing weights on the training examples and efficiently computing the rule whose performance on that weighted set of examples is guaranteed to be better than random guessing. However, with the EBBoost algorithm, the weight on all the misclassified examples is $\sum_{j \in J_s} w_j^2 + \big( \sum_{j \in J_s} w_j \big)^2$ and the weight on correctly classified examples is $\sum_{i \in I_s} w_i^2 + \big( \sum_{i \in I_s} w_i \big)^2$; these aggregate weights on misclassified examples and correctly classified examples do not translate into weights on the individual examples. Thus, it becomes necessary to exhaustively enumerate weak learners in Algorithm 3. While enumeration of weak learners is possible in the case of decision stumps, it poses serious difficulties in the case of weak learners such as decision trees, ridge regression, etc. Thus, VadaBoost is the more versatile boosting algorithm for sample variance penalization.
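Returning to the efficiency guarantee, the worst-case multiplier appearing in Theorem 5, $\log(4\epsilon(1-\epsilon)) / \log(2\epsilon(1-\epsilon) + \sqrt{\epsilon(1-\epsilon)})$, is easy to tabulate. The short script below is an illustrative check (not part of the derivation) confirming that the multiplier stays below 2 and is roughly 1.33 once $\epsilon > 0.4$.

```python
import numpy as np

def iteration_factor(eps):
    """Ratio of VadaBoost (lambda = 0) to AdaBoost iterations needed to reach the
    same squared exponential cost when every weak learner has weighted error eps."""
    za2 = 4.0 * eps * (1.0 - eps)                                 # (Z^a_s)^2 from (7)
    zv2 = 2.0 * eps * (1.0 - eps) + np.sqrt(eps * (1.0 - eps))    # (Z^v_s)^2 from (8)
    return np.log(za2) / np.log(zv2)

for eps in (0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.49):
    print(f"eps = {eps:.2f}  iterations factor = {iteration_factor(eps):.3f}")
# The factor approaches 2 only as eps -> 0 and is roughly 1.33 for eps > 0.4.
```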

Table 1: Means and standard errors with decision stumps as the weak learner (columns: AdaBoost, EBBoost, VadaBoost, RLP-Boost, RQP-Boost; rows: a5a, abalone, image, mushrooms, musk, five MNIST digit-pair tasks, ringnorm, spambase, splice, twonorm, w4a, waveform, wine, wisc). [Table entries omitted.]

Table 2: Means and standard errors with CART as the weak learner (columns: AdaBoost, VadaBoost, RLP-Boost, RQP-Boost; rows as in Table 1). [Table entries omitted.]

6 Experiments

In this section, we evaluate the empirical performance of the VadaBoost algorithm with respect to several other algorithms. The primary purpose of our experiments is to compare sample variance penalization versus empirical risk minimization and to show that we can efficiently perform sample variance penalization for weak learners beyond decision stumps. We compared VadaBoost against EBBoost, AdaBoost, and regularized LP and QP boost algorithms [7]. All the algorithms except AdaBoost have one extra parameter to tune. Experiments were performed on benchmark datasets that have been previously used in [10]. These datasets include a variety of tasks including all digits from the MNIST dataset. Each dataset was divided into three parts: 50% for training, 25% for validation and 25% for test. The total number of examples was restricted to 5000 in the case of MNIST and musk datasets due to computational restrictions of solving LP/QP. The first set of experiments uses decision stumps as the weak learners. The second set of experiments used Classification and Regression Trees (CART) [3] as weak learners. A standard MATLAB implementation of CART was used without modification.
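The evaluation protocol (random 50/25/25 splits, parameter selection on the validation set, test error recorded for the chosen parameter, averaged over repetitions) can be outlined as follows. This is an illustrative Python sketch rather than the authors' original MATLAB setup; `train_boost` and `error_rate` are placeholder names for a boosting trainer and an error evaluator.

```python
import numpy as np

def evaluate(X, y, train_boost, error_rate, lambdas, n_repeats=50, seed=0):
    """Repeatedly split 50/25/25, pick the lambda with lowest validation error,
    and note the corresponding test error. Returns the mean test error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    test_errors = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        tr, va, te = np.split(idx, [n // 2, 3 * n // 4])   # 50% / 25% / 25%
        results = []
        for lam in lambdas:
            model = train_boost(X[tr], y[tr], lam)
            results.append((error_rate(model, X[va], y[va]),
                            error_rate(model, X[te], y[te])))
        results.sort()                      # lowest validation error first
        test_errors.append(results[0][1])   # test error at the selected lambda
    return float(np.mean(test_errors))
```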

For all the datasets, in both experiments, AdaBoost, VadaBoost and EBBoost (in the case of stumps) were run until there was no drop in the error rate on the validation set for 100 consecutive iterations. The values of the parameters for VadaBoost and EBBoost were chosen to minimize the validation error upon termination. RLP-Boost and RQP-Boost were given the predictions obtained by AdaBoost. Their regularization parameter was also chosen to minimize the error rate on the validation set. Once the parameter values were fixed via the validation set, we noted the test set error corresponding to that parameter value. The entire experiment was repeated 50 times by randomly selecting train, test and validation sets. The numbers reported here are averages from these runs.

The results for the decision stump and CART experiments are reported in Tables 1 and 2. For each dataset, the algorithm with the best percentage test error is represented by a dark shaded cell. All lightly shaded entries in a row denote results that are not significantly different from the minimum error (according to a paired t-test at a 1% significance level). With decision stumps, both EBBoost and VadaBoost have comparable performance and significantly outperform AdaBoost. With CART as the weak learner, VadaBoost is once again significantly better than AdaBoost.

We gave a guarantee on the number of iterations required in the worst case for VadaBoost (which approximately matches the AdaBoost cost (squared) in Theorem 5). An assumption in that theorem was that the error rate of each weak learner was fixed. However, in practice, the error rates of the weak learners are not constant over the iterations. To see this behavior in practice, we have shown the results of the MNIST 3 versus 8 classification experiment. In Figure 2 we show the cost (plus 1) for each algorithm (the AdaBoost cost has been squared) versus the number of iterations, using a logarithmic scale on the Y-axis. Since at $\lambda = 0$ EBBoost reduces to AdaBoost, we omit its plot at that setting. From the figure, it can be seen that the number of iterations required by VadaBoost is roughly twice the number of iterations required by AdaBoost. At $\lambda = 0.5$, there is only a minor difference in the number of iterations required by EBBoost and VadaBoost.

Figure 2: 1 + cost versus the number of iterations (curves for AdaBoost, EBBoost $\lambda = 0.5$, VadaBoost $\lambda = 0$, and VadaBoost $\lambda = 0.5$).

7 Conclusions

This paper identified a key weakness in the EBBoost algorithm and proposed a novel algorithm that efficiently overcomes the limitation to enumerable weak learners. VadaBoost reduces a well motivated cost by iteratively minimizing an upper bound which, unlike EBBoost, allows the boosting method to handle any weak learner by estimating weights on the data. The update rule of VadaBoost has a simplicity that is reminiscent of AdaBoost. Furthermore, despite the use of an upper bound, the novel boosting method remains efficient. Even when the bound is at its loosest, the number of iterations required by VadaBoost is a small constant factor more than the number of iterations required by AdaBoost. Experimental results showed that VadaBoost outperforms AdaBoost in terms of classification accuracy while efficiently applying to any family of weak learners. The effectiveness of boosting has been explained via margin theory [9], though it has taken a number of years to settle certain open questions [8]. Considering the simplicity and effectiveness of VadaBoost, one natural future research direction is to study the margin distributions it obtains. Another future research direction is to design efficient sample variance penalization algorithms for other problems such as multi-class classification, ranking, and so on.
Acknowledgements This material is based upon work supported by the National Science Foundation under Grant No. , by a Google Research Award, and by the Department of Homeland Security under Grant No. N C

References

[1] J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In ALT, 2007.
[2] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006.
[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman and Hall, New York, 1984.
[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
[5] A. Maurer and M. Pontil. Empirical Bernstein bounds and sample variance penalization. In COLT, 2009.
[6] V. Mnih, C. Szepesvári, and J.-Y. Audibert. Empirical Bernstein stopping. In COLT, 2008.
[7] G. Raetsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287-320, 2001.
[8] L. Reyzin and R. Schapire. How boosting the margin can also boost classifier complexity. In ICML, 2006.
[9] R. E. Schapire, Y. Freund, P. L. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5):1651-1686, 1998.
[10] P. K. Shivaswamy and T. Jebara. Empirical Bernstein boosting. In AISTATS, 2010.
[11] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, NY, 1995.
