Lecture 11: Decision Trees


ECE901 Spring 2007 Statistical Learning Theory
Instructor: R. Nowak

1 Minimum Complexity Penalized Function

Recall the basic results of the last lectures: let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output spaces respectively. Let $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ be random variables with unknown joint probability distribution $P_{XY}$. We would like to use $X$ to predict $Y$. Consider a loss function $\ell(y_1, y_2)$, $y_1, y_2 \in \mathcal{Y}$. This function is used to measure the accuracy of our prediction. Let $\mathcal{F}$ be a collection of candidate functions (models), $f : \mathcal{X} \to \mathcal{Y}$. The expected risk we incur is given by $R(f) = E_{XY}[\ell(f(X), Y)]$. We have access only to a number of i.i.d. samples, $\{X_i, Y_i\}_{i=1}^n$. These allow us to compute the empirical risk $\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \ell(f(X_i), Y_i)$.

Assume in the following that $\mathcal{F}$ is countable. Assign a positive number $c(f)$ to each $f \in \mathcal{F}$ such that $\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1$. If we use a prefix code to describe each element of $\mathcal{F}$ and define $c(f)$ to be the codeword length (in bits) for each $f \in \mathcal{F}$, the last inequality is automatically satisfied.

We define the minimum complexity penalized estimator as

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ \hat{R}_n(f) + \sqrt{\frac{c(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$

As we showed previously, we have the bound

$$E[R(\hat{f}_n)] \le \min_{f \in \mathcal{F}} \left\{ R(f) + \sqrt{\frac{c(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

The performance (risk) of $\hat{f}_n$ is on average better than

$$R(\tilde{f}_n) + \sqrt{\frac{c(\tilde{f}_n)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}, \quad \text{where} \quad \tilde{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ R(f) + \sqrt{\frac{c(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$

If it happens that the optimal function, that is $f^* = \arg\min_{f \text{ measurable}} R(f)$, is close to an $f \in \mathcal{F}$ with a small $c(f)$, then $\hat{f}_n$ will perform almost as well as the optimal function.

Example 1 Suppose $f^* \in \mathcal{F}$; then

$$E[R(\hat{f}_n)] \le R(f^*) + \sqrt{\frac{c(f^*)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}.$$
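To make the estimator concrete, here is a small Python sketch (not from the lecture; the threshold model class, the noise level, and the equal-length codewords are all hypothetical choices) that minimizes the penalized empirical risk over a finite class of threshold classifiers.

```python
import numpy as np

# Hypothetical illustration of the minimum complexity penalized estimator:
# F is a finite class of threshold classifiers f_t(x) = 1{x >= t} on [0, 1],
# described with equal-length codewords, so c(f) = log2(|F|) bits for every f
# (then sum_f 2^{-c(f)} = 1, satisfying the Kraft condition).
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0, 1, n)
flips = (rng.uniform(0, 1, n) < 0.1).astype(int)     # 10% label noise
Y = (X >= 0.3).astype(int) ^ flips                   # true boundary at t = 0.3

thresholds = np.linspace(0, 1, 33)                   # the candidate models
c = np.full(thresholds.size, np.log2(thresholds.size))

def empirical_risk(t):
    """0-1 empirical risk of the threshold classifier f_t(x) = 1{x >= t}."""
    return np.mean((X >= t).astype(int) != Y)

# Penalized criterion: R_n(f) + sqrt((c(f) log 2 + (1/2) log n) / (2n)).
penalty = np.sqrt((c * np.log(2) + 0.5 * np.log(n)) / (2 * n))
scores = np.array([empirical_risk(t) for t in thresholds]) + penalty
print(f"selected threshold: {thresholds[np.argmin(scores)]:.3f}")
```

Here the penalty is constant across the class, so the criterion reduces to empirical risk minimization; the penalty only matters once models of different description lengths compete, as in the tree classes below.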

Furthermore, if $c(f^*) = O(\log n)$ then

$$E[R(\hat{f}_n)] \le R(f^*) + O\left(\sqrt{\frac{\log n}{n}}\right),$$

that is, only within a small $O\left(\sqrt{\frac{\log n}{n}}\right)$ offset of the optimal risk.

In general, we can also bound the excess risk $E[R(\hat{f}_n)] - R^*$, where $R^*$ is the Bayes risk, $R^* = \inf_{f \text{ measurable}} R(f)$. By subtracting $R^*$ (a constant) from both sides of the inequality

$$E[R(\hat{f}_n)] \le \min_{f \in \mathcal{F}} \left\{ R(f) + \sqrt{\frac{c(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}$$

we obtain

$$E[R(\hat{f}_n)] - R^* \le \min_{f \in \mathcal{F}} \left\{ R(f) - R^* + \sqrt{\frac{c(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

Note the two terms in this upper bound: $R(f) - R^*$ is a bound on the approximation error of a model $f$, and the remainder is a bound on the estimation error associated with $f$. Thus, we see that complexity regularization automatically optimizes a balance between approximation and estimation errors. In other words, complexity regularization is adaptive to the unknown tradeoff between approximation and estimation.

2 Classification

Consider the particularization of the above to a classification scenario. Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$ and $\ell(\hat{y}, y) = 1\{\hat{y} \neq y\}$. Then $R(f) = E_{XY}[1\{f(X) \neq Y\}] = P(f(X) \neq Y)$. The Bayes risk is given by $R^* = \inf_{f \text{ measurable}} R(f)$. As was observed before, the Bayes classifier (i.e., a classifier that achieves the Bayes risk) is given by

$$f^*(x) = \begin{cases} 1, & P(Y=1 \mid X=x) \ge 1/2 \\ 0, & P(Y=1 \mid X=x) < 1/2. \end{cases}$$

This classifier can be expressed in a different way. Consider the set $G^* = \{x : P(Y=1 \mid X=x) \ge 1/2\}$. The Bayes classifier can be written as $f^*(x) = 1\{x \in G^*\}$. Therefore the classifier is characterized entirely by the set $G^*$: if $X \in G^*$ then the best guess is that $Y$ is one, and vice-versa. The boundary of this set corresponds to the points where the decision is harder. The boundary of $G^*$ is called the Bayes Decision Boundary. In Figure 1(a) this concept is illustrated. If $\eta(x) = P(Y=1 \mid X=x)$ is a continuous function then the Bayes decision boundary is simply given by $\{x : P(Y=1 \mid X=x) = 1/2\}$. Clearly the structure of the decision boundary provides important information on the difficulty of the problem.

2.1 Empirical Classifier Design

Given $n$ i.i.d. training pairs, $\{X_i, Y_i\}_{i=1}^n$, we want to construct a classifier $\hat{f}_n$ that performs well on average, i.e., we want $E[R(\hat{f}_n)]$ as close to $R^*$ as possible. In Figure 1(b) an example of the i.i.d. training pairs is depicted.
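As a concrete illustration (a sketch with a made-up $\eta$, not from the lecture), the following snippet evaluates the Bayes classifier $1\{\eta(x) \ge 1/2\}$ and a Monte Carlo estimate of the Bayes risk $R^* = E[\min(\eta(X), 1-\eta(X))]$, assuming $X$ uniform on the unit square.

```python
import numpy as np

# Hypothetical eta(x) = P(Y=1 | X=x) on the unit square; the Bayes classifier
# predicts 1 exactly on G* = {x : eta(x) >= 1/2}, whose boundary (the 1/2-level
# set of eta) is the Bayes decision boundary.
rng = np.random.default_rng(0)

def eta(x1, x2):
    # smooth toy conditional probability; its 1/2-level set is a wavy curve
    return 1.0 / (1.0 + np.exp(-8.0 * (x2 - 0.5 - 0.2 * np.sin(2 * np.pi * x1))))

def bayes_classifier(x1, x2):
    return (eta(x1, x2) >= 0.5).astype(int)

# Monte Carlo estimate of R* = E[min(eta(X), 1 - eta(X))] for X ~ Uniform([0,1]^2).
x1, x2 = rng.uniform(0, 1, 100_000), rng.uniform(0, 1, 100_000)
bayes_risk = np.mean(np.minimum(eta(x1, x2), 1 - eta(x1, x2)))
print(f"estimate of R*: {bayes_risk:.4f}")
print(f"fraction of the square in G*: {bayes_classifier(x1, x2).mean():.3f}")
```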

Figure 1: (a) The Bayes classifier and the Bayes decision boundary; (b) Example of the i.i.d. training pairs.

The construction of a classifier boils down to the estimation of the Bayes decision boundary. The histogram rule, discussed in a previous lecture, approaches the problem by subdividing the feature space into small boxes and taking a majority vote of the training data in each box. A typical result is depicted in Figure 2(a).

The main problem with the histogram rule is that it is solving a more complicated problem than is actually necessary. We do not need to determine the correct label for each individual box directly (the histogram rule is essentially estimating $\eta(x)$). In principle we only need to locate the decision boundary and assign the correct label on either side (notice that the accuracy of a majority vote over a region increases with the size of the region). The next example illustrates this.

Example 2 (Three Different Classifiers) The pictures in Figure 2 correspond to the approximation of the Bayes classifier by three different classifiers: a histogram classifier, a linear classifier, and a tree classifier.

Figure 2: (a) Histogram classifier; (b) Linear classifier; (c) Decision tree.

The linear classifier and the tree classifier (to be defined formally later) both attack the problem of finding the boundary more directly than the histogram classifier, and therefore they tend to produce much better results in theory and practice. In the following we will demonstrate this for decision trees.

3 Decision Trees

Decision trees are constructed by a two-step process:

1. Tree growing
2. Tree pruning

The basic idea is to first grow a very large, complicated tree classifier that explains the training data very accurately, but has poor generalization characteristics, and then to prune this tree to avoid overfitting.

3.1 Growing Trees

The growing process is based on recursively subdividing the feature space. Usually the subdivisions are splits of existing regions into two smaller regions (i.e., binary splits), and usually the splits are perpendicular to one of the feature axes. An example of such a construction is depicted in Figure 3.

Figure 3: Growing a recursive binary tree ($\mathcal{X} = [0,1]^2$).

Often the splitting process is based on the training data, and is designed to separate data with different labels as much as possible. In such constructions, the splits, and hence the tree structure itself, are data dependent. Alternatively, the splitting and subdivision could be independent of the training data. The latter approach is the one we are going to investigate in detail, and we will consider Dyadic Decision Trees and Recursive Dyadic Partitions (depicted in Figure 4) in particular.

Until now we have been referring to trees, but have not made clear how trees relate to partitions. It turns out that any decision tree can be associated with a partition of the input space $\mathcal{X}$ and vice-versa. In particular, a Recursive Dyadic Partition (RDP) can be associated with a (binary) tree; in fact, this is the most efficient way of describing an RDP. In Figure 4 we illustrate the procedure. Each leaf of the tree corresponds to a cell of the partition. The nodes in the tree correspond to the various partition cells that are generated in the construction of the tree. The orientation of the dyadic split alternates between the levels of the tree (for the example of Figure 4, at the root level the split is done along the horizontal axis, at the level below that the split is done along the vertical axis, and so on). The tree is called dyadic because the splits of cells are always at the midpoint along one coordinate axis, and consequently the sidelengths of all cells are dyadic (i.e., powers of 2).

In the following we are going to consider the 2-dimensional case, but all the results can be easily generalized to the d-dimensional case ($d \ge 2$), provided the dyadic tree construction is defined properly. Consider a recursive dyadic partition of the feature space into $k$ boxes of equal size. Associated with this partition is a tree $T$. Minimizing the empirical risk with respect to this partition produces the histogram classifier with $k$ equal-sized cells. Consider also all the possible partitions corresponding to pruned versions of the tree $T$. Minimizing the empirical risk with respect to those other partitions results in other classifiers (dyadic decision trees) that are fundamentally different than the histogram rule we analyzed earlier.

Figure 4: Example of Recursive Dyadic Partition (RDP) growing ($\mathcal{X} = [0,1]^2$).
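The RDP construction is easy to state in code. The sketch below (illustrative class and function names, not from the lecture) represents an RDP of $[0,1]^2$ as a binary tree: each node stores a rectangular cell, a split bisects the cell at the midpoint of the axis determined by the depth, and the leaves of the tree are exactly the cells of the partition.

```python
# A minimal sketch (hypothetical representation) of a Recursive Dyadic
# Partition of [0,1]^2: splits always bisect a cell at the midpoint, with the
# split axis alternating with depth, so every leaf is a dyadic cell.

class Cell:
    def __init__(self, x0, x1, y0, y1, depth=0):
        self.bounds = (x0, x1, y0, y1)
        self.depth = depth
        self.children = None              # None means this cell is a leaf

    def split(self):
        """Dyadic split: bisect along x at even depths, along y at odd depths."""
        x0, x1, y0, y1 = self.bounds
        d = self.depth + 1
        if self.depth % 2 == 0:           # split perpendicular to the x-axis
            xm = (x0 + x1) / 2
            self.children = [Cell(x0, xm, y0, y1, d), Cell(xm, x1, y0, y1, d)]
        else:                             # split perpendicular to the y-axis
            ym = (y0 + y1) / 2
            self.children = [Cell(x0, x1, y0, ym, d), Cell(x0, x1, ym, y1, d)]
        return self.children

def leaves(cell):
    """The leaves of the tree are exactly the cells of the partition."""
    if cell.children is None:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]

root = Cell(0, 1, 0, 1)
left, right = root.split()                # first split: along the x-axis
left.split()                              # second level: along the y-axis
print([leaf.bounds for leaf in leaves(root)])
# [(0, 0.5, 0, 0.5), (0, 0.5, 0.5, 1), (0.5, 1, 0, 1)] -- three dyadic cells
```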

3.2 Pruning

Let $\mathcal{F}$ be the collection of all possible dyadic decision trees corresponding to recursive dyadic partitions of the feature space. Each such tree can be prefix encoded with a bit-string proportional to the number of leafs in the tree as follows: encode the structure of the tree in a top-down fashion, (i) assigning a zero at each branch node and a one at each leaf node (terminal node), and (ii) reading the code in a breadth-first fashion, top-down, left-to-right. Figure 5 exemplifies this coding strategy, and a code sketch of the encoder appears at the end of this subsection.

Notice that, since we are considering binary trees, the total number of nodes is twice the number of leafs minus one; that is, if the number of leafs in the tree is $k$ then the number of nodes is $2k-1$. Therefore, to encode a tree with $k$ leafs we need $2k-1$ bits. Since we want to use the partition associated with this tree for classification, we need to assign a decision label (either zero or one) to each leaf. Hence, to encode a decision tree in this fashion we need $3k-1$ bits, where $k$ is the number of leafs: the first $2k-1$ bits of the codeword encode the tree structure, and the remaining $k$ bits encode the classification labels. This is easily shown to be a prefix code, therefore we can use it under our classification scenario.

Figure 5: Illustration of the tree coding technique: example of a tree and corresponding prefix code.

Let

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ \hat{R}_n(f) + \sqrt{\frac{(3k(f)-1)\log 2 + \frac{1}{2}\log n}{2n}} \right\},$$

where $k(f)$ denotes the number of leafs of the tree associated with $f$. This optimization can be solved through a bottom-up pruning process (starting from a very large initial tree $T$) in $O(|T|^2)$ operations, where $|T|$ is the number of leafs in the initial tree. The complexity regularization theorem tells us that

$$E[R(\hat{f}_n)] \le \min_{f \in \mathcal{F}} \left\{ R(f) + \sqrt{\frac{(3k(f)-1)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}. \tag{1}$$
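The encoder itself is mechanical. A short sketch (hypothetical nested-tuple representation of labeled trees, not from the lecture) of the breadth-first coding described above:

```python
from collections import deque

# Sketch of the prefix code described above: breadth-first over the tree, emit
# 0 at each branch node and 1 at each leaf, then append one label bit per leaf.
# A tree is a nested tuple: either ('leaf', label) or ('branch', left, right).

def encode(tree):
    structure, labels = [], []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if node[0] == 'leaf':
            structure.append('1')
            labels.append(str(node[1]))
        else:                      # branch node: enqueue children left-to-right
            structure.append('0')
            queue.extend(node[1:])
    return ''.join(structure + labels)   # (2k-1) structure bits + k label bits

# Example: k = 3 leaves, so (2*3 - 1) + 3 = 8 bits total.
tree = ('branch', ('leaf', 0), ('branch', ('leaf', 1), ('leaf', 0)))
code = encode(tree)
print(code, len(code))   # prints: 01011010 8
```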

4 Comparison between Histogram Classifiers and Classification Trees

In the following we will illustrate the idea behind complexity regularization by applying the basic theorem to histogram classifiers and decision trees (using our setup above). Consider the classification setup described in Section 2, with $\mathcal{X} = [0,1]^2$.

4.1 Histogram Risk Bound

Recall the setup and results of a previous lecture. (The description here is slightly different from the one in the previous lecture.) Let $\mathcal{F}_k^H = \{\text{histogram rules with } k^2 \text{ cells}\}$. Then $|\mathcal{F}_k^H| = 2^{k^2}$. Let $\mathcal{F}^H = \bigcup_{k \ge 1} \mathcal{F}_k^H$. We can encode each element $f$ of $\mathcal{F}^H$ with $c_H(f) = k + k^2$ bits, where the first $k$ bits indicate the smallest $k$ such that $f \in \mathcal{F}_k^H$ and the following $k^2$ bits encode the labels of each bin. This is a prefix encoding of all the elements in $\mathcal{F}^H$.

We define our estimator as $\hat{f}_n^H = \hat{f}_n^{(\hat{k})}$, where

$$\hat{f}_n^{(k)} = \arg\min_{f \in \mathcal{F}_k^H} \hat{R}_n(f) \quad \text{and} \quad \hat{k} = \arg\min_k \left\{ \hat{R}_n(\hat{f}_n^{(k)}) + \sqrt{\frac{(k+k^2)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$

Therefore $\hat{f}_n^H$ minimizes

$$\hat{R}_n(f) + \sqrt{\frac{c_H(f)\log 2 + \frac{1}{2}\log n}{2n}}$$

over all $f \in \mathcal{F}^H$. We showed before that

$$E[R(\hat{f}_n^H)] - R^* \le \min_{f \in \mathcal{F}^H} \left\{ R(f) - R^* + \sqrt{\frac{c_H(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

To proceed with our analysis we need to make some assumptions on the intrinsic difficulty of the problem. We will assume that the Bayes decision boundary is a well-behaved 1-dimensional set, in the sense that it has box-counting dimension one (see Appendix A). This implies that, for a histogram with $k^2$ cells, the Bayes decision boundary intersects less than $Ck$ cells, where $C$ is a constant that does not depend on $k$. Furthermore, we assume that the marginal distribution of $X$ satisfies $P_X(A) \le K|A|$ for any measurable subset $A \subseteq [0,1]^2$. This means that the samples collected do not accumulate anywhere in the unit square.

Under the above assumptions we can conclude that the excess risk of the best histogram is at most the $P_X$-measure of the cells intersected by the boundary:

$$\min_{f \in \mathcal{F}_k^H} R(f) - R^* \le K \cdot Ck \cdot \frac{1}{k^2} = \frac{CK}{k}.$$

Therefore

$$E[R(\hat{f}_n^H)] - R^* \le \frac{CK}{k} + \sqrt{\frac{(k+k^2)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}.$$

We can balance the terms on the right side of the above expression using $k = n^{1/4}$ (for $n$ large); therefore

$$E[R(\hat{f}_n^H)] - R^* = O(n^{-1/4}), \quad \text{as } n \to \infty.$$
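The balancing step can be checked numerically. The sketch below (with hypothetical constants $C = K = 1$, not from the lecture) evaluates the right-hand side of the bound over a grid of $k$ and confirms that the minimizer grows roughly like $n^{1/4}$.

```python
import numpy as np

# Numerical check (with hypothetical constants C = K = 1) that the histogram
# bound CK/k + sqrt(((k + k^2) log 2 + 0.5 log n) / (2n)) is minimized by
# k growing like n^(1/4), matching the O(n^(-1/4)) rate derived above.
C = K = 1.0
for n in [10**3, 10**4, 10**5, 10**6]:
    k = np.arange(1, 201)
    bound = C * K / k + np.sqrt(((k + k**2) * np.log(2) + 0.5 * np.log(n)) / (2 * n))
    k_best = k[np.argmin(bound)]
    print(f"n = {n:>7d}: best k = {k_best:3d}, n^(1/4) = {n**0.25:6.1f}")
```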

4.2 Risk Bounds for Dyadic Decision Trees

Now let's consider the dyadic decision trees, under the assumptions above, and contrast them with the histogram classifier. Let

$$\mathcal{F}_k^T = \{\text{dyadic decision trees with } k \text{ leafs}\}.$$

Let $\mathcal{F}^T = \bigcup_k \mathcal{F}_k^T$. We can prefix encode each element $f$ of $\mathcal{F}^T$ with $c_T(f) = 3k-1$ bits, as described before. Let $\hat{f}_n^T = \hat{f}_n^{(\hat{k})}$, where

$$\hat{f}_n^{(k)} = \arg\min_{f \in \mathcal{F}_k^T} \hat{R}_n(f) \quad \text{and} \quad \hat{k} = \arg\min_k \left\{ \hat{R}_n(\hat{f}_n^{(k)}) + \sqrt{\frac{(3k-1)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$

Hence $\hat{f}_n^T$ minimizes

$$\hat{R}_n(f) + \sqrt{\frac{c_T(f)\log 2 + \frac{1}{2}\log n}{2n}}$$

over all $f \in \mathcal{F}^T$. In fact, the optimization can be performed using a simple bottom-up tree-pruning algorithm in $O(n^2)$ time (C. Scott, "Tree pruning with subadditive penalties," IEEE Transactions on Signal Processing, vol. 53, no. 12, pp. 4518-4525, Dec. 2005); a sketch of such a pruning pass, for the simpler additive penalty, appears at the end of this subsection. Moreover,

$$E[R(\hat{f}_n^T)] - R^* \le \min_{f \in \mathcal{F}^T} \left\{ R(f) - R^* + \sqrt{\frac{c_T(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

If the Bayes decision boundary is a 1-dimensional set, as in Section 4.1, there exists a tree with at most $8Ck$ leafs such that the boundary is contained in at most $Ck$ squares, each of volume $1/k^2$. To see this, start with a tree yielding the histogram partition with $k^2$ boxes (i.e., the tree partitioning the unit square into $k^2$ equal-sized squares), and prune all the nodes that do not intersect the boundary. In Figure 6 we illustrate the procedure. If one carefully bounds the number of leafs required at each level, then it can be shown that the total number of leafs is at most $8Ck$. We conclude then that there exists a tree with at most $8Ck$ leafs that has the same risk as a histogram with $k^2$ cells. Therefore, using equation (1) we have

$$E[R(\hat{f}_n^T)] - R^* \le \frac{CK}{k} + \sqrt{\frac{(3(8Ck)-1)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}.$$

We can balance the terms on the right side of the above expression using $k = n^{1/3}$ (for $n$ large); therefore

$$E[R(\hat{f}_n^T)] - R^* = O(n^{-1/3}), \quad \text{as } n \to \infty.$$
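Exact pruning under the square-root penalty requires the subadditive-penalty algorithm cited above. As a simpler illustration of the bottom-up pass (a sketch with a hypothetical tree encoding, not the lecture's algorithm), the snippet below prunes with an additive penalty $\alpha$ per leaf, the CART-style criterion discussed in the final comments: a branch is collapsed to a single leaf whenever collapsing does not increase the penalized cost.

```python
# Minimal sketch of bottom-up pruning with an *additive* penalty alpha * k.
# Trees are nested tuples: ('leaf', risk) or ('branch', collapsed_risk, L, R),
# where collapsed_risk is the empirical risk of the majority label over the
# branch's whole cell (the risk incurred if the branch is merged into a leaf).

def prune(tree, alpha):
    """Return (pruned_tree, cost) minimizing empirical risk + alpha * #leaves."""
    if tree[0] == 'leaf':
        return tree, tree[1] + alpha
    _, collapsed_risk, left, right = tree
    pruned_left, cost_left = prune(left, alpha)
    pruned_right, cost_right = prune(right, alpha)
    keep_cost = cost_left + cost_right          # keep both (pruned) subtrees
    collapse_cost = collapsed_risk + alpha      # replace the branch by a leaf
    if collapse_cost <= keep_cost:
        return ('leaf', collapsed_risk), collapse_cost
    return ('branch', collapsed_risk, pruned_left, pruned_right), keep_cost

# Example: a larger alpha makes collapsing branches worthwhile.
t = ('branch', 0.4, ('leaf', 0.1), ('branch', 0.25, ('leaf', 0.1), ('leaf', 0.1)))
print(prune(t, alpha=0.01))   # keeps the full tree
print(prune(t, alpha=0.2))    # collapses everything to a single leaf
```

Because the additive penalty decomposes over subtrees, one bottom-up pass suffices; the square-root penalty does not decompose this way, which is why the subadditive-penalty machinery is needed there.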

Figure 6: Illustration of the tree pruning procedure: (a) Histogram classification rule, for a partition with 16 cells, and corresponding binary tree representation (with 16 leafs). (b) Pruned version of the histogram tree, yielding exactly the same classification rule, but now requiring only 6 leafs. (Note: the trees were constructed using the procedure of Figure 4.)

5 Final Comments

Trees generally work much better than histogram classifiers. This is essentially because they provide much more efficient ways of approximating the Bayes decision boundary (as we saw in our example, under reasonable assumptions on the Bayes boundary, a tree encoded with $O(k)$ bits can describe the same classifier as a histogram that requires $O(k^2)$ bits).

The dyadic decision trees studied here are different than classical tree rules, such as CART (Breiman et al., 1984) or C4.5 (Quinlan, 1993). Those techniques select a tree according to

$$\hat{k} = \arg\min_k \left\{ \hat{R}_n(\hat{f}_n^{(k)}) + \alpha k \right\}, \quad \text{for some } \alpha > 0,$$

whereas in the analysis above the penalty was roughly

$$\hat{k} = \arg\min_k \left\{ \hat{R}_n(\hat{f}_n^{(k)}) + \sqrt{\frac{\alpha k}{n}} \right\}, \quad \text{for } \alpha \approx 3\log 2.$$

The square root penalty is essential for the risk bound. No such bound exists for CART or C4.5, except under very restrictive assumptions. Moreover, recent experimental work has shown that the square root penalty often performs better in practice.

Finally, recent results show that a slightly tighter bounding procedure for the estimation error can be used to show that dyadic decision trees (with a slightly different pruning procedure) achieve a rate of

$$E[R(\hat{f}_n^T)] - R^* = O(n^{-1/2}), \quad \text{as } n \to \infty,$$

which turns out to be the minimax optimal rate (i.e., under the boundary assumptions above, no method can achieve a faster rate of convergence to the Bayes error).

A Box Counting Dimension

The notion of dimension of a set arises in many aspects of mathematics, and it is particularly relevant to the study of fractals (which, besides some important applications, make really cool t-shirts). The dimension somehow indicates how we should measure the complexity of a set (length, area, volume, etc.). The box-counting dimension is a simple measure of the dimension of a set. The main idea is to cover the set with boxes of sidelength $r$. Let $N(r)$ denote the smallest number of such boxes; then the box counting dimension is defined as

$$\lim_{r \to 0} \frac{\log N(r)}{-\log r}.$$

Although the boxes considered above do not need to be aligned on a rectangular grid (and can in fact overlap), we can usually consider them over a grid and obtain an upper bound on the box-counting dimension. To illustrate the main ideas, let's consider a simple example and connect it to the classification scenario considered before.

Let $f : [0,1] \to [0,1]$ be a Lipschitz function with Lipschitz constant $L$ (i.e., $|f(a) - f(b)| \le L|a-b|$ for all $a, b \in [0,1]$). Define the set $A = \{x = (x_1, x_2) : x_2 = f(x_1)\}$; that is, the set $A$ is the graph of the function $f$. Consider a partition with $k^2$ squared boxes (just like the ones we used in the histograms); then the points in the set $A$ intersect at most $Ck$ boxes, with $C = (2 + L)$ (and also the number of intersected boxes is at least $k$). The sidelength of the boxes is $1/k$, therefore the box-counting dimension of $A$ satisfies

$$\dim_B(A) \le \lim_{1/k \to 0} \frac{\log(Ck)}{-\log(1/k)} = \lim_{k \to \infty} \frac{\log C + \log k}{\log k} = 1.$$

The result above will hold for any "normal" set $A \subseteq [0,1]^2$ that does not occupy any area. For most sets the box-counting dimension is going to be an integer, but for some "weird" sets (called fractal sets) it is not an integer. For example, the Koch curve (see for example IntroToFrac/InitGen/InitGenKoch.html) has box-counting dimension $\log(4)/\log(3) \approx 1.26$. This means that it is not quite as small as a 1-dimensional curve, but not as big as a 2-dimensional set (hence it occupies no area).

To connect these concepts to our classification scenario, consider a simple example. Let $\eta(x) = P(Y=1 \mid X=x)$ and assume $\eta(x)$ has the form

$$\eta(x) = \frac{1}{2} + x_2 - f(x_1), \quad \forall x \equiv (x_1, x_2) \in \mathcal{X}, \tag{2}$$

where $f : [0,1] \to [0,1]$ is Lipschitz with Lipschitz constant $L$. The Bayes classifier is then given by

$$f^*(x) = 1\{\eta(x) \ge 1/2\} \equiv 1\{x_2 \ge f(x_1)\}.$$

This is depicted in Figure 7. Note that this is a special, restricted class of problems: we are considering the subset of all classification problems such that the joint distribution $P_{XY}$ satisfies $P(Y=1 \mid X=x) = 1/2 + x_2 - f(x_1)$ for some function $f$ that is Lipschitz. The Bayes decision boundary is therefore given by $A = \{x = (x_1, x_2) : x_2 = f(x_1)\}$. As we observed before, this set has box-counting dimension 1.
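The box-counting argument can be checked numerically. The following sketch (a toy Lipschitz $f$, not from the lecture) counts the grid cells hit by the graph of $f$ and shows $\log N(1/k)/\log k$ approaching 1, as the calculation above predicts.

```python
import numpy as np

# Numerical sketch (toy function, not from the lecture): estimate the
# box-counting dimension of the graph of a Lipschitz function by counting,
# on a k x k grid, how many cells of sidelength r = 1/k the graph intersects.
# For a 1-dimensional set, log N(1/k) / log k should approach 1.
f = lambda x: 0.5 + 0.25 * np.sin(2 * np.pi * x)    # Lipschitz on [0, 1]

for k in [8, 32, 128, 512]:
    x = np.linspace(0.0, 1.0, 50 * k)               # dense samples of the graph
    i = np.minimum((x * k).astype(int), k - 1)      # column index of each sample
    j = np.minimum((f(x) * k).astype(int), k - 1)   # row index of each sample
    N = len(set(zip(i.tolist(), j.tolist())))       # number of boxes hit
    print(f"k = {k:4d}:  N = {N:5d},  log N / log k = {np.log(N) / np.log(k):.3f}")
```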

Figure 7: Bayes decision boundary for the setup described in Appendix A.
