Lecture 15: Learning Theory: Concentration Inequalities


STAT 425: Introduction to Nonparametric Statistics                                    Winter 2018

Lecture 15: Learning Theory: Concentration Inequalities

Instructor: Yen-Chi Chen

15.1 Introduction

Recall that in the lecture on classification, we have seen that the empirical risk may not uniformly converge to the actual risk. And we want a bound of the type

$$\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| = \text{something that converges to 0 in probability} \tag{15.1}$$

so that we can use empirical risk minimization to train our classifier. How do we achieve this?

In the normal means model, we have seen that if we have $N$ IID normals with mean 0 and standard deviation $\sigma_n$, then the maximal error is at rate $O(\sigma_n \sqrt{2\log N})$. This implies that if $\sigma_n^2 = \sigma^2/n$, the maximal value of many normals still converges to 0. The empirical risk $\hat{R}_n(c)$ is a sample average of the risk evaluated at the observations. Thus, we do have

$$\hat{R}_n(c) - R(c) \approx N\left(0, \frac{\sigma_c^2}{n}\right)$$

for some $\sigma_c^2$. The variance $\sigma_c^2/n$ does converge to 0. Therefore, as long as we do not have too many classifiers being considered, i.e., the number of elements in $\mathcal{C}$ is not too large, we should be able to derive a uniform convergence.

In what follows, we introduce a method called Hoeffding's inequality that allows us to derive equation (15.1) without using any normal distribution approximation.

15.2 Hoeffding's Inequality

For bounded IID random variables, Hoeffding's inequality is a powerful tool to bound the behavior of their average. We first start with a property of a bounded random variable.

Lemma 15.1 Let $X$ be a random variable with $E(X) = 0$ and $a \le X \le b$. Then for any positive number $t$,

$$E(e^{tX}) \le e^{t^2(b-a)^2/8}.$$

Proof: We will use the fact that $e^{tx}$ is a convex function of $x$ for all positive $t$. Recall that a function $g$ is a convex function if for any two points $a < b$ and $\alpha \in [0, 1]$,

$$g(\alpha a + (1-\alpha)b) \le \alpha g(a) + (1-\alpha) g(b).$$

Because $X \in [a, b]$, we define $\alpha_X$ to satisfy

$$X = \alpha_X b + (1-\alpha_X) a.$$

This implies

$$\alpha_X = \frac{X - a}{b - a}.$$

Using the fact that $e^{tx}$ is convex,

$$e^{tX} \le \alpha_X e^{tb} + (1-\alpha_X) e^{ta} = \frac{X-a}{b-a} e^{tb} + \frac{b-X}{b-a} e^{ta}.$$

Now taking the expectation on both sides,

$$E(e^{tX}) \le \frac{E(X)-a}{b-a} e^{tb} + \frac{b-E(X)}{b-a} e^{ta} = \frac{b}{b-a} e^{ta} - \frac{a}{b-a} e^{tb} = e^{g(s)}, \tag{15.2}$$

where $s = t(b-a)$, $g(s) = -\gamma s + \log(1 - \gamma + \gamma e^s)$, and $\gamma = -a/(b-a)$. Note that $g(0) = g'(0) = 0$ and $g''(s) \le 1/4$ for all positive $s$. Using Taylor's theorem,

$$g(s) = g(0) + s g'(0) + \frac{s^2}{2} g''(s_1)$$

for some $s_1 \in [0, s]$. Thus, we conclude

$$g(s) \le \frac{1}{2} s^2 \cdot \frac{1}{4} = \frac{s^2}{8}.$$

Then equation (15.2) implies

$$E(e^{tX}) \le e^{g(s)} \le e^{s^2/8} = e^{t^2(b-a)^2/8}.$$

With this lemma, we can prove Hoeffding's inequality.

Theorem 15.2 Let $X_1, \ldots, X_n$ be IID with $E(X_1) = \mu$ and $a \le X_1 \le b$. Then

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le 2 e^{-2n\epsilon^2/(b-a)^2}.$$

Proof: We first prove that $P(\bar{X}_n - \mu \ge \epsilon) \le e^{-2n\epsilon^2/(b-a)^2}$. Let $Y_i = X_i - \mu$. Because the exponential function is monotonic, for any positive $t$,

$$
\begin{aligned}
P(\bar{X}_n - \mu \ge \epsilon) = P(\bar{Y}_n \ge \epsilon)
&= P\left(\sum_{i=1}^n Y_i \ge n\epsilon\right)\\
&= P\left(e^{t \sum_i Y_i} \ge e^{t n \epsilon}\right)\\
&\le e^{-t n \epsilon}\, E\left(e^{t \sum_i Y_i}\right) \quad \text{by Markov's inequality}\\
&= e^{-t n \epsilon}\, E\left(e^{tY_1} e^{tY_2} \cdots e^{tY_n}\right)\\
&= e^{-t n \epsilon}\, E(e^{tY_1})\, E(e^{tY_2}) \cdots E(e^{tY_n})\\
&= e^{-t n \epsilon}\, \left[E(e^{tY_1})\right]^n\\
&\le e^{-t n \epsilon}\, e^{n t^2 (b-a)^2/8} \quad \text{by Lemma 15.1.}
\end{aligned}
$$

Because the above inequality holds for all positive $t$, we can choose $t$ to optimize the bound. To get the bound as sharp as possible, we would like to make it as small as possible. Thus, we need to find $t^*$ such that $-t^* n \epsilon + n (t^*)^2 (b-a)^2/8$ is minimized. Taking the derivative with respect to $t$ and setting it to 0, we obtain

$$t^* = \frac{4\epsilon}{(b-a)^2}$$

and

$$-t^* n \epsilon + n (t^*)^2 (b-a)^2/8 = -\frac{2n\epsilon^2}{(b-a)^2}.$$

Thus, the inequality becomes

$$P(\bar{X}_n - \mu \ge \epsilon) \le e^{-t^* n \epsilon} e^{n (t^*)^2 (b-a)^2/8} = e^{-2n\epsilon^2/(b-a)^2}.$$

The same proof also applies to the case $P(\bar{X}_n - \mu \le -\epsilon)$ and we will obtain the same bound. Therefore, we conclude that

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le 2 e^{-2n\epsilon^2/(b-a)^2}.$$

An inequality like the one in Theorem 15.2 is called a concentration inequality, and the phenomenon described by this type of inequality is called concentration of measure (indeed, the probability measure of $\bar{X}_n$ is concentrating around $\mu$, its expectation). Concentration inequalities are common theoretical tools for analyzing the properties of a statistic, and they are particularly powerful when applied to estimators and classifiers.

15.2.1 Application in Classification

Hoeffding's inequality has a direct application to the classification problem. Remember that in the binary classification case with 0-1 loss, the loss at each observation is $L(c(X_i), Y_i) = I(c(X_i) \ne Y_i)$, a bounded random variable with $a = 0$ and $b = 1$! The empirical risk $\hat{R}_n(c) = \frac{1}{n}\sum_{i=1}^n L(c(X_i), Y_i)$ is an unbiased estimator of $E(\hat{R}_n(c)) = R(c)$. Thus, Hoeffding's inequality implies that for a given classifier $c$,

$$P(|\hat{R}_n(c) - R(c)| \ge \epsilon) \le 2 e^{-2n\epsilon^2}. \tag{15.3}$$

A powerful feature of Hoeffding's inequality is that it holds regardless of the classifier. Namely, even if we are considering many different types of classifiers (some are decision trees, some are kNN, some are logistic regression), they all satisfy equation (15.3).

Assume that the collection of classifiers $\mathcal{C}$ contains only $N$ classifiers $c_1, \ldots, c_N$. Then we have

$$
\begin{aligned}
P\left(\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| \ge \epsilon\right) &= P\left(\max_{j=1,\ldots,N} |\hat{R}_n(c_j) - R(c_j)| \ge \epsilon\right)\\
&\le \sum_{j=1}^N P\left(|\hat{R}_n(c_j) - R(c_j)| \ge \epsilon\right)\\
&\le N \cdot 2 e^{-2n\epsilon^2}.
\end{aligned}
$$

Thus, as long as $N \cdot 2 e^{-2n\epsilon^2} \to 0$, we have $P(\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| \ge \epsilon) \to 0$, i.e., $\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| \stackrel{P}{\to} 0$, the desired property we want in equation (15.1).
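Hoeffding's inequality is easy to check numerically. Below is a minimal simulation sketch (my own illustration, not part of the original notes; the Bernoulli(0.5) setup, which mimics the 0-1 loss with $a = 0$ and $b = 1$, and the sample sizes are arbitrary choices) comparing the Monte Carlo tail probability $P(|\bar{X}_n - \mu| \ge \epsilon)$ with the bound $2e^{-2n\epsilon^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def hoeffding_check(n, eps, p=0.5, n_trials=100_000):
    """Monte Carlo estimate of P(|Xbar_n - mu| >= eps) for Bernoulli(p)
    data, compared with the Hoeffding bound (a = 0, b = 1)."""
    xbars = rng.binomial(n, p, size=n_trials) / n   # n_trials sample means
    tail_prob = np.mean(np.abs(xbars - p) >= eps)
    bound = 2 * np.exp(-2 * n * eps**2)
    return tail_prob, bound

for n in [50, 200, 1000]:
    tail, bound = hoeffding_check(n, eps=0.1)
    print(f"n = {n:4d}: P(|Xbar - mu| >= 0.1) ~ {tail:.4f} <= bound {bound:.4f}")
```

The empirical tail probability always sits below the bound, and both decay exponentially in $n$, which is exactly the behavior the union bound above exploits.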

15.2.2 Convergence Rate

Hoeffding's inequality directly implies that $|\bar{X}_n - \mu| = O_P(1/\sqrt{n})$. It actually implies a bound on the expectation $E|\bar{X}_n - \mu|$ as well. To see this, recall that for a positive random variable $T$,

$$E(T) = \int_0^\infty P(T > t)\, dt.$$

Thus, for any positive $s$,

$$
\begin{aligned}
E|\bar{X}_n - \mu|^2 &= \int_0^\infty P(|\bar{X}_n - \mu|^2 > t)\, dt\\
&= \int_0^\infty P(|\bar{X}_n - \mu| > \sqrt{t})\, dt\\
&= \int_0^s \underbrace{P(|\bar{X}_n - \mu| > \sqrt{t})}_{\le 1}\, dt + \int_s^\infty \underbrace{P(|\bar{X}_n - \mu| > \sqrt{t})}_{\text{Hoeffding}}\, dt\\
&\le \int_0^s 1\, dt + \int_s^\infty 2 e^{-2nt/(b-a)^2}\, dt\\
&= s + \frac{(b-a)^2}{n}\, e^{-2ns/(b-a)^2}.
\end{aligned}
$$

It is not easy to optimize this, but there is a simple choice that gives a good result. We will choose

$$s = \frac{(b-a)^2}{2n} \log n,$$

which leads to

$$E|\bar{X}_n - \mu|^2 \le \frac{(b-a)^2}{2n} \log n + \frac{(b-a)^2}{n^2} = O\left(\frac{\log n}{n}\right). \tag{15.4}$$

Finally, using the Cauchy-Schwarz inequality,

$$E|\bar{X}_n - \mu| \le \sqrt{E|\bar{X}_n - \mu|^2} = O\left(\sqrt{\frac{\log n}{n}}\right),$$

which is the desired bound on the expectation.

In the case of classification, when the collection of classifiers $\mathcal{C}$ contains $N$ classifiers, we have

$$P\left(\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| \ge \epsilon\right) \le N \cdot 2 e^{-2n\epsilon^2}.$$

This implies

$$\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| = O_P\left(\sqrt{\frac{\log N}{n}}\right).$$

Moreover, using a similar calculation as we have done for deriving equation (15.4), one can show that

$$E\left(\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)|\right) = O\left(\sqrt{\frac{\log N}{n}}\right).$$
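As a quick sanity check of equation (15.4) (again a sketch of my own, not from the notes), we can estimate $E|\bar{X}_n - \mu|$ by Monte Carlo for Bernoulli(0.5) data, where $b - a = 1$, and compare it with $\sqrt{\frac{\log n}{2n} + \frac{1}{n^2}}$, the square root of the bound derived above:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_abs_error(n, p=0.5, n_trials=50_000):
    """Monte Carlo estimate of E|Xbar_n - mu| for Bernoulli(p) data."""
    xbars = rng.binomial(n, p, size=n_trials) / n
    return np.mean(np.abs(xbars - p))

for n in [50, 200, 1000, 5000]:
    # square root of the bound in (15.4), with b - a = 1
    bound = np.sqrt(np.log(n) / (2 * n) + 1 / n**2)
    print(f"n = {n:5d}: E|Xbar - mu| ~ {mean_abs_error(n):.4f} <= {bound:.4f}")
```

The gap between the two columns reflects the extra $\sqrt{\log n}$ factor, which is an artifact of this proof technique rather than of the averaging itself.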

15.2.3 Application in Histogram

We have derived the convergence rate of the histogram density estimator at a given point. However, we have not yet analyzed the uniform convergence rate of the histogram. Using Hoeffding's inequality, we are able to obtain such a uniform convergence rate.

Assume that $X_1, \ldots, X_n \sim F$ with a PDF $p$ that has a non-zero density over $[0, 1]$. If we use a histogram with $M$ bins:

$$B_1 = \left[0, \frac{1}{M}\right),\quad B_2 = \left[\frac{1}{M}, \frac{2}{M}\right),\quad \ldots,\quad B_M = \left[\frac{M-1}{M}, 1\right].$$

Let $\hat{p}_n$ be the histogram density estimator:

$$\hat{p}_n(x) = \frac{M}{n} \sum_{i=1}^n I(X_i \in B(x)),$$

where $B(x)$ is the bin that $x$ belongs to. The goal is to bound $\sup_x |\hat{p}_n(x) - p(x)|$. We know that the difference $\hat{p}_n(x) - p(x)$ can be written as

$$\hat{p}_n(x) - p(x) = \underbrace{\hat{p}_n(x) - E(\hat{p}_n(x))}_{\text{stochastic variation}} + \underbrace{E(\hat{p}_n(x)) - p(x)}_{\text{bias}}.$$

The bias analysis we have done in Lecture 6 can be generalized to every point $x$, so we have

$$\sup_x |E(\hat{p}_n(x)) - p(x)| = O\left(\frac{1}{M}\right).$$

So we only need to bound the stochastic variation part $\sup_x |\hat{p}_n(x) - E(\hat{p}_n(x))|$. Although we are taking the supremum over every $x$, there are only $M$ bins $B_1, \ldots, B_M$, so we can rewrite the stochastic part as

$$\sup_x |\hat{p}_n(x) - E(\hat{p}_n(x))| = \max_{j=1,\ldots,M} M \left|\frac{1}{n}\sum_{i=1}^n I(X_i \in B_j) - P(X_1 \in B_j)\right|.$$

Because the indicator function takes only two values, 0 and 1, we have

$$
\begin{aligned}
P\left(\sup_x |\hat{p}_n(x) - E(\hat{p}_n(x))| > \epsilon\right)
&= P\left(\max_{j=1,\ldots,M} M \left|\frac{1}{n}\sum_{i=1}^n I(X_i \in B_j) - P(X_1 \in B_j)\right| > \epsilon\right)\\
&= P\left(\max_{j=1,\ldots,M} \left|\frac{1}{n}\sum_{i=1}^n I(X_i \in B_j) - P(X_1 \in B_j)\right| > \frac{\epsilon}{M}\right)\\
&\le M \cdot 2 e^{-2n(\epsilon/M)^2} = 2M e^{-2n\epsilon^2/M^2}.
\end{aligned}
$$

Thus,

$$\sup_x |\hat{p}_n(x) - E(\hat{p}_n(x))| = O_P\left(\sqrt{\frac{M^2 \log M}{n}}\right).$$

This, together with the uniform bias, implies

$$\sup_x |\hat{p}_n(x) - p(x)| \le \sup_x |E(\hat{p}_n(x)) - p(x)| + \sup_x |\hat{p}_n(x) - E(\hat{p}_n(x))| = O\left(\frac{1}{M}\right) + O_P\left(\sqrt{\frac{M^2 \log M}{n}}\right).$$

Note that this bound is not the tightest bound we can obtain. Using Bernstein's inequality, one can obtain a faster convergence rate:

$$\sup_x |\hat{p}_n(x) - p(x)| = O\left(\frac{1}{M}\right) + O_P\left(\sqrt{\frac{M \log M}{n}}\right).$$
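To see the uniform rate of the histogram in practice, here is a small simulation sketch (my own illustration; the Beta(2, 2) density on $[0, 1]$ is an arbitrary stand-in for $p$) that computes the stochastic part $M \max_j |\frac{1}{n}\sum_i I(X_i \in B_j) - P(X_1 \in B_j)|$ over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(2)

def beta22_cdf(x):
    """CDF of Beta(2, 2): the integral of 6u(1 - u) is 3x^2 - 2x^3."""
    return 3 * x**2 - 2 * x**3

def hist_sup_error(n, M, n_trials=2000):
    """Monte Carlo estimate of E[sup_x |p_hat(x) - E p_hat(x)|] for a
    histogram with M equal-width bins on [0, 1] and Beta(2, 2) data."""
    edges = np.linspace(0, 1, M + 1)
    probs = np.diff(beta22_cdf(edges))          # exact P(X in B_j)
    sup_errs = np.empty(n_trials)
    for t in range(n_trials):
        counts, _ = np.histogram(rng.beta(2, 2, size=n), bins=edges)
        # sup_x |p_hat - E p_hat| = M * max_j |counts_j / n - probs_j|
        sup_errs[t] = M * np.max(np.abs(counts / n - probs))
    return sup_errs.mean()

for n, M in [(500, 10), (2000, 10), (2000, 20)]:
    rate = M * np.sqrt(np.log(M) / n)           # reference rate from above
    print(f"n = {n:4d}, M = {M:2d}: mean sup error {hist_sup_error(n, M):.3f}"
          f" (M * sqrt(log M / n) = {rate:.3f})")
```

Increasing $n$ with $M$ fixed shrinks the stochastic part, while increasing $M$ inflates it, matching the $O_P(\sqrt{M^2 \log M / n})$ tradeoff against the $O(1/M)$ bias.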

15.3 VC Theory

Hoeffding's inequality has helped us to partially solve the problem of proving equation (15.1) when the number of classifiers is fixed. However, in many cases we are working with an infinite number of classifiers, so Hoeffding's inequality does not work. For instance, in logistic regression, the collection of all possible classifiers is

$$\mathcal{C}_{\rm logistic} = \left\{ c_{\beta_0, \beta} : \beta_0 \in \mathbb{R}, \beta \in \mathbb{R}^d \right\},$$

where

$$c_{\beta_0, \beta}(x) = I\left(\frac{e^{\beta_0 + \beta^T x}}{1 + e^{\beta_0 + \beta^T x}} > \frac{1}{2}\right).$$

The parameter indices $\beta_0 \in \mathbb{R}$ and $\beta \in \mathbb{R}^d$ both contain an infinite number of points (the real line contains an infinite number of points). Thus, Hoeffding's inequality will not work in this case.

To overcome this problem, we first think about the performance of the classifier $c_{\beta_0, \beta}$ for different values of $(\beta_0, \beta)$. For simplicity, we assume that the classifiers $c_\theta$ are indexed by a quantity $\theta \in \mathbb{R}^p$, i.e., there are $p$ parameters controlling each classifier, and let

$$\mathcal{C}_\Theta = \{c_\theta : \theta \in \mathbb{R}^p\}.$$

In the case of logistic regression, $p = d + 1$ and $\theta = (\beta_0, \beta)$.

Consider two classifiers $c_{\theta_1}$ and $c_{\theta_2}$. When $\theta_1$ and $\theta_2$ are close, we expect their performance to be similar. Just as in logistic regression, when two classifiers have similar parameters, their decision boundaries (the boundary between the two class labels) should be similar. Namely, the performance of many classifiers may be similar to each other, so when using the concentration inequality, we do not need to treat all of them as independent classifiers. Thus, although there are an infinite number of classifiers in $\mathcal{C}_\Theta$, there might only be a few effective independent classifiers. Instead of using the total number of classifiers in constructing a uniform bound, we will use

$$P\left(\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| \ge \epsilon\right) \le N_{\rm eff} \cdot 2 e^{-2n\epsilon^2}, \tag{15.5}$$

where $N_{\rm eff}$ is some measure of the number of effective independent classifiers. In a sense, $N_{\rm eff}$ measures the complexity of $\mathcal{C}_\Theta$. There are many ways of measuring the number of effective independent classifiers, such as the covering number and the bracketing number; here we will introduce a simple and famous quantity called the VC dimension.

15.3.1 Shattering Number and VC Dimension

In the 0-1 loss case, given a classifier $c$, the empirical risk and the actual risk are

$$\hat{R}_n(c) = \frac{1}{n}\sum_{i=1}^n I(Y_i \ne c(X_i)), \qquad R(c) = E\left(I(Y \ne c(X))\right) = P(Y \ne c(X)).$$

If we define a new random variable $Z_i = (X_i, Y_i)$, then the event $\{Y_i \ne c(X_i)\}$ equals $\{Z_i \in A_c\}$ for some region $A_c$ that is determined by the classifier $c$. Thus, we can rewrite the difference between the empirical risk and the actual risk as

$$\hat{R}_n(c) - R(c) = \frac{1}{n}\sum_{i=1}^n I(Y_i \ne c(X_i)) - P(Y \ne c(X)) = \frac{1}{n}\sum_{i=1}^n I(Z_i \in A_c) - P(Z \in A_c).$$

The uniform bound in equation (15.1) can then be rewritten as

$$\sup_{c\in\mathcal{C}} |\hat{R}_n(c) - R(c)| = \sup_{c\in\mathcal{C}} \left|\frac{1}{n}\sum_{i=1}^n I(Z_i \in A_c) - P(Z \in A_c)\right| = \sup_{A\in\mathcal{A}} \left|\frac{1}{n}\sum_{i=1}^n I(Z_i \in A) - P(Z \in A)\right|,$$

where $\mathcal{A} = \{A_c : c \in \mathcal{C}\}$ is a collection of regions. Thus, the effective number $N_{\rm eff}$ should be related to how complex the set $\mathcal{A}$ is. If $\mathcal{A}$ contains many very different regions, then $N_{\rm eff}$ is expected to be large.

Note that the quantity $\frac{1}{n}\sum_{i=1}^n I(Z_i \in A)$ is just the fraction of our data that falls within the area $A_c$, whereas the quantity $P(Z \in A)$ is the probability of observing a random quantity $Z$ within the area $A_c$. Thus, eventually we are trying to bound the difference between the fraction of a sample falling into a given region, used as an estimator, and the probability of falling into that region, which is the classical problem in STAT 101. However, the challenge here is that we are considering many, many, many regions, so the problem is much more complicated.
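Before stating the VC bound, it may help to see numerically why the supremum is the hard part. The sketch below (my own illustration with arbitrary toy choices: random linear classifiers in $\mathbb{R}^2$ and noisy linear labels) compares the deviation $|\hat{R}_n(c) - R(c)|$ of a typical single classifier with the maximal deviation over many classifiers; each classifier alone obeys equation (15.3), but the maximum is larger:

```python
import numpy as np

rng = np.random.default_rng(4)

def deviations(n, K, n_big=200_000):
    """|R_hat - R| for K random linear classifiers c_j(x) = I(beta_j^T x > 0);
    the true risk R(c_j) is approximated on a large independent sample."""
    betas = rng.normal(size=(K, 2))

    def risks(m):
        X = rng.normal(size=(m, 2))
        Y = X[:, 0] + 0.5 * rng.normal(size=m) > 0   # noisy linear labels
        preds = X @ betas.T > 0                      # (m, K) predictions
        return np.mean(preds != Y[:, None], axis=0)  # 0-1 risk of each c_j

    return np.abs(risks(n) - risks(n_big))           # empirical vs ~true risk

dev = deviations(n=200, K=500)
print(f"median single-classifier deviation: {np.median(dev):.3f}")
print(f"max deviation over K = 500:         {dev.max():.3f}")
```

The maximal deviation is noticeably larger than the typical one, but it grows only slowly with $K$: many of these classifiers behave almost identically, which is precisely the redundancy that $N_{\rm eff}$ and the VC dimension quantify.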

VC theory shows the following bound:

$$P\left(\sup_{A\in\mathcal{A}} \left|\frac{1}{n}\sum_{i=1}^n I(Z_i \in A) - P(Z \in A)\right| > \epsilon\right) \le 8(n+1)^{\nu(\mathcal{A})} e^{-n\epsilon^2/32}. \tag{15.6}$$

Namely, $8(n+1)^{\nu(\mathcal{A})} = N_{\rm eff}$ is the effective number we want, and the quantity $\nu(\mathcal{A})$ is called the VC dimension of $\mathcal{A}$. (Note that the quantity in the exponential part becomes $n\epsilon^2/32$ rather than $2n\epsilon^2$. The rates in $n\epsilon^2$ are the same; only the constants are different. This change is due to the fact that we are not just using Hoeffding's inequality but also another technique called symmetrization in the proof, so the constant changes.) To define the VC dimension, we first introduce the concept of shattering.

Shattering is a concept about using sets in a collection of regions $\mathcal{R}$ to partition a set of points. Here are some simple examples of $\mathcal{R}$:

1. $\mathcal{R}_1 = \{(-\infty, t] : t \in \mathbb{R}\}$. This is related to the uniform bound of the EDF.
2. $\mathcal{R}_2 = \{(a, b) : a \le b\}$.
3. $\mathcal{R}_3 = \{[x_0 - r, x_0 + r] : r \ge 0\}$ for some fixed $x_0$.
4. $\mathcal{R}_4 = \{[x_0 - r, x_0 + r] : r \ge 0, x_0 \in \mathbb{R}\}$. Note that the difference between $\mathcal{R}_3$ and $\mathcal{R}_4$ is that the latter collection is much larger than the former, because $x_0$ is fixed in $\mathcal{R}_3$ whereas here $x_0$ is allowed to change.

The above examples are ones where $\mathcal{R}$ contains subregions in 1D, but this can be generalized to other dimensions. Here are some examples of $d$-dimensional regions:

1. $\mathcal{R}_5 = \{(-\infty, t] \times (-\infty, s] : t, s \in \mathbb{R}\}$. In this case $d = 2$, and this is related to the performance of a 2D EDF.
2. $\mathcal{R}_6 = \{(a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_d, b_d) : a_j \le b_j,\ j = 1, \ldots, d\}$. In this case the dimension is $d$.
3. $\mathcal{R}_7 = \{B(x_0, r) : r \ge 0\}$, where $B(x_0, r) = \{x : \|x - x_0\| \le r\}$ is a $d$-dimensional ball with radius $r$ centered at $x_0$.
4. $\mathcal{R}_8 = \{B_{\beta, c} : \beta \in \mathbb{R}^d, c \in \mathbb{R}\}$, where $B_{\beta, c} = \{x : \beta^T x + c \ge 0\}$ is a half-space in $d$ dimensions. This is related to the linear classifier and logistic regression.

Let $F = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$ be a set of $n$ points. For a subset $G \subset F$, we say $\mathcal{R}$ picks out $G$ if there exists $R \in \mathcal{R}$ such that $G = R \cap F$. Namely, there exists a subregion in $\mathcal{R}$ such that we can obtain the subset $G$ as the intersection of $F$ and this subregion.

Example 1. For example, consider the case $d = 1$ and $\mathcal{R} = \mathcal{R}_1$, and let $F = \{1, 5, 8\}$. The subset $G = \{1, 5\}$ can be picked out by $\mathcal{R}_1$ because we can choose $(-\infty, 6] \in \mathcal{R}_1$ such that $(-\infty, 6] \cap F = G$. Note that the choice may not be unique; $(-\infty, 7] \in \mathcal{R}_1$ also picks out $G$. However, the subset $G = \{5, 8\}$ cannot be picked out by any element of $\mathcal{R}_1$.

Let $S(\mathcal{R}, F)$ be the number of subsets of $F$ that can be picked out by $\mathcal{R}$. By definition, if $F$ contains $n$ elements, then $S(\mathcal{R}, F) \le 2^n$ because there are at most $2^n$ unique subsets of $n$ elements, including the empty set.
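The "picking out" operation is easy to enumerate by brute force. The following sketch (my own, not part of the notes) lists the subsets of $F = \{1, 5, 8\}$ picked out by the half-lines $\mathcal{R}_1$ and the intervals $\mathcal{R}_2$; since only the ordering of points matters, it suffices to try thresholds placed between and around the points:

```python
def picked_out_subsets(F, regions):
    """Return all subsets of F of the form R ∩ F, where each region R
    is represented by a membership predicate."""
    return {frozenset(x for x in F if inside(x)) for inside in regions}

F = [1, 5, 8]
cuts = [0, 3, 6, 9]  # one candidate threshold in each gap around the points

R1 = [lambda x, t=t: x <= t for t in cuts]                        # (-inf, t]
R2 = [lambda x, a=a, b=b: a < x < b for a in cuts for b in cuts]  # (a, b)

S1 = picked_out_subsets(F, R1)
S2 = picked_out_subsets(F, R2)
print(f"S(R1, F) = {len(S1)}: {sorted(map(sorted, S1))}")  # 4 subsets
print(f"S(R2, F) = {len(S2)}: {sorted(map(sorted, S2))}")  # 7 subsets
# F is shattered iff S(R, F) = 2^|F| = 8, so neither family shatters F.
```

Example 2 below walks through these counts by hand.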

Example 2. In the case of Example 1, $S(\mathcal{R}_1, F)$ is 4 because $\mathcal{R}_1$ can only pick out the following subsets: $\emptyset$, $\{1\}$, $\{1, 5\}$, $\{1, 5, 8\}$. However, if we consider $\mathcal{R}_2$, the quantity becomes $S(\mathcal{R}_2, F) = 7$. The only subset that $\mathcal{R}_2$ cannot pick out is $\{1, 8\}$.

Example 3. Now assume that we are working with $F = \{1, 5\}$. Then $S(\mathcal{R}_1, F) = 3$ because we can pick out $\emptyset$, $\{1\}$, $\{1, 5\}$, and $S(\mathcal{R}_2, F) = 4 = 2^2$ because every subset of $F$ can be picked out.

$F$ is shattered by $\mathcal{R}$ if $S(\mathcal{R}, F) = 2^n$, where $n$ is the number of elements in $F$. Namely, $F$ is shattered by $\mathcal{R}$ if every subset of $F$ can be picked out by $\mathcal{R}$. In Examples 2 and 3, we see that $F = \{1, 5\}$ is shattered by $\mathcal{R}_2$ but $F = \{1, 5, 8\}$ is not.

The VC dimension of $\mathcal{R}$, denoted by $\nu(\mathcal{R})$, is the maximal number of distinct points that can be shattered by $\mathcal{R}$. Namely, if $\mathcal{R}$ has VC dimension $\nu$, then we will be able to find a set of $\nu$ points such that $\mathcal{R}$ can pick out every subset of this set.

With the VC dimension, we can then use equation (15.6) to obtain equation (15.1). Here are the VC dimensions corresponding to the regions we have mentioned:

1. $\mathcal{R}_1 = \{(-\infty, t] : t \in \mathbb{R}\} \Longrightarrow \nu(\mathcal{R}_1) = 1$.
2. $\mathcal{R}_2 = \{(a, b) : a \le b\} \Longrightarrow \nu(\mathcal{R}_2) = 2$.
3. $\mathcal{R}_3 = \{[x_0 - r, x_0 + r] : r \ge 0\}$ for some fixed $x_0$ $\Longrightarrow \nu(\mathcal{R}_3) = 1$.
4. $\mathcal{R}_4 = \{[x_0 - r, x_0 + r] : r \ge 0, x_0 \in \mathbb{R}\} \Longrightarrow \nu(\mathcal{R}_4) = 2$.
5. $\mathcal{R}_5 = \{(-\infty, t] \times (-\infty, s] : t, s \in \mathbb{R}\} \Longrightarrow \nu(\mathcal{R}_5) = 2$.
6. $\mathcal{R}_6 = \{(a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_d, b_d) : a_j \le b_j,\ j = 1, \ldots, d\} \Longrightarrow \nu(\mathcal{R}_6) = 2d$.
7. $\mathcal{R}_7 = \{B(x_0, r) : r \ge 0\}$, where $B(x_0, r) = \{x : \|x - x_0\| \le r\}$ $\Longrightarrow \nu(\mathcal{R}_7) = 1$.
8. $\mathcal{R}_8 = \{B_{\beta, c} : \beta \in \mathbb{R}^d, c \in \mathbb{R}\} \Longrightarrow \nu(\mathcal{R}_8) = d + 1$.

Example: uniform convergence of the EDF. Let $\hat{F}_n$ be the EDF of a CDF $F$ based on an IID random sample from $F$. Using VC theory, we can prove that

$$P\left(\sup_x |\hat{F}_n(x) - F(x)| > \epsilon\right) = P\left(\sup_{A\in\mathcal{R}_1} \left|\frac{1}{n}\sum_{i=1}^n I(X_i \in A) - P(X \in A)\right| > \epsilon\right) \le 8(n+1)^{\nu(\mathcal{R}_1)} e^{-n\epsilon^2/32} = 8(n+1) e^{-n\epsilon^2/32},$$

which converges to 0 as $n \to \infty$. (Note that there is a better bound for the EDF called the DKW (Dvoretzky-Kiefer-Wolfowitz) inequality: $P(\sup_x |\hat{F}_n(x) - F(x)| > \epsilon) \le 2 e^{-2n\epsilon^2}$.) Moreover, using the method in Section 15.2.2, we have

$$\sup_x |\hat{F}_n(x) - F(x)| = O_P\left(\sqrt{\frac{\log n}{n}}\right)$$

and

$$E\left(\sup_x |\hat{F}_n(x) - F(x)|\right) = O\left(\sqrt{\frac{\log n}{n}}\right).$$
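To close, here is a simulation sketch of this EDF example (my own addition; Uniform(0, 1) data is used since $\sup_x |\hat{F}_n(x) - F(x)|$ has the same distribution for any continuous $F$), comparing the simulated tail probability with the VC bound (15.6) and the DKW bound:

```python
import numpy as np

rng = np.random.default_rng(3)

def edf_sup_error(n, n_trials=20_000):
    """Simulate sup_x |F_n(x) - F(x)| for Uniform(0, 1) data (F(x) = x).
    The supremum occurs at a data point, so it suffices to evaluate the
    EDF just before and after each order statistic."""
    errs = np.empty(n_trials)
    i = np.arange(1, n + 1)
    for t in range(n_trials):
        x = np.sort(rng.uniform(size=n))
        errs[t] = max(np.max(i / n - x), np.max(x - (i - 1) / n))
    return errs

n, eps = 100, 0.15
errs = edf_sup_error(n)
print(f"P(sup |F_n - F| > {eps}) ~ {np.mean(errs > eps):.4f}")
print(f"VC bound  8(n+1)exp(-n eps^2 / 32) = {8 * (n + 1) * np.exp(-n * eps**2 / 32):.2f}")
print(f"DKW bound 2 exp(-2 n eps^2)        = {2 * np.exp(-2 * n * eps**2):.4f}")
```

At this sample size the VC bound is vacuous (it exceeds 1) while DKW is sharp; the value of the VC bound is that its exponential-in-$n$ decay holds for any class with finite VC dimension, not just the half-lines behind the EDF.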
