1 Review and Overview
|
|
- Morris Beasley
- 5 years ago
- Views:
Transcription
1 DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, Review ad Overview I the first half of this course, the cetral questio that we wat to aswer is: Why miimizig the traiig error ofte leads to a small testig error? I the last lecture, we proved the asymptotics of the maximum likelihood estimator (MLE). I particular, as the umber of traiig examples, deoted by, teds to ifiity, L(ˆθ) L(θ ) p 2 + o ( ) 1. Here L is the expected loss. ˆθ is the miimizer of the traiig loss, while θ is the groud truth parameter. This result partially explais why the MLE, which has the smallest traiig loss, is also likely to achieve a small testig error whe there are eough traiig examples. Oe limitatio of the above result is that it requires well-specifiedess, i.e., the data are distributed precisely accordig to a particular groud truth parameter θ i the parameter space. We would like to prove a more geeral result i the followig form without assumig well-specifiedess. 1 L(ˆθ) L(θ ) f(p, ), p, 1. Aother limitatio of this asymptotic result is that it igores the depedece of higher order terms o other hyperparameters (i this case, the dimesio p). Cosider the followig two fuctios, both of which are of order p 2 + o ( 1 ) : p vs p 2 + p Arguably, the first oe is a better upper boud sice the secod boud requires > p 50 traiig examples to be below 1. From ow o, we restrict our attetio to the o-asymptotic regime, where is fiite. I this lecture, we preview the form of results that we would like to prove i the followig few lectures. The we itroduce the uiform covergece framework, which we will istatiate i various settigs to prove geeralizatio bouds. We ed this lecture by provig uiform covergece o fiite hypothesis classes. 2 Notatios I the o-asymptotic settig, we oly igore the absolute costats (also kow as uiversal costats). Specifically, wheever otatio O(X) appears i a statemet, it meas that there exists a uiversal costat c such that the statemet holds if we replace O(X) by c X. Similarly, we write A B as a shorthad for A O(B). I the followig, we review a few cetral otatios from previous lectures. 1 I this case, θ eeds to be redefied as the miimizer of the expected loss.
2 Hypothesis space: H is a family of hypotheses, i.e., predictio fuctios. Loss fuctio: l : (X Y) H R. This is aalogous to the otatio l((x, y), θ) for loss fuctios i the last lecture, yet it accommodates the geeral case where we caot aturally parametrize H by a cotiuum of parameters. Expected loss: L(h) = E (x,y) P l((x, y), h), where P is a data distributio over X Y. Moreover, we defie h argmi L(h) as the miimizer of the expected loss. Traiig loss (also kow as empirical risk): ˆL(h) = 1 i=1 l (( x (i), y (i)), h ), where ( x (1), y (1)), ( x (2), y (2)),..., ( x (), y ()) are traiig examples draw i.i.d. from P. Empirical risk miimizer (ERM): ĥ argmi ˆL(h). Remark 1. For a fixed data distributio P, there is o radomess i h, sice h is just the miimizer of a determiistic fuctio L(h). O the other had, ĥ is ideed a radom variable, as the traiig loss fuctio ˆL is defied based o the traiig examples. The depedece betwee some of the above cocepts is depicted i the followig diagram: 3 Objective Our goal is to prove a upper boud o L(ĥ) L(h ), the differece betwee the expected loss of the ERM ĥ ad that of the optimal hypothesis h. I particular, the results that we are goig to prove i this ad the followig lectures are of the followig form: Pr L(ĥ) L(h ) > ɛ δ. I words, with probability at least 1 δ, it holds that L(ĥ) L(h ) ɛ. 2 We ca iterpret L(ĥ) L(h ) > ɛ as a failure evet sice it meas that the ERM ĥ has a excess risk greater tha parameter ɛ, which is udesirable. Therefore, we would like the probability of this failure evet to be as small as possible. Moreover, the smaller ɛ is, the stricter we are whe evaluatig the performace of ĥ. Thus, we would like parameters ɛ ad δ to be as small as possible, give a fixed umber of traiig examples. 3 Recall the defiitio of the ERM ĥ. We have ˆL(ĥ) = mi ˆL(h) ˆL(h ). I order to prove a high-probability boud of form L(ĥ) L(h ) + extra term, 2 Here ad i the followig, the probability is always take over the radomess i traiig examples uless otherwise specified. 3 As a geeral rule of thumb, we ca make δ iverse polyomially small without usig too may traiig examples, while the cost of miimizig ɛ is geerally higher. 2
3 it remais to argue that ˆL(ĥ) L(ĥ) ad ˆL(h ) L(h ) (up to a small additive error) with high probability. Provig ˆL(h ) L(h ) is relatively simple: ˆL(h ) = 1 i=1 l((x(i), y (i) ), h ) is the average of i.i.d. radom variables, each with expectatio L(h ), so we ca prove a boud of the followig form by applyig stadard cocetratio iequalities: Pr ˆL(h ) L(h ) ɛ 1 δ. I fact, this argumet holds for ay fixed hypothesis h H, as log as h does ot deped o the traiig examples. The difficulty is i provig ˆL(ĥ) L(ĥ) sice ĥ is ideed a radom variable that depeds o the traiig examples (see Remark 1). Istead of a cocetratio boud for a sigle fixed hypothesis h, we eed a stroger cocetratio property that holds for every hypothesis i H simultaeously. This is where the otio of uiform covergece comes ito play. 4 Uiform Covergece Uiform covergece is a property of the hypothesis class H of the followig form: Pr h H, ˆL(h) L(h) ɛ 1 δ. (1) I words, it states that with probability at least 1 δ over the radom draw of traiig data, the traiig loss is poitwise close to the expected loss, up to a additive error of at most ɛ. I geeral, if we parametrize the hypothesis space by R, we would expect the picture of traiig loss ad expected loss to be as i Figure 1(a) if uiform covergece holds. It turs out that for particular learig tasks, the traiig loss exhibits a icer ladscape: it is ot oly poitwise close to the expected loss but also of the same shape, which is iformally depicted i Figure 1(b). (See GLM16, MBM18 for some of the recet work alog this lie of research.) Nevertheless, as oly uiform covergece is cocered i this lecture, we do ot distiguish these two differet ladscapes. (a) (b) Figure 1: Two differet empirical risk ladscapes. The blue lie ad the red lie deote the expected ad traiig losses. The dashed gree lies deote the expected loss ± ɛ. 3
4 4.1 Uiform Covergece Implies Geeralizatio Before provig uiform covergece for specific hypothesis classes, we first demostrate how it implies geeralizatio, i.e., a upper boud o L(ĥ) L(h ). We ca write L(ĥ) L(h ) as L(ĥ) L(ĥ) L(h ) = ˆL(ĥ) + ˆL( ĥ) ˆL(h ) + ˆL(h ) L(h ) L(ĥ) ˆL(ĥ) ˆL(h ) L(h ) 2 sup ˆL(h) L(h). Here the secod step applies the defiitio of ĥ, ad the third step follows directly from the defiitio of supremum. By Equatio (2), we have the followig implicatio h H, ˆL(h) L(h) ɛ = sup ˆL(h) L(h) ɛ = L(ĥ) L(h ) 2ɛ. (3) Therefore, if we could prove uiform covergece (1) for hypothesis class H, we have Pr L(ĥ) L(h ) 2ɛ Pr h H, ˆL(h) L(h) ɛ 1 δ, a geeralizatio boud that we desire. 4.2 Fiite Hypothesis Classes Now we prove that uiform covergece ideed holds for fiite hypothesis classes. Recall that is the umber of examples draw i.i.d. from the data distributio, ad the probability is always take over the radomess i traiig examples. Theorem 2. If H is fiite ad l((x, y), h) 0, 1, we have the followig statemets: (1) For ay fixed h H ad ɛ > 0, Pr ˆL(h) L(h) ɛ 1 2e 2ɛ2. (2) (2) For ay ɛ > 0, Pr h H, ˆL(h) L(h) ɛ 1 2 H e 2ɛ2. (3) With probability at least 1 δ, for ay h H, (4) With probability at least 1 δ, ˆL(h) L(h) L(ĥ) L(h ) l H + l 2 δ. 2 l H + l 1 δ. 4
5 The proof of the theorem relies o the followig cocetratio iequality, which is a quatitative versio of the cetral limit theorem. Lemma 3 (Hoeffdig s iequality). Let X 1, X 2,..., X be idepedet radom variables such that a i X i b i almost surely for each i. The, for ay ɛ > 0, 1 1 ( 2 2 ɛ 2 ) Pr X i E X i ɛ 1 exp i=1 (b i a i ) 2. i=1 i=1 Now we are ready to prove Theorem 2. We first prove Statemet (1) usig Hoeffdig s iequality, the we show that each statemet directly implies the ext. Proof of Theorem 2. Statemet (1) follows from Hoeffdig s iequality by takig X i = l((x (i), y (i) ), h), a i = 0 ad b i = 1 for each i i Lemma 3. The, Statemet (2) follows from a uio boud: Pr h H, ˆL(h) L(h) > ɛ Pr ˆL(h) L(h) > ɛ (uio boud) 2e 2ɛ2 = 2 H e 2ɛ2. (Statemet (1)) l H +l 2 δ Statemet (3) follows from pluggig ɛ = 2 ito Statemet (2). Fially, Statemet (3) ad the implicatio i Equatio (3) imply Statemet (4). 5 Digressio: The PAC Learig Framework Probably approximately correct (PAC) learig is a theoretical framework of machie learig proposed by Valiat Val84. Oe of the key defiitios i PAC learig is the otio of PAC learig algorithms. Defiitio 4 (PAC learig algorithm). Algorithm A is a PAC learig algorithm for hypothesis class H, if for ay distributio P over X Y, ɛ > 0 ad δ (0, 1), (( ĥ = A x (1), y (1)) (, x (2), y (2)),..., (x (), y ())) satisfies Pr L(ĥ) L(h ) > ɛ δ, ad A rus i polyomial time with respect to size(x ), 1 ɛ ad 1 δ. Remark 5. Iformally, size(x ) is the umber of bits to describe a elemet of X. For example, size(x ) = log 2 X if X is fiite. I the geeral case where X is parametrized by real umbers, this defiitio requires X to be discretized first. Remark 6. Defiitio 4 implicitly requires the umber of traiig examples,, to be polyomial i size(x ), 1 ɛ ad 1 δ. Otherwise, A does ot have eough time to read its etire iput. 5
6 Oe limitatio of Valiat s framework is that it requires the algorithm to work o every data distributio P, which turs out to be overly ambitious ad thus urealistic. I cotrast, recet research o learig theory usually requires certai assumptios o the data distributio, e.g., the distributio is Gaussia or realizable (i.e., the hypothesis class cotais the fuctio to be leared). Thus, it is worth thikig about the role of assumptios o learig theory research. Suppose we could prove the followig three theorems: Theorem A. Statemet P implies Statemet Q. Theorem B. Uder certai assumptios, P implies Q. Theorem C. Uder certai assumptios, P implies a statemet stroger tha Q. Theorem B is defiitely the weakest amog these three, yet Theorems A ad C are icomparable. It is ofte a matter of taste whether oe prefers A or C. I deep learig theory, however, Statemet Q is ofte vacuous for most practical uses. I this case, we had better prove a result similar to Theorem C, ad the fid out how the assumptios ca be weakeed ad evetually removed. Refereces GLM16 Rog Ge, Jaso D Lee, ad Tegyu Ma. Matrix completio has o spurious local miimum. I Advaces i Neural Iformatio Processig Systems (NIPS), pages , MBM18 Sog Mei, Yu Bai, ad Adrea Motaari. The ladscape of empirical risk for ocovex losses. The Aals of Statistics, 46(6A): , Val84 Leslie G Valiat. A theory of the learable. Commuicatios of the ACM (CACM), 27(11): ,
1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More informationLecture 14: Graph Entropy
15-859: Iformatio Theory ad Applicatios i TCS Sprig 2013 Lecture 14: Graph Etropy March 19, 2013 Lecturer: Mahdi Cheraghchi Scribe: Euiwoog Lee 1 Recap Bergma s boud o the permaet Shearer s Lemma Number
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationMaximum Likelihood Estimation and Complexity Regularization
ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationLecture 10: Universal coding and prediction
0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationRates of Convergence by Moduli of Continuity
Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity
More informationLecture 2. The Lovász Local Lemma
Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationLearning Theory: Lecture Notes
Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationNotes 19 : Martingale CLT
Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 7
Statistical Machie Learig II Sprig 2017, Learig Theory, Lecture 7 1 Itroductio Jea Hoorio jhoorio@purdue.edu So far we have see some techiques for provig geeralizatio for coutably fiite hypothesis classes
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationGlivenko-Cantelli Classes
CS28B/Stat24B (Sprig 2008 Statistical Learig Theory Lecture: 4 Gliveko-Catelli Classes Lecturer: Peter Bartlett Scribe: Michelle Besi Itroductio This lecture will cover Gliveko-Catelli (GC classes ad itroduce
More informationSection 11.8: Power Series
Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if
LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationLecture 9: Boosting. Akshay Krishnamurthy October 3, 2017
Lecture 9: Boostig Akshay Krishamurthy akshay@csumassedu October 3, 07 Recap Last week we discussed some algorithmic aspects of machie learig We saw oe very powerful family of learig algorithms, amely
More informationNotes 5 : More on the a.s. convergence of sums
Notes 5 : More o the a.s. covergece of sums Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: Dur0, Sectios.5; Wil9, Sectio 4.7, Shi96, Sectio IV.4, Dur0, Sectio.. Radom series. Three-series
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationEmpirical Processes: Glivenko Cantelli Theorems
Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationMA131 - Analysis 1. Workbook 3 Sequences II
MA3 - Aalysis Workbook 3 Sequeces II Autum 2004 Cotets 2.8 Coverget Sequeces........................ 2.9 Algebra of Limits......................... 2 2.0 Further Useful Results........................
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationLecture 11: Pseudorandom functions
COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More information17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15
17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig
More informationMA131 - Analysis 1. Workbook 2 Sequences I
MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................
More information1 Review and Overview
CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #12 Scribe: Garrett Thomas, Pega Liu October 31, 2018 1 Review a Overview Recall the GAN setup: we have iepeet samples x 1,..., x raw
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationMathematical Induction
Mathematical Iductio Itroductio Mathematical iductio, or just iductio, is a proof techique. Suppose that for every atural umber, P() is a statemet. We wish to show that all statemets P() are true. I a
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22
CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationRead carefully the instructions on the answer book and make sure that the particulars required are entered on each answer book.
THE UNIVERSITY OF WARWICK FIRST YEAR EXAMINATION: Jauary 2009 Aalysis I Time Allowed:.5 hours Read carefully the istructios o the aswer book ad make sure that the particulars required are etered o each
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More information(b) What is the probability that a particle reaches the upper boundary n before the lower boundary m?
MATH 529 The Boudary Problem The drukard s walk (or boudary problem) is oe of the most famous problems i the theory of radom walks. Oe versio of the problem is described as follows: Suppose a particle
More informationStatistical Theory MT 2008 Problems 1: Solution sketches
Statistical Theory MT 008 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. a) Let 0 < θ < ad put fx, θ) = θ)θ x ; x = 0,,,... b) c) where α
More informationThe multiplicative structure of finite field and a construction of LRC
IERG6120 Codig for Distributed Storage Systems Lecture 8-06/10/2016 The multiplicative structure of fiite field ad a costructio of LRC Lecturer: Keeth Shum Scribe: Zhouyi Hu Notatios: We use the otatio
More informationSelective Prediction
COMS 6998-4 Fall 2017 November 8, 2017 Selective Predictio Preseter: Rog Zhou Scribe: Wexi Che 1 Itroductio I our previous discussio o a variatio o the Valiat Model [3], the described learer has the ability
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationSequences I. Chapter Introduction
Chapter 2 Sequeces I 2. Itroductio A sequece is a list of umbers i a defiite order so that we kow which umber is i the first place, which umber is i the secod place ad, for ay atural umber, we kow which
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationLecture 3 The Lebesgue Integral
Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified
More informationLecture 9: Expanders Part 2, Extractors
Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationNotes for Lecture 11
U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with
More informationLecture 4: April 10, 2013
TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationn outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe
More informationNotes 27 : Brownian motion: path properties
Notes 27 : Browia motio: path properties Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces:[Dur10, Sectio 8.1], [MP10, Sectio 1.1, 1.2, 1.3]. Recall: DEF 27.1 (Covariace) Let X = (X
More informationLecture 15: Strong, Conditional, & Joint Typicality
EE376A/STATS376A Iformatio Theory Lecture 15-02/27/2018 Lecture 15: Strog, Coditioal, & Joit Typicality Lecturer: Tsachy Weissma Scribe: Nimit Sohoi, William McCloskey, Halwest Mohammad I this lecture,
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationLaw of the sum of Bernoulli random variables
Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationStatistical Theory MT 2009 Problems 1: Solution sketches
Statistical Theory MT 009 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. (a) Let 0 < θ < ad put f(x, θ) = ( θ)θ x ; x = 0,,,... (b) (c) where
More informationThe random version of Dvoretzky s theorem in l n
The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More information