Concentration inequalities
Jean-Yves Audibert (1, 2)
1. Imagine - ENPC/CSTB - Université Paris Est
2. Willow (INRIA/ENS/CNRS)
ThRaSH 2010
Problem
Tight upper and lower bounds on $f(X_1,\dots,X_n)$, with $X_1,\dots,X_n$ i.i.d. random variables taking their values in some (measurable) space $\mathcal{X}$, and $f:\mathcal{X}^n\to\mathbb{R}$ a function whose value depends on all the variables but not too much on any of them. For example:
$f(X_1,\dots,X_n)=\frac{X_1+\dots+X_n}{n}$ or $f(X_1,\dots,X_n)=\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}$.
Outline
- Asymptotic viewpoint
- Nonasymptotic viewpoint
  - Gaussian approximation
  - Gaussian processes
  - Sum of i.i.d. r.v.
  - Functions with bounded differences
  - Self-bounding functions
The asymptotic viewpoint
What is the limit of $f(X_1,\dots,X_n)$? What is the limit of its centered and scaled version
$\frac{f(X_1,\dots,X_n)-\mathbb{E}f(X_1,\dots,X_n)}{\sqrt{\operatorname{Var} f(X_1,\dots,X_n)}}$?
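Convergence of random variables
- Convergence in distribution: $W_n\xrightarrow{d}W$ as $n\to+\infty$
  $\iff$ for all $t\in\mathbb{R}$ such that $F_W$ is continuous at $t$, $F_{W_n}(t)\to F_W(t)$
  $\iff$ for all $f:\mathbb{R}\to\mathbb{R}$ continuous and bounded, $\mathbb{E}f(W_n)\to\mathbb{E}f(W)$
  $\iff$ for all $t\in\mathbb{R}$, $\mathbb{E}e^{itW_n}\to\mathbb{E}e^{itW}$ (with $i^2=-1$)
- Convergence in probability: $W_n\xrightarrow{P}W$ $\iff$ for all $\varepsilon>0$, $P(|W_n-W|\ge\varepsilon)\to0$
- Almost sure convergence: $W_n\xrightarrow{a.s.}W$ $\iff$ $P(W_n\to W)=1$
Almost sure convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution.
If for all $\varepsilon>0$, $\sum_{n\ge1}P(|W_n-W|>\varepsilon)<+\infty$, then $W_n\xrightarrow{a.s.}W$.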
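Convergence of the empirical mean: $f(X_1,\dots,X_n)=\bar X_n=\frac{X_1+\dots+X_n}{n}$
- LLN (1713): if $X,X_1,X_2,\dots$ are i.i.d. r.v. with $\mathbb{E}|X|<+\infty$, then $\bar X_n=\frac1n\sum_{i=1}^nX_i\xrightarrow{a.s.}\mathbb{E}X$.
- CLT (1733): if $X,X_1,X_2,\dots$ are i.i.d. r.v. with $\mathbb{E}X^2<+\infty$, then $\sqrt{n}(\bar X_n-\mathbb{E}X)\xrightarrow{d}\mathcal{N}(0,\operatorname{Var}X)$, or equivalently (when $\operatorname{Var}X>0$), for any $t$,
  $P\Big\{\sqrt{\tfrac{n}{\operatorname{Var}X}}\,(\bar X_n-\mathbb{E}X)>t\Big\}\xrightarrow[n\to+\infty]{}\int_t^{+\infty}\frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du.$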
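Below is a quick Monte Carlo sketch of the CLT statement above. For illustration only, it assumes $X\sim$ Uniform[0, 1], so $\mathbb{E}X=1/2$ and $\operatorname{Var}X=1/12$. The function names are hypothetical. The code estimates the left-hand probability for a finite $n$ and compares it with the Gaussian tail, which it computes with `math.erfc`.

```python
import math
import random


def normalized_mean_tail(n, t, reps, seed=0):
    """Monte Carlo estimate of P(sqrt(n / Var X) * (mean - EX) > t)
    for X ~ Uniform[0, 1], where EX = 1/2 and Var X = 1/12."""
    rng = random.Random(seed)
    mu, var = 0.5, 1.0 / 12.0
    scale = math.sqrt(n / var)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        if scale * (mean - mu) > t:
            hits += 1
    return hits / reps


def gaussian_tail(t):
    """P(N(0, 1) > t), the limit predicted by the CLT."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


print(normalized_mean_tail(n=30, t=1.0, reps=20000), gaussian_tail(1.0))
```

With $n=30$ the two printed values should already be close. The remaining gap mixes Monte Carlo noise with the finite-$n$ error that the Berry-Esseen theorem quantifies.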
Slutsky's lemma (1925)
Let $(V_n)$ and $(W_n)$ be two sequences of random vectors or variables. If $V_n\xrightarrow{P}v$ and $W_n\xrightarrow{d}W$, then
1. $V_n+W_n\xrightarrow{d}v+W$
2. $V_nW_n\xrightarrow{d}vW$
3. $V_n^{-1}W_n\xrightarrow{d}v^{-1}W$ if $v$ is invertible
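An example of a complicated functional: the t-statistic
Let $f(X_1,\dots,X_n)=\frac{\sqrt{n}(\bar X_n-\mathbb{E}X)}{S_n}$, with $S_n^2=\frac1n\sum_{i=1}^n(X_i-\bar X_n)^2$.
Since $S_n^2=\frac1n\sum_{i=1}^n(X_i-\mathbb{E}X)^2-(\mathbb{E}X-\bar X_n)^2$, the LLN gives $S_n^2\xrightarrow{a.s.}\operatorname{Var}X$. From the CLT, $\sqrt{n}(\bar X_n-\mathbb{E}X)\xrightarrow{d}\mathcal{N}(0,\operatorname{Var}X)$. Thus, from Slutsky's lemma (with $V_n=1/S_n$, assuming $\operatorname{Var}X>0$), $f(X_1,\dots,X_n)\xrightarrow{d}\mathcal{N}(0,1)$.
Appropriate decompositions of complicated functionals allow one to compute their asymptotic distribution.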
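The decomposition of $S_n^2$ used above holds for any centering constant, not only $\mathbb{E}X$. The sketch below checks it numerically and evaluates the t-statistic with the slide's $1/n$ normalization. The helper names are hypothetical, and the toy data are chosen for illustration.

```python
import math


def t_statistic(xs, mean_true):
    """sqrt(n) * (xbar - EX) / S_n with S_n^2 = (1/n) * sum (X_i - xbar)^2,
    the normalization used on the slide (not the 1/(n-1) version)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    return math.sqrt(n) * (xbar - mean_true) / math.sqrt(s2)


def s2_decomposition(xs, mean_true):
    """Right-hand side of S_n^2 = (1/n) sum (X_i - EX)^2 - (EX - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - mean_true) ** 2 for x in xs) / n - (mean_true - xbar) ** 2


xs = [1.0, 2.0, 3.0, 4.0, 10.0]
print(s2_decomposition(xs, 3.0))  # 10.0, the same as the direct formula
print(t_statistic(xs, 3.0))       # sqrt(5) * 1 / sqrt(10), i.e. sqrt(0.5)
```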
Nonasymptotic bounds
Motivations:
- when the nonasymptotic regime plays a crucial role (for instance, multi-armed bandit problems, racing algorithms, stopping time problems);
- when asymptotic analysis is not achievable through standard arguments;
- to derive asymptotic results!
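The Berry (1941)-Esseen (1942) theorem
$X,X_1,\dots,X_n$ i.i.d. with $\mathbb{E}|X|^3<+\infty$ and $\sigma^2=\operatorname{Var}X$; $\bar X_n=\frac{X_1+\dots+X_n}{n}$; $Z\sim\mathcal{N}\big(\mathbb{E}X,\frac{\operatorname{Var}X}{n}\big)$.
$\sup_{x\in\mathbb{R}}\big|P(\bar X_n>x)-P(Z>x)\big|\le\frac{\mathbb{E}|X-\mathbb{E}X|^3}{2\sigma^3\sqrt{n}}$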
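For Bernoulli variables, both sides of the Berry-Esseen bound can be computed exactly, which gives a concrete feel for the constant. The sketch below assumes $X\sim\mathcal{B}(p)$. Because $P(\bar X_n>x)$ is a step function and $P(Z>x)$ is continuous and decreasing, the supremum is reached at an atom $k/n$ or approached next to one. It is therefore enough to compare both one-sided limits at each atom. The function names are illustrative.

```python
import math


def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def berry_esseen_gap(n, p):
    """Exact sup_x |P(mean > x) - P(Z > x)| for X ~ Bernoulli(p) and
    Z ~ N(p, p(1-p)/n), obtained by checking the left limit P(S >= k)
    and the value P(S > k) of the step function at every atom k/n."""
    sd = math.sqrt(p * (1 - p) / n)
    gap = 0.0
    tail_ge = 1.0  # P(S >= k), starting with k = 0
    for k in range(n + 1):
        tail_gt = tail_ge - binom_pmf(n, k, p)  # P(S > k)
        z_tail = 0.5 * math.erfc((k / n - p) / (sd * math.sqrt(2.0)))
        gap = max(gap, abs(tail_ge - z_tail), abs(tail_gt - z_tail))
        tail_ge = tail_gt
    return gap


def berry_esseen_bound(n, p):
    """E|X - EX|^3 / (2 sigma^3 sqrt(n)) for X ~ Bernoulli(p)."""
    third = p * (1 - p) * ((1 - p) ** 2 + p**2)
    sigma = math.sqrt(p * (1 - p))
    return third / (2 * sigma**3 * math.sqrt(n))


for n, p in [(100, 0.5), (30, 0.2)]:
    print(n, p, berry_esseen_gap(n, p), berry_esseen_bound(n, p))
```

For $(n,p)=(100,1/2)$ the exact gap is about $0.04$, against a bound of $0.05$. The $1/\sqrt{n}$ rate is therefore the right order here, and most of the gap comes from the lattice structure of the binomial.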
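Slud's theorem (1977)
$X_1,\dots,X_n$ i.i.d. $\mathcal{B}(p)$ with $p\le\frac12$; $Z\sim\mathcal{N}\big(\mathbb{E}X,\frac{\operatorname{Var}X}{n}\big)$.
For any $x\in[p,1-p]$ such that $nx$ is an integer, $P(\bar X_n\ge x)\ge P(Z\ge x)$.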
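Slud's inequality can be checked exactly on small cases. The sketch below uses the integer form $P(S_n\ge k)\ge P(Z\ge k/n)$ for integers $np\le k\le n(1-p)$, where $S_n=n\bar X_n$. The names are illustrative, and the grid of $(n,p)$ values is arbitrary.

```python
import math


def binom_tail_ge(n, k, p):
    """P(S >= k) for S ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def slud_holds(n, p):
    """Check P(S >= k) >= P(Z >= k/n), Z ~ N(p, p(1-p)/n),
    for every integer k with n p <= k <= n (1 - p)."""
    sd = math.sqrt(p * (1 - p) / n)
    for k in range(math.ceil(n * p), math.floor(n * (1 - p)) + 1):
        z_tail = 0.5 * math.erfc((k / n - p) / (sd * math.sqrt(2.0)))
        if binom_tail_ge(n, k, p) < z_tail:
            return False
    return True


print(all(slud_holds(n, p) for n in range(1, 41) for p in (0.125, 0.25, 0.375, 0.5)))
```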
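The Paley-Zygmund inequality (1932)
$X_1,\dots,X_n$ i.i.d.; for any $0\le\lambda<1$,
$P\Big(\sqrt{\tfrac{n}{\operatorname{Var}X}}\,|\bar X_n-\mathbb{E}X|>\lambda\Big)\ge(1-\lambda^2)^2\min\Big(\frac13,\frac{(\operatorname{Var}X)^2}{\mathbb{E}(X-\mathbb{E}X)^4}\Big).$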
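This lower bound follows from the classical Paley-Zygmund inequality $P(Y>\theta\,\mathbb{E}Y)\ge(1-\theta)^2(\mathbb{E}Y)^2/\mathbb{E}Y^2$ for $Y\ge0$ and $0\le\theta<1$. One possible derivation applies it to the normalized squared deviation. It is a reconstruction that assumes $\mathbb{E}X^4<+\infty$, and it is not necessarily the route taken in the talk:

```latex
\begin{align*}
Y &= \frac{n(\bar X_n-\mathbb{E}X)^2}{\operatorname{Var}X}, \qquad \mathbb{E}Y = 1, \qquad \mu_4 = \mathbb{E}(X-\mathbb{E}X)^4,\\
\mathbb{E}\Big[\Big(\sum_{i=1}^n (X_i-\mathbb{E}X)\Big)^4\Big] &= n\mu_4 + 3n(n-1)(\operatorname{Var}X)^2,\\
\mathbb{E}Y^2 &= \frac{1}{n}\cdot\frac{\mu_4}{(\operatorname{Var}X)^2} + \frac{n-1}{n}\cdot 3
  \le \max\Big(3, \frac{\mu_4}{(\operatorname{Var}X)^2}\Big),\\
P\Big(\sqrt{\tfrac{n}{\operatorname{Var}X}}\,|\bar X_n-\mathbb{E}X| > \lambda\Big)
  &= P(Y > \lambda^2\,\mathbb{E}Y)
  \ge \frac{(1-\lambda^2)^2}{\mathbb{E}Y^2}
  \ge (1-\lambda^2)^2 \min\Big(\frac13, \frac{(\operatorname{Var}X)^2}{\mu_4}\Big).
\end{align*}
```

The key step is the third line: $\mathbb{E}Y^2$ is a convex combination of $\mu_4/(\operatorname{Var}X)^2$ and $3$, so it never exceeds the larger of the two.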
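Supremum of Gaussian processes (GP)
Gaussian process $(W(g))_{g\in\mathcal{G}}$: for any $g_1,\dots,g_d\in\mathcal{G}$, $\big(W(g_1),\dots,W(g_d)\big)$ is a Gaussian random vector.
GP: a powerful, flexible probabilistic model parametrized by $\mu(g)=\mathbb{E}W(g)$ and $K(g,g')=\operatorname{Cov}\big(W(g),W(g')\big)$.
Good intuition on GP $\Rightarrow$ good intuition on $\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}$, since
$\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}\approx\sup_{g\in\mathcal{G}}W(g)$ with $\mu(g)=\mathbb{E}g(X)$ and $K(g,g')=\frac1n\operatorname{Cov}\big(g(X),g'(X)\big)$.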
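The Borell (1975) - Cirel'son et al. (1976) inequality
$Z=\sup_{g\in\mathcal{G}}\{W(g)-\mathbb{E}W(g)\}$, $\sigma^2=\sup_{g\in\mathcal{G}}\operatorname{Var}W(g)=\sup_{g\in\mathcal{G}}K(g,g)$.
For any $\lambda\in\mathbb{R}$, $\log\mathbb{E}e^{\lambda(Z-\mathbb{E}Z)}\le\frac{\lambda^2\sigma^2}{2}$, and for any $t>0$, $P(Z-\mathbb{E}Z\ge t)\le e^{-\frac{t^2}{2\sigma^2}}$.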
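Dudley's integral (1967)
$d(g,g')=\sqrt{\mathbb{E}[W(g)-W(g')]^2}$; $N(\varepsilon)$ = $\varepsilon$-packing number of $(\mathcal{G},d)$; $\sigma^2=\sup_{g\in\mathcal{G}}\operatorname{Var}W(g)=\sup_{g\in\mathcal{G}}K(g,g)$.
$\mathbb{E}\sup_{g\in\mathcal{G}}\{W(g)-\mathbb{E}W(g)\}\le12\int_0^{\sigma}\sqrt{\log N(\varepsilon)}\,d\varepsilon.$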
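Brownian motion is a classical case in which Dudley's integral can be shown to be finite. Take $(B_t)_{t\in[0,1]}$, a GP with $K(s,t)=\min(s,t)$, so that $\sigma=1$ and $d(s,t)=\sqrt{|s-t|}$. Points at mutual distance more than $\varepsilon$ are more than $\varepsilon^2$ apart in $[0,1]$, hence $N(\varepsilon)\le1+\varepsilon^{-2}$. The sketch evaluates $12\int_0^1\sqrt{\log(1+\varepsilon^{-2})}\,d\varepsilon$ with a midpoint rule and compares it with $\mathbb{E}\sup_{t\le1}B_t=\sqrt{2/\pi}$, which follows from the reflection principle. The function name is hypothetical.

```python
import math


def dudley_bound_brownian(steps=100000):
    """12 * int_0^sigma sqrt(log N(eps)) d eps for Brownian motion on [0, 1],
    using sigma = 1 and the packing bound N(eps) <= 1 + eps**-2 for the
    canonical metric d(s, t) = sqrt(|s - t|). Midpoint rule: it never
    evaluates at eps = 0, where the integrand has an integrable singularity."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        eps = (j + 0.5) * h
        total += math.sqrt(math.log(1.0 + eps**-2)) * h
    return 12.0 * total


exact = math.sqrt(2.0 / math.pi)  # E sup_{t in [0, 1]} B_t
print(dudley_bound_brownian(), exact)
```

The bound is finite but loose: it is more than ten times the true value, which is typical given the constant 12.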
Another Borell (1975) - Cirel'son et al. (1976) inequality
$X_1,\dots,X_n$ i.i.d. $\mathcal{N}(0,1)$; $f:\mathbb{R}^n\to\mathbb{R}$ $L$-Lipschitz for the Euclidean distance: for any $x,x'\in\mathbb{R}^n$, $|f(x)-f(x')|\le L\|x-x'\|$.
For any $t>0$, $P\big(f(X_1,\dots,X_n)-\mathbb{E}f(X_1,\dots,X_n)\ge t\big)\le e^{-\frac{t^2}{2L^2}}$.
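Here is a Monte Carlo sketch of the Gaussian Lipschitz inequality. It uses the Euclidean norm $f(x)=\|x\|$, which is 1-Lipschitz by the triangle inequality, as an illustrative choice. $\mathbb{E}f$ is replaced by its empirical average, and the names are hypothetical.

```python
import math
import random


def euclidean_norm_deviation_tail(n, t, reps, seed=0):
    """Monte Carlo estimate of P(f(X) - E f(X) >= t) for the 1-Lipschitz
    f(x) = ||x||_2 and X ~ N(0, I_n); E f(X) is replaced by the sample mean."""
    rng = random.Random(seed)
    samples = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
               for _ in range(reps)]
    mean = sum(samples) / reps
    return sum(1 for s in samples if s - mean >= t) / reps


n, t = 10, 1.0
print(euclidean_norm_deviation_tail(n, t, reps=20000), math.exp(-t**2 / 2))
```

The empirical tail sits well below $e^{-t^2/2}$ here. The point of the inequality is that the bound does not depend on $n$.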
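Some useful probabilistic inequalities
- Markov's inequality: for any r.v. $X$ and $a>0$, since $|X|\ge a\mathbb{1}_{|X|\ge a}$, $P(|X|\ge a)\le\frac1a\mathbb{E}|X|$.
- Jensen's inequality: for any integrable r.v. $X$ with values in $\mathbb{R}^d$ and $\varphi:\mathbb{R}^d\to\mathbb{R}$ convex, $\varphi(\mathbb{E}X)\le\mathbb{E}\varphi(X)$.
- For any r.v. $X$, $\mathbb{E}X\le\int_0^{+\infty}P(X\ge t)\,dt$ (with equality if $X\ge0$).
Markov's inequality is at the basis of Chernoff's argument: for any $s>0$,
$P(X\ge t)=P\big(e^{sX}\ge e^{st}\big)\le e^{-st}\,\mathbb{E}e^{sX}.$
Control of the Laplace transform $\Rightarrow$ control of the large deviations.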
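Hoeffding's inequality (1963)
If $X,X_1,X_2,\dots$ are i.i.d. r.v. with $a\le X\le b$, then
1. for all $s\in\mathbb{R}$, $\mathbb{E}e^{s(X-\mathbb{E}X)}\le e^{\frac{s^2(b-a)^2}{8}}$;
2. for any $t\ge0$, $P(\bar X_n-\mathbb{E}X\ge t)\le e^{-\frac{2nt^2}{(b-a)^2}}$, or equivalently, for any $\varepsilon>0$,
$P\Big(\bar X_n-\mathbb{E}X<(b-a)\sqrt{\tfrac{\log(\varepsilon^{-1})}{2n}}\Big)\ge1-\varepsilon,$
i.e., w.h.p. $\bar X_n-\mathbb{E}X<(b-a)\sqrt{\frac{\log(\varepsilon^{-1})}{2n}}$.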
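Inverting Hoeffding's bound gives both a confidence radius and a sample-size rule. The sketch below is minimal; the function names are hypothetical, and the defaults are $a=0$ and $b=1$.

```python
import math


def hoeffding_radius(n, eps, a=0.0, b=1.0):
    """One-sided radius r with P(mean - EX < r) >= 1 - eps:
    r = (b - a) * sqrt(log(1/eps) / (2 n))."""
    return (b - a) * math.sqrt(math.log(1.0 / eps) / (2.0 * n))


def hoeffding_sample_size(r, eps, a=0.0, b=1.0):
    """Smallest n such that hoeffding_radius(n, eps, a, b) <= r,
    i.e. n >= (b - a)^2 log(1/eps) / (2 r^2)."""
    return math.ceil((b - a) ** 2 * math.log(1.0 / eps) / (2.0 * r**2))


print(hoeffding_radius(100, 0.05))        # about 0.1224
print(hoeffding_sample_size(0.05, 0.05))  # 600
```

For example, to guarantee a one-sided error below $0.05$ with probability $0.95$ for $[0,1]$-valued variables, this rule requires $n=600$ samples.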
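Proof of 1 (log-Laplace upper bound): for all $s\in\mathbb{R}$, $\mathbb{E}e^{s(X-\mathbb{E}X)}\le e^{\frac{s^2(b-a)^2}{8}}$
Let $\varphi(s)=\log\mathbb{E}e^{sX}$. Then $\varphi'(s)=\mathbb{E}_{P_s}X$, with $P_s(d\omega)=\frac{e^{sX(\omega)}}{\mathbb{E}e^{sX}}P(d\omega)$, and $\varphi''(s)=\operatorname{Var}_{P_s}X$.
Since $X$ also takes its values in $[a,b]$ under $P_s$,
$\operatorname{Var}_{P_s}X=\inf_{r\in\mathbb{R}}\mathbb{E}_{P_s}(X-r)^2\le\mathbb{E}_{P_s}\big(X-\tfrac{a+b}{2}\big)^2\le\frac{(b-a)^2}{4}.$
Taylor's formula with integral remainder, $\varphi(s)=\varphi(0)+s\varphi'(0)+\int_0^s(s-t)\varphi''(t)\,dt$, with $\varphi(0)=0$ and $\varphi'(0)=\mathbb{E}X$, then gives
$\log\mathbb{E}e^{sX}\le s\mathbb{E}X+\int_0^s(s-t)\frac{(b-a)^2}{4}\,dt=s\mathbb{E}X+\frac{(b-a)^2s^2}{8}.$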
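The log-Laplace bound just proved can be checked numerically for a Bernoulli variable. In that case $a=0$, $b=1$ and $\log\mathbb{E}e^{s(X-\mathbb{E}X)}=\log(1-p+pe^s)-sp$. The grid of $s$ and $p$ values below is arbitrary, and the function name is illustrative.

```python
import math


def bernoulli_centered_log_mgf(s, p):
    """log E exp(s (X - EX)) for X ~ Bernoulli(p), so a = 0 and b = 1."""
    return math.log(1.0 - p + p * math.exp(s)) - s * p


# Hoeffding's lemma predicts log E e^{s(X - EX)} <= s^2 (b - a)^2 / 8 = s^2 / 8.
worst = max(
    bernoulli_centered_log_mgf(s, p) - s * s / 8.0
    for p in (0.05, 0.25, 0.5, 0.75, 0.95)
    for s in (k / 10.0 for k in range(-100, 101))
)
print(worst)  # at most 0 up to rounding; the value 0 is reached at s = 0
```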
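Chernoff's argument
2. For any $t\ge0$, $P(\bar X_n-\mathbb{E}X>t)\le e^{-\frac{2nt^2}{(b-a)^2}}$.
For any $s>0$, using Markov's inequality and then independence,
$P(\bar X_n-\mathbb{E}X\ge t)=P\big(e^{sn(\bar X_n-\mathbb{E}X)}\ge e^{snt}\big)\le e^{-snt}\,\mathbb{E}\big[e^{sn(\bar X_n-\mathbb{E}X)}\big]=e^{-snt}\,\mathbb{E}\big[e^{s\sum_{i=1}^n(X_i-\mathbb{E}X)}\big]=e^{-snt}\big(\mathbb{E}e^{s(X-\mathbb{E}X)}\big)^n\le e^{n\left(-st+\frac{s^2(b-a)^2}{8}\right)}=e^{-\frac{2nt^2}{(b-a)^2}}$
by choosing $s=\frac{4t}{(b-a)^2}$.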
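For Bernoulli variables, the Chernoff argument can be run with the exact Laplace transform instead of Hoeffding's bound on it. Optimizing over $s$ gives the standard Kullback-Leibler form $e^{-n\,\mathrm{kl}(p+t,p)}$. This is never worse than Hoeffding's $e^{-2nt^2}$, by Pinsker's inequality $\mathrm{kl}(q,p)\ge2(q-p)^2$. The sketch compares both bounds with the exact binomial tail. The names and parameter values are illustrative.

```python
import math


def binom_upper_tail(n, p, t):
    """Exact P(mean - p >= t) for the mean of n Bernoulli(p) variables
    (the 1e-12 slack only guards against float rounding at the boundary)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if k / n - p >= t - 1e-12)


def kl_bernoulli(q, p):
    """kl(q, p) = q log(q/p) + (1-q) log((1-q)/(1-p)) for 0 < p < 1, 0 <= q <= 1."""
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def chernoff_bound(n, p, t):
    """inf_{s > 0} e^{-snt} (E e^{s(X - p)})^n = exp(-n kl(p + t, p)) for Bernoulli(p)."""
    return math.exp(-n * kl_bernoulli(p + t, p))


def hoeffding_bound(n, t):
    """Hoeffding's bound exp(-2 n t^2) for variables in [0, 1]."""
    return math.exp(-2 * n * t * t)


n, p, t = 50, 0.3, 0.1
print(binom_upper_tail(n, p, t), chernoff_bound(n, p, t), hoeffding_bound(n, t))
```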
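Union bound
$P(A)\ge1-\varepsilon$ and $P(B)\ge1-\varepsilon$ $\Rightarrow$ $P(A\cap B)\ge1-2\varepsilon$ (since $P(A^c\cup B^c)\le P(A^c)+P(B^c)$).
For instance: Hoeffding applied to $X$ + Hoeffding applied to $-X$ + union bound $\Rightarrow$ with probability at least $1-\varepsilon$,
$|\bar X_n-\mathbb{E}X|<(b-a)\sqrt{\frac{\log(2\varepsilon^{-1})}{2n}}$
(this leads to pessimistic but correct confidence intervals, unlike the CLT).
If $P(A_1)\ge1-\varepsilon,\dots,P(A_m)\ge1-\varepsilon$, then $P(A_1\cap\dots\cap A_m)\ge1-m\varepsilon$.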
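The sketch below builds the two-sided Hoeffding interval obtained with the union bound. It then runs a small simulation of the interval's coverage for Bernoulli(0.3) data, which is an arbitrary choice. The helper names are hypothetical.

```python
import math
import random


def hoeffding_interval(xs, eps, a=0.0, b=1.0):
    """Two-sided interval from Hoeffding applied to X and -X plus a union bound:
    |mean - EX| < (b - a) sqrt(log(2/eps) / (2 n)) with probability >= 1 - eps."""
    n = len(xs)
    mean = sum(xs) / n
    r = (b - a) * math.sqrt(math.log(2.0 / eps) / (2.0 * n))
    return mean - r, mean + r


def coverage(p, n, eps, trials, seed=0):
    """Fraction of simulated Bernoulli(p) samples whose interval contains p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
        lo, hi = hoeffding_interval(xs, eps)
        hits += lo < p < hi
    return hits / trials


print(coverage(p=0.3, n=100, eps=0.05, trials=2000))
```

The observed coverage is typically well above the nominal $1-\varepsilon$, which is the pessimism mentioned above. In exchange, the guarantee holds for every $n$.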
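Bernstein's (1946) inequality
Hoeffding's inequality vs CLT: for $\alpha>0$,
$P\Big[\sqrt{\tfrac{n}{\operatorname{Var}X}}(\bar X_n-\mathbb{E}X)>\alpha\Big]\le e^{-\frac{2\alpha^2\operatorname{Var}X}{(b-a)^2}}$ (Hoeffding), whereas this probability tends to $P(Z>\alpha)\le\frac{e^{-\alpha^2/2}}{\alpha\sqrt{2\pi}}$ as $n\to+\infty$, with $Z\sim\mathcal{N}(0,1)$ (CLT).
Since $\operatorname{Var}X\le\frac{(b-a)^2}{4}$, the Hoeffding exponent is never better than $\alpha^2/2$: Hoeffding's inequality is imprecise for r.v. having low variance.
Bernstein's inequality: if $X,X_1,X_2,\dots$ are i.i.d. r.v. with $X-\mathbb{E}X\le c$, then
- for any $\varepsilon>0$, with probability at least $1-\varepsilon$, $\bar X_n-\mathbb{E}X\le\sqrt{\frac{2\log(\varepsilon^{-1})\operatorname{Var}X}{n}}+\frac{c\log(\varepsilon^{-1})}{3n}$;
- for any $t\ge0$, $P(\bar X_n-\mathbb{E}X>t)\le e^{-\frac{nt^2}{2\operatorname{Var}X+2ct/3}}$.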
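To see what Bernstein's inequality gains for low-variance variables, the sketch below compares the one-sided Hoeffding and Bernstein radii for a Bernoulli($p$) variable. For this variable $\operatorname{Var}X=p(1-p)$, and one may take $c=1-p$. The parameter values and function names are illustrative.

```python
import math


def hoeffding_radius(n, eps, width=1.0):
    """(b - a) sqrt(log(1/eps) / (2 n)) with b - a = width."""
    return width * math.sqrt(math.log(1.0 / eps) / (2.0 * n))


def bernstein_radius(n, eps, var, c):
    """sqrt(2 log(1/eps) Var X / n) + c log(1/eps) / (3 n), where X - EX <= c."""
    L = math.log(1.0 / eps)
    return math.sqrt(2.0 * L * var / n) + c * L / (3.0 * n)


p, n, eps = 0.01, 1000, 0.05  # Bernoulli(p): Var X = p (1 - p), X - EX <= 1 - p
print(hoeffding_radius(n, eps), bernstein_radius(n, eps, p * (1 - p), 1 - p))
```

With $p=0.01$, Bernstein's radius is more than four times smaller. With $p=1/2$, its leading term coincides with Hoeffding's, and the extra $c\log(\varepsilon^{-1})/(3n)$ term makes it slightly worse.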
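Empirical Bernstein's inequality (Audibert, Munos, Szepesvári, 2007; Maurer, Pontil, 2009)
If $X,X_1,X_2,\dots$ are i.i.d. r.v. with $a\le X\le b$, then for any $\varepsilon>0$, with probability at least $1-\varepsilon$,
$\mathbb{E}X\le\bar X_n+\sqrt{\frac{2\log(\varepsilon^{-1})\hat\sigma^2}{n}}+\frac{7(b-a)\log(\varepsilon^{-1})}{3n}$, with $\hat\sigma^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X_n)^2$
(to be compared with $\mathbb{E}X\le\bar X_n+\sqrt{\frac{2\log(\varepsilon^{-1})\operatorname{Var}X}{n}}+\frac{(b-a)\log(\varepsilon^{-1})}{3n}$).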
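Hoeffding-Azuma inequalities (McDiarmid's version, 1989)
If for some $c\ge0$,
$\sup_{i\in\{1,\dots,n\}}\ \sup_{(x_1,\dots,x_n)\in\mathcal{X}^n,\ x\in\mathcal{X}}\big|f(x_1,\dots,x_n)-f(x_1,\dots,x_{i-1},x,x_{i+1},\dots,x_n)\big|\le\frac{c}{n},$
then $W=f(X_1,\dots,X_n)$ satisfies, for any $\lambda\in\mathbb{R}$, $\mathbb{E}e^{\lambda(W-\mathbb{E}W)}\le e^{\frac{\lambda^2c^2}{8n}}$, and for any $t\ge0$, $P(W-\mathbb{E}W>t)\le e^{-\frac{2nt^2}{c^2}}$.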
First example: Hoeffding's inequality in Hilbert space
$X_1,\dots,X_n$ i.i.d. r.v. taking values in a separable Hilbert space, with $\mathbb{E}X=0$ and $\|X\|\le1$.
For any $t\ge4$, $P\big(\|X_1+\dots+X_n\|\ge t\sqrt{n}\big)\le e^{-\frac{t^2}{8}}$.
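Second example: supremum of an empirical process
$W=f(X_1,\dots,X_n)=\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}$, with $\mathcal{G}$ finite.
Assumptions: for all $g\in\mathcal{G}$, $g$ takes its values in $[-1,1]$ and $\mathbb{E}g(X_1)=0$. Then
$\sup_{i\in\{1,\dots,n\}}\ \sup_{(x_1,\dots,x_n)\in\mathcal{X}^n,\ x\in\mathcal{X}}\big|f(x_1^{i-1},x_i,x_{i+1}^n)-f(x_1^{i-1},x,x_{i+1}^n)\big|\le\frac2n,$
where $x_j^k=(x_j,\dots,x_k)$. McDiarmid's inequality $\Rightarrow$ $P(W-\mathbb{E}W>t)\le e^{-nt^2/2}$ $\Rightarrow$ with probability at least $1-\varepsilon$,
$\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}\le\mathbb{E}\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}+\sqrt{\frac{2\log(\varepsilon^{-1})}{n}}.$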
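Here is a Monte Carlo sketch of this example with an illustrative finite class: $X\sim$ Uniform[0, 1] and $\mathcal{G}=\{x\mapsto\cos(2\pi jx):1\le j\le m\}$. Each function in $\mathcal{G}$ takes values in $[-1,1]$ and has mean zero. $\mathbb{E}W$ is replaced by a Monte Carlo average, and the names are hypothetical.

```python
import math
import random


def sup_empirical_process(xs, m):
    """W = max_{1 <= j <= m} (1/n) sum_i cos(2 pi j X_i): a finite class of
    [-1, 1]-valued functions with mean 0 under X ~ Uniform[0, 1]."""
    n = len(xs)
    return max(sum(math.cos(2 * math.pi * j * x) for x in xs) / n
               for j in range(1, m + 1))


def deviation_tail(n, m, t, reps, seed=0):
    """Monte Carlo estimate of P(W - EW > t), with EW replaced by its MC average."""
    rng = random.Random(seed)
    ws = [sup_empirical_process([rng.random() for _ in range(n)], m)
          for _ in range(reps)]
    mean_w = sum(ws) / reps
    return sum(1 for w in ws if w - mean_w > t) / reps


n, m, t = 50, 10, 0.2
print(deviation_tail(n, m, t, reps=1000), math.exp(-n * t * t / 2))
```

The McDiarmid bound holds with a large margin here. It uses only the bounded differences and ignores the variance of the functions, which is the information Talagrand's inequality exploits.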
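Third example: kernel density estimation
$X_1,\dots,X_n$ i.i.d. r.v. from a distribution with density $p$ on $\mathbb{R}$; $h>0$ and $K:\mathbb{R}\to\mathbb{R}_+$ with $\int_{\mathbb{R}}K=1$;
$\hat p(x)=\frac{1}{nh}\sum_{i=1}^nK\Big(\frac{x-X_i}{h}\Big)$, $W=f(X_1,\dots,X_n)=\int|\hat p(x)-p(x)|\,dx$.
$\big|f(x_1^{i-1},x_i,x_{i+1}^n)-f(x_1^{i-1},x_i',x_{i+1}^n)\big|\le\frac{1}{nh}\int\Big|K\Big(\frac{x-x_i}{h}\Big)-K\Big(\frac{x-x_i'}{h}\Big)\Big|\,dx\le\frac2n$
$\Rightarrow$ with probability at least $1-\varepsilon$, $W-\mathbb{E}W\le\sqrt{\frac{2\log(\varepsilon^{-1})}{n}}$.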
Self-bounding functions (Boucheron, Lugosi, Massart, 2003, 2009; Maurer, 2005)
$f_i(x_1,\dots,x_n)=\inf_{x_i\in\mathcal{X}}f(x_1,\dots,x_n)$.
If for some $a,b\ge0$ and for any $(x_1,\dots,x_n)\in\mathcal{X}^n$,
$\sum_{i=1}^n\big[f(x_1,\dots,x_n)-f_i(x_1,\dots,x_n)\big]^2\le a\,f(x_1,\dots,x_n)+b,$
then, for any $t\ge0$, $W=f(X_1,\dots,X_n)$ satisfies $P(W-\mathbb{E}W>t)\le e^{-\frac{t^2}{2(a\mathbb{E}W+b+at/2)}}$.
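Talagrand's inequality (Talagrand, 1996; Rio, 2002; Bousquet, 2003)
$W=\sup_{g\in\mathcal{G}}\frac{g(X_1)+\dots+g(X_n)}{n}$, with $\mathbb{E}g(X)=0$ and $|g(x)|\le c$ for all $g\in\mathcal{G}$ and $x\in\mathcal{X}$; $v=\sup_{g\in\mathcal{G}}\operatorname{Var}g(X)+2c\,\mathbb{E}W$.
- For any $\varepsilon>0$, with probability at least $1-\varepsilon$, $W-\mathbb{E}W\le\sqrt{\frac{2v\log(\varepsilon^{-1})}{n}}+\frac{c\log(\varepsilon^{-1})}{3n}$.
- For any $t\ge0$, $P(W-\mathbb{E}W>t)\le e^{-\frac{nt^2}{2v+2ct/3}}$.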
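Expected maximal deviations
Let $\sigma>0$, $m\ge2$, and $W_1,\dots,W_m$ r.v. such that for all $s>0$ and any $1\le i\le m$, $\mathbb{E}e^{sW_i}\le e^{\frac{s^2\sigma^2}{2}}$. Then
$\mathbb{E}\Big\{\max_{1\le i\le m}W_i\Big\}\le\sigma\sqrt{2\log m}.$
If for any $s>0$ we also have $\mathbb{E}e^{-sW_i}\le e^{\frac{s^2\sigma^2}{2}}$, then $\mathbb{E}\Big\{\max_{1\le i\le m}|W_i|\Big\}\le\sigma\sqrt{2\log(2m)}$.
Proof: $\max_{1\le i\le m}W_i\le\frac1s\log\sum_{i=1}^me^{sW_i}$, so by Jensen's inequality $\mathbb{E}\max_{1\le i\le m}W_i\le\frac1s\log\big(me^{s^2\sigma^2/2}\big)=\frac{\log m}{s}+\frac{s\sigma^2}{2}$. Choosing $s=\frac{\sqrt{2\log m}}{\sigma}$ gives $\sigma\sqrt{2\log m}$. For $|W_i|$, apply the same argument to the $2m$ variables $W_1,\dots,W_m,-W_1,\dots,-W_m$.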
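Independent $\mathcal{N}(0,\sigma^2)$ variables satisfy both Laplace-transform conditions with equality, so they make a natural sanity check of the two bounds. Independence is not needed for the result; it only makes the simulation easy. Here is a minimal Monte Carlo sketch with hypothetical names:

```python
import math
import random


def mc_expected_max(m, sigma, reps, absolute=False, seed=0):
    """Monte Carlo estimate of E max_i W_i (or E max_i |W_i|) for m independent
    N(0, sigma^2) variables, which satisfy E e^{s W_i} = e^{s^2 sigma^2 / 2}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        ws = [rng.gauss(0.0, sigma) for _ in range(m)]
        total += max(abs(w) for w in ws) if absolute else max(ws)
    return total / reps


m, sigma = 100, 1.0
print(mc_expected_max(m, sigma, 2000), sigma * math.sqrt(2 * math.log(m)))
print(mc_expected_max(m, sigma, 2000, absolute=True), sigma * math.sqrt(2 * math.log(2 * m)))
```

For $m=100$ and $\sigma=1$, the simulated maximum is around $2.5$, against $\sqrt{2\log100}\approx3.03$. The bound captures the $\sqrt{\log m}$ growth.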
Extension to martingale difference sequences
Let $X_1,X_2,\dots$ and $U_1,U_2,\dots$ be r.v. such that $\mathbb{E}[X_i\,|\,U_1,\dots,U_{i-1}]=0$ for all $i\ge1$.
Assume that for some r.v. $A_i$ measurable w.r.t. $U_1,\dots,U_{i-1}$, $X_i$ takes its values in $[A_i,A_i+1]$. Then, for any $t\ge0$,
$P(\bar X_n>t)\le e^{-2nt^2}$:
the same r.h.s. as if we had i.i.d. r.v. taking values in $[0,1]$.
Other extensions
- All upper bounds easily extend to independent, non-identically distributed r.v.
- Some upper bounds on the empirical mean can be extended to random vectors.
- All upper bounds on the empirical mean remain valid if the $X_i$ are samples drawn without replacement.
Some nice references
- Appendix of G. Lugosi and N. Cesa-Bianchi's book: Prediction, Learning, and Games
- G. Lugosi's lecture notes on concentration inequalities
- Boucheron, Lugosi, Massart (2003, 2009)
- P. Massart's Saint-Flour lecture notes
More informationStatistical Theory; Why is the Gaussian Distribution so popular?
Statistical Theory; Why is the Gaussia Distributio so popular? Rob Nicholls MRC LMB Statistics Course 2014 Cotets Cotiuous Radom Variables Expectatio ad Variace Momets The Law of Large Numbers (LLN) The
More informationModule 1 Fundamentals in statistics
Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly
More information4.1 Data processing inequality
ECE598: Iformatio-theoretic methods i high-dimesioal statistics Sprig 206 Lecture 4: Total variatio/iequalities betwee f-divergeces Lecturer: Yihog Wu Scribe: Matthew Tsao, Feb 8, 206 [Ed. Mar 22] Recall
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationPrecise Rates in Complete Moment Convergence for Negatively Associated Sequences
Commuicatios of the Korea Statistical Society 29, Vol. 16, No. 5, 841 849 Precise Rates i Complete Momet Covergece for Negatively Associated Sequeces Dae-Hee Ryu 1,a a Departmet of Computer Sciece, ChugWoo
More informationElements of Statistical Methods Lots of Data or Large Samples (Ch 8)
Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x
More informationfor all x ; ;x R. A ifiite sequece fx ; g is said to be ND if every fiite subset X ; ;X is ND. The coditios (.) ad (.3) are equivalet for =, but these
sub-gaussia techiques i provig some strog it theorems Λ M. Amii A. Bozorgia Departmet of Mathematics, Faculty of Scieces Sista ad Baluchesta Uiversity, Zaheda, Ira Amii@hamoo.usb.ac.ir, Fax:054446565 Departmet
More information