Learning Theory for Conditional Risk Minimization: Supplementary Material
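Alexander Zimin, IST Austria
Christoph H. Lampert, IST Austria, chl@ist.ac.at

1 Proofs

Proof of Theorem 1. After the application of (6) and (8) we can consider the two parts separately:

\[ P\Big[ R_n(h_n) - \inf_{h} R_n(h) > \alpha \Big] \tag{31} \]
\[ \le P\Big[ \sup_{h} \Big| \frac{1}{n} \sum_{t=1}^{n} \big( l(h, z_t) - R_{t-1}(h) \big) \Big| > \alpha/4 \Big] \tag{32} \]
\[ + P\Big[ \frac{1}{n} \sum_{t=1}^{n} d_{t,n} > \alpha/4 \Big]. \tag{33} \]

The convergence of the probability in (32) is guaranteed by the result of Rakhlin et al., 2014 for any stochastic process. The convergence of (33) follows from the definition of the convergent discrepancies and is the content of Lemma 2.

Lemma 2. If a double array $d_{t,n}$ is convergent, then $\frac{1}{n} \sum_{t=1}^{n} d_{t,n}$ converges to 0 in probability.

Proof. The proof is similar to that of the Toeplitz lemma, but adapted to our notion of convergence. Fix $\varepsilon > 0$ and $\delta > 0$. Then, by the definition of a convergent array, for $\varepsilon' = \delta' = \delta\varepsilon/4$,

\[ \exists\, n_0, t_0 : 0 \le t_0 < n_0, \ \forall n \ge n_0, \ \forall\, t_0 \le t < n : \tag{34} \]
\[ P\big[ d_{t,n} > \varepsilon' \big] \le \delta'. \tag{35} \]

In particular, this means that for any $n \ge n_0$ and $t_0 \le t < n$ we have $E\, d_{t,n} \le \varepsilon' + \delta' = \delta\varepsilon/2$, because of the boundedness of $d_{t,n}$. Now, choose any $n_0'$ that satisfies $n_0 / n_0' \le \varepsilon/2$. Then for any $n \ge n_0'$ we get

\[ P\Big[ \frac{1}{n} \sum_{t=1}^{n} d_{t,n} > \varepsilon \Big] \le P\Big[ \frac{1}{n} \sum_{t=n_0+1}^{n} d_{t,n} > \frac{\varepsilon}{2} \Big] \tag{36} \]
\[ \le \frac{2}{n\varepsilon} \sum_{t=n_0+1}^{n} E\, d_{t,n} \tag{37} \]
\[ \le \delta, \tag{38} \]

where the last line follows from the bound on the expectations.

To characterize the complexity of a function class we use covering numbers and a sequential fat-shattering dimension. Before we can give those definitions, we need to introduce the notion of $\mathcal{Z}$-valued trees. A $\mathcal{Z}$-valued tree of depth $n$ is a sequence $z_{1:n}$ of mappings $z_i : \{\pm 1\}^{i-1} \to \mathcal{Z}$. A sequence $\varepsilon_{1:n} \in \{\pm 1\}^{n}$ defines a path in a tree. To shorten the notation, $z_t(\varepsilon_{1:t-1})$ is denoted as $z_t(\varepsilon)$. For a double sequence $z_{1:n}, z'_{1:n}$, we define $\chi_t(\varepsilon)$ as $z'_t$ if $\varepsilon = 1$ and $z_t$ if $\varepsilon = -1$. Also define distributions $p_t(\varepsilon_{1:t-1}, z_{1:t-1}, z'_{1:t-1})$ over $\mathcal{Z}$ as $P[\,\cdot \mid \chi_1(\varepsilon_1), \dots, \chi_{t-1}(\varepsilon_{t-1})]$, where $P$ is the distribution of the process under consideration. Then we can define a distribution $\rho$ over two $\mathcal{Z}$-valued trees $z$ and $z'$ as follows: $z_1$ and $z'_1$ are sampled independently from the initial distribution of the process, and for any path $\varepsilon_{1:n}$ and $2 \le t \le n$, $z_t(\varepsilon)$ and $z'_t(\varepsilon)$ are sampled independently from $p_t(\varepsilon_{1:t-1}, z_{1:t-1}(\varepsilon), z'_{1:t-1}(\varepsilon))$.

For any random variable $y$ that is measurable with respect to $\sigma_n$ (the $\sigma$-algebra generated by $z_{1:n}$), we define its symmetrized counterpart $\tilde{y}$ as follows. We know that there exists a measurable function $\psi$ such that $y = \psi(z_{1:n})$. Then we define $\tilde{y} = \psi(\chi_1(\varepsilon_1), \dots, \chi_n(\varepsilon_n))$, where the samples used by the $\chi_t$'s are understood from the context.

Now we can define covering numbers.

Definition 5. A set $V$ of $\mathbb{R}$-valued trees of depth $n$ is a (sequential) $\theta$-cover (with respect to the $\ell_\infty$-norm) of $\mathcal{F} \subset \{f : \mathcal{Z} \to \mathbb{R}\}$ on a tree $z$ of depth $n$ if

\[ \forall f \in \mathcal{F}, \ \forall \varepsilon \in \{\pm 1\}^{n}, \ \exists v \in V : \tag{39} \]
\[ \max_{1 \le t \le n} | f(z_t(\varepsilon)) - v_t(\varepsilon) | \le \theta. \tag{40} \]

The (sequential) $\theta$-covering number of a function class $\mathcal{F}$ on a given tree $z$ is

\[ N_\infty(\mathcal{F}, \theta, z) = \min\{ |V| : V \text{ is a } \theta\text{-cover w.r.t. the } \ell_\infty\text{-norm of } \mathcal{F} \text{ on } z \}. \tag{41–42} \]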
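The cover condition of Definition 5 can be checked by brute force on a small example. The sketch below is only an illustration: the depth-2 tree, the three-element function class and the constant candidate trees are hypothetical choices, and a tree is represented (by assumption) as a list of Python functions, the t-th of which maps the sign prefix of length t-1 to a value.

```python
from itertools import product


def node(tree, eps, t):
    """Label of the depth-t node (t = 1..n) on path eps; it depends only on eps[:t-1]."""
    return tree[t - 1](tuple(eps[: t - 1]))


def is_sequential_cover(V, F, z, theta):
    """Brute-force check of the cover condition over all 2^n sign paths."""
    n = len(z)
    for f in F:
        for eps in product((-1, 1), repeat=n):
            # The covering tree v may depend on both f and the path eps.
            covered = any(
                max(abs(f(node(z, eps, t)) - node(v, eps, t)) for t in range(1, n + 1)) <= theta
                for v in V
            )
            if not covered:
                return False
    return True


def constant_tree(c, n=2):
    """Real-valued tree whose nodes all carry the same label c."""
    return [lambda prefix, c=c: c for _ in range(n)]


# Hypothetical depth-2 tree with values in [0, 1].
z = [lambda prefix: 0.5,
     lambda prefix: 0.25 if prefix[0] == -1 else 0.75]

# Hypothetical function class: f_a(x) = a * x for three slopes a.
F = [lambda x, a=a: a * x for a in (0.0, 0.5, 1.0)]

V = [constant_tree(c) for c in (0.0, 0.25, 0.5)]
print(is_sequential_cover(V, F, z, theta=0.25))                     # True
print(is_sequential_cover([constant_tree(0.0)], F, z, theta=0.25))  # False
```

Because the covering element is chosen per path, a sequential cover can be much smaller than a cover that must work uniformly over all nodes of the tree; the brute-force loop above is exponential in the depth and is only meant for tiny examples.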
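The maximal $\theta$-covering number of a function class $\mathcal{F}$ over depth-$n$ trees is

\[ N_\infty(\mathcal{F}, \theta, n) = \sup_{z} N_\infty(\mathcal{F}, \theta, z). \tag{43} \]

To control the growth of covering numbers we use the following notion of complexity.

Definition 6. A $\mathcal{Z}$-valued tree $z$ of depth $n$ is $\theta$-shattered by a function class $\mathcal{F} \subseteq \{f : \mathcal{Z} \to \mathbb{R}\}$ if there exists an $\mathbb{R}$-valued tree $s$ of depth $n$ such that

\[ \forall \varepsilon \in \{\pm 1\}^{n}, \ \exists f \in \mathcal{F} \ \text{s.t.} \ \forall\, 1 \le t \le n, \tag{44} \]
\[ \varepsilon_t \big( f(z_t(\varepsilon)) - s_t(\varepsilon) \big) \ge \theta/2. \tag{45} \]

The (sequential) fat-shattering dimension $\mathrm{fat}_\theta(\mathcal{F})$ at scale $\theta$ is the largest $d$ such that $\mathcal{F}$ $\theta$-shatters a $\mathcal{Z}$-valued tree of depth $d$.

An important result of Rakhlin et al., 2014 is the following connection between the covering numbers and the fat-shattering dimension.

Lemma 3 (Corollary 1 of Rakhlin et al., 2014). Let $\mathcal{F} \subseteq \{f : \mathcal{Z} \to [-1, 1]\}$. For any $\theta > 0$ and any $n \ge 1$, we have that

\[ N_\infty(\mathcal{F}, \theta, n) \le \Big( \frac{2en}{\theta} \Big)^{\mathrm{fat}_\theta(\mathcal{F})}. \tag{46} \]

In the proofs we denote $L(\mathcal{H})$ as $\mathcal{F}$.

Proof of Theorem 2. After equations (6), (8) and (10), we are left to study the large deviations of the following quantity

\[ \Theta(J_n) = \sup_{f \in \mathcal{F}} \Big| \sum_{t} w_t(J_n) \big( f(z_t) - E_{t-1} f \big) \Big| \tag{47} \]

with the weights defined as in (11). Let us define events $A_r = \{J_n = r\}$ and $B_r(j) = \{ r \le \sum_t g(M_{t,j}) \le r + 1 \}$, such that $E_{k,m} = \{ \cup_{r \le k} A_r \} \cap \{ \cup_{r \ge m} B_r(J_n) \}$. Then we have

\[ P[\Theta(J_n) \ge \alpha] \le P\big[ \{\Theta(J_n) \ge \alpha\} \cap E_{k,m} \big] + P\big[ E^{c}_{k,m} \big]. \tag{48} \]

Now we can take a union bound for the first summand over the $A_r$'s and get

\[ P\big[ \{\Theta(J_n) \ge \alpha\} \cap E_{k,m} \big] \tag{49} \]
\[ \le \sum_{j=1}^{k} P\big[ \{\Theta(j) \ge \alpha\} \cap \{ \cup_{r \ge m} B_r(j) \} \big]. \tag{50} \]

Taking another union bound for each $j$, we end up with

\[ P\big[ \{\Theta(j) \ge \alpha\} \cap \{ \cup_{r \ge m} B_r(j) \} \big] \tag{51} \]
\[ \le \sum_{r \ge m} P\big[ \{\Theta(j) \ge \alpha\} \cap B_r(j) \big]. \tag{52} \]

Now we study the last probability for a fixed $r$ and $j$. On $B_r(j)$ we can lower bound the denominator of the weights, $\sum_t g(M_{t,j}) \ge r$, leading to

\[ \Theta(j) \le \Theta_r(j) = \frac{1}{r} \sup_{f \in \mathcal{F}} \Big| \sum_{t} g(M_{t,j}) \big( f(z_t) - E_{t-1} f \big) \Big|. \]

Let $\lambda > 0$ and denote $V = \frac{1}{r^2} \sum_t g^2(M_{t,j})$, $E = \frac{1}{r} \sum_t g(M_{t,j})$. Then, since $\frac{1}{r} g(M_{t,j}) \in \sigma_{t-1}$ by the definition of an M-bound, Lemma 4 gives us

\[ E\, e^{\lambda \Theta_r(j) - \lambda^2 V - 2\lambda\beta E - \ln 2N_\infty(\mathcal{F}, \beta, n)} \le 1. \tag{53} \]

Let $C = \{\Theta_r(j) \ge \alpha\} \cap B_r(j)$ and note that $E \le \frac{r+1}{r} \le 2$ and $V \le \frac{r+1}{r^2} \le \frac{2}{r}$ on $B_r(j)$ by the boundedness of $g$. Then we have the following chain of inequalities:

\[ 1 \ge E\, e^{\lambda \Theta_r(j) - \lambda^2 V - 2\lambda\beta E - \ln 2N_\infty(\mathcal{F}, \beta, n)} \tag{54} \]
\[ \ge E\, e^{\lambda \Theta_r(j) - \lambda^2 V - 2\lambda\beta E - \ln 2N_\infty(\mathcal{F}, \beta, n)}\, I_C \tag{55} \]
\[ \ge e^{\lambda\alpha - \lambda^2 \frac{2}{r} - 4\lambda\beta - \ln 2N_\infty(\mathcal{F}, \beta, n)}\, P[C]. \tag{56} \]

Hence, by optimizing over $\lambda$, we get

\[ P\big[ \{\Theta(j) \ge \alpha\} \cap B_r(j) \big] \le 2N_\infty(\mathcal{F}, \beta, n)\, e^{-\frac{1}{2} r (\alpha - 4\beta)^2}. \tag{57} \]

Now, coming back to (51), we can evaluate it by computing the sum to obtain

\[ P\big[ \{\Theta(J_n) \ge \alpha\} \cap E_{k,m} \big] \le \frac{2k N_\infty(\mathcal{F}, \beta, n)}{(\alpha - 4\beta)^2}\, e^{-\frac{1}{2} m (\alpha - 4\beta)^2}. \tag{58} \]

Lemma 4. Let $y_{1:n}$ be a process such that each $y_t \in \sigma_{t-1}$ and denote $E = \sum_{t=1}^{n} y_t$, $V = \sum_{t=1}^{n} y_t^2$. Then for fixed $\lambda, \beta > 0$ and $c = \ln 2N_\infty(\mathcal{F}, \beta, n)$,

\[ E\, e^{\lambda \sup_{f \in \mathcal{F}} | \sum_t y_t ( f(z_t) - E_{t-1} f ) | - \lambda^2 V - 2\lambda\beta E - c} \le 1. \tag{59} \]

Proof. Let $z'_{1:n}$ be a decoupled tangent sequence to $z_{1:n}$, i.e. a sequence that satisfies $E_{t-1} f(z_t) = E_{t-1} f(z'_t) = E[ f(z'_t) \mid z_{1:n} ]$. Then

\[ E\, e^{\lambda \sup_{f \in \mathcal{F}} | \sum_t y_t ( f(z_t) - E_{t-1} f ) | - \lambda^2 V - 2\lambda\beta E - c} \tag{60} \]
\[ \le E\, e^{\lambda \sup_{f \in \mathcal{F}} | \sum_t y_t ( f(z_t) - f(z'_t) ) | - \lambda^2 V - 2\lambda\beta E - c}. \tag{61} \]

Then Lemma 5 gives us that (61) equals

\[ E_{z, z' \sim \rho}\, E_\varepsilon\, e^{\lambda \sup_{f} | \sum_t \tilde{y}_t \varepsilon_t ( f(z_t(\varepsilon)) - f(z'_t(\varepsilon)) ) | - \lambda^2 \tilde{V} - 2\lambda\beta \tilde{E} - c} \tag{62} \]
\[ \le E_{z \sim \rho}\, E_\varepsilon\, e^{2\lambda \sup_{f} | \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) | - \lambda^2 \tilde{V} - 2\lambda\beta \tilde{E} - c}, \tag{63} \]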
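where $\tilde{y}$ is a symmetrized version of $y$, $\tilde{E} = \sum_t \tilde{y}_t$, $\tilde{V} = \sum_t \tilde{y}_t^2$, and we used Jensen's inequality to get the second line. Now we take a $\beta$-cover of $\mathcal{F}$ with respect to the $\ell_\infty$-norm to get the following bound on (63):

\[ E_{z \sim \rho}\, N_\infty(\mathcal{F}, \beta, n)\, E_\varepsilon\, e^{2\lambda | \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) | - \lambda^2 \tilde{V} - c} \tag{64} \]
\[ = \frac{1}{2}\, E_{z \sim \rho}\, E_\varepsilon\, e^{2\lambda | \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) | - \lambda^2 \tilde{V}}. \tag{65} \]

Introduce events $Y_+ = \{ \sum_t \tilde{y}_t \varepsilon_t f(z_t) \ge 0 \}$ and $Y_- = \{ \sum_t \tilde{y}_t \varepsilon_t f(z_t) < 0 \}$. Then the last line is equal to

\[ \frac{1}{2}\, E_{z \sim \rho}\, E_\varepsilon\, e^{2\lambda \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) - \lambda^2 \tilde{V}}\, I_{Y_+} \tag{66} \]
\[ + \frac{1}{2}\, E_{z \sim \rho}\, E_\varepsilon\, e^{-2\lambda \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) - \lambda^2 \tilde{V}}\, I_{Y_-} \tag{67} \]
\[ \le \frac{1}{2}\, E_{z \sim \rho}\, E_\varepsilon\, e^{2\lambda \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) - \lambda^2 \tilde{V}} \tag{68} \]
\[ + \frac{1}{2}\, E_{z \sim \rho}\, E_\varepsilon\, e^{-2\lambda \sum_t \tilde{y}_t \varepsilon_t f(z_t(\varepsilon)) - \lambda^2 \tilde{V}} \tag{69} \]
\[ \le 1, \tag{70} \]

where the last line follows by the standard martingale argument, since $\tilde{y}_t \varepsilon_t f(z_t(\varepsilon))$ is a martingale difference sequence (for a fixed tree $z$).

Lemma 5. Let $z_{1:n}$ be a sample from a process and $z'_{1:n}$ its decoupled sequence. Let $y_{1:n}$ be a process such that each $y_t \in \sigma_{t-1}$. Then for any measurable functions $\varphi : \mathbb{R} \to \mathbb{R}$ and $\psi : \mathcal{Z}^{n} \to \mathbb{R}$, we have

\[ E\, \varphi\Big( \sup_{f \in \mathcal{F}} \Big| \sum_t y_t \big( f(z_t) - f(z'_t) \big) \Big| - \psi(z_{1:n}) \Big) = E_\rho\, E_\varepsilon\, \varphi\Big( \sup_{f \in \mathcal{F}} \Big| \sum_t \tilde{y}_t \varepsilon_t \big( f(z_t) - f(z'_t) \big) \Big| - \tilde{\psi} \Big), \tag{71} \]

where $\tilde{\psi}$ is a symmetrized version of $\psi(z_{1:n})$.

Proof. The proof is a direct extension of Theorem 3 from Rakhlin et al., 2011, using the fact that $y_t \in \sigma_{t-1}$.

Proof of Corollary 1. The proof follows from Theorem 2 if we set $\beta = \alpha/8$ and use Lemma 3.

Proof of Lemma 1. The proof follows from the following bound

d t, = sup E t f E x f (72) f L(H) E sup E t f x f (73) f L(H)

And then the convergence of the discrepancies follows from the definition of the uniformly convergent martingale.

2 Exceptional set examples

Markov chains. First, we bound the probability of $A_k$:

\[ P[J_n > k] \le P[F_{z_n} > k] \le |S| \max_{s} P[F_s > k]. \tag{74} \]

On the event $B_{k,m}$, we have the following chain of inequalities:

\[ \sum_{t=1}^{n} I[d_{t,J_n} \le b] \ge \sum_{t=J_n}^{n} I[d_{t,J_n} \le b] \tag{75} \]
\[ \ge \sum_{t=J_n}^{n} I[d_{t,J_n} = 0] \tag{76} \]
\[ \ge \sum_{t=J_n}^{n} I[z_t = z_{J_n}], \tag{77} \]

which gives us

\[ P\Big[ J_n \le k \wedge \sum_{t=J_n}^{n} I[d_{t,J_n} \le b] < m \Big] \tag{78} \]
\[ \le P\Big[ J_n \le k \wedge \sum_{t=J_n}^{n} I[z_t = z_{J_n}] < m \Big] \tag{79} \]
\[ \le |S| \max_{s} P\Big[ J_n \le k \wedge \sum_{t=J_n}^{n} I[z_t = s] < m \,\Big|\, z_{J_n} = s \Big]. \tag{80} \]

Now, for a given state $s$, $\sum_t I[z_t = s]$ can be lower bounded by the number of times we hit the state $s$ again. Let $T_s^i$, $i \ge 1$, be independent copies of the recurrence times. Then $\sum_t I[z_t = s] \ge m$ for any $m \ge 0$ such that $\sum_{i=1}^{m} T_s^i \le n - k$. We also have the following sequence of inclusions:

\[ \Big\{ \forall i \le m : T_s^i \le \frac{n-k}{m} \wedge J_n \le k \wedge z_{J_n} = s \Big\} \tag{81} \]
\[ \subseteq \Big\{ \sum_{i=1}^{m} T_s^i \le n - k \wedge J_n \le k \wedge z_{J_n} = s \Big\} \tag{82} \]
\[ \subseteq \Big\{ \sum_{t=J_n}^{n} I[z_t = s] \ge m \wedge J_n \le k \wedge z_{J_n} = s \Big\}. \tag{83} \]

And this gives us

\[ P\Big[ J_n \le k \wedge \sum_{t=J_n}^{n} I[z_t = s] < m \,\Big|\, z_{J_n} = s \Big] \tag{84} \]
\[ \le P\Big[ \exists i \le m : T_s^i > \frac{n-k}{m} \Big] \tag{85} \]
\[ \le m\, P\Big[ T_s > \frac{n-k}{m} \Big]. \tag{86} \]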
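The Markov chain bound is expressed through the tail of the recurrence time of a state. As a minimal sketch, assuming a finite chain given by a hypothetical row-stochastic transition matrix `P` (a list of lists, not something defined in the paper), the tail probability that the chain started in state `s` does not return to `s` within `k` steps can be computed exactly by propagating probability mass over the other states:

```python
def return_time_tail(P, s, k):
    """P[T_s > k]: the chain started in s does not revisit s during steps 1..k."""
    if k <= 0:
        return 1.0
    others = [j for j in range(len(P)) if j != s]
    # mass[j]: probability of sitting in j at step 1 without having returned to s
    mass = {j: P[s][j] for j in others}
    for _ in range(k - 1):
        # one more step that still avoids s
        mass = {j: sum(mass[i] * P[i][j] for i in others) for j in others}
    return sum(mass.values())


# Hypothetical two-state chain: leave state 0 w.p. 0.3, stay in state 1 w.p. 0.6.
P = [[0.7, 0.3],
     [0.4, 0.6]]

for k in (1, 2, 5, 10):
    print(k, return_time_tail(P, 0, k))
```

For this two-state example the tail is geometric, $P[T_0 > k] = 0.3 \cdot 0.6^{k-1}$, so the recurrence-time term in the bound decays exponentially in its argument.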
Dynamical systems. The bound on $P[A_k]$ follows from the fact that $J_n \le F(C_n)$. For $B_{k,m}$, we get

\[ P[B_{k,m}] \le k \max_{j \le k} P\Big[ \sum_{t=j}^{n} I[d_{t,j} \le b] < m \,\Big|\, J_n = j \Big]. \tag{87} \]

And similarly to the Markov chain example,

\[ P\Big[ \sum_{t=j}^{n} I[d_{t,j} \le b] < m \,\Big|\, J_n = j \Big] \le m\, P\Big[ T(C_j) > \frac{n-j}{m} \Big]. \tag{88} \]

General stationary processes. The bound for this case is obtained analogously to the previous two examples, thus we omit the argument.

3 Counter-example for learnability

Theorem 3. Let $\mathcal{Z} = \{0, 1\}$, $\mathcal{H} = [0, 1]$ and $l(h, z) = (h - z)^2$. Also, let $\mathcal{C}$ be the class of all stationary ergodic processes taking values in $\mathcal{Z}$. Then for any learning algorithm that produces a sequence of hypotheses $h_n$, there is a process $P \in \mathcal{C}$ such that

\[ P\Big[ \limsup_{n \to \infty} \Big( R_n(h_n) - \inf_{h} R_n(h) \Big) > \frac{1}{16} \Big] \ge \frac{1}{8}. \tag{89} \]

Proof. Using the fact that the minimizer of $E_n (h - z_{n+1})^2$ is $E_n z_{n+1}$, we can rewrite for any $h_n \in \sigma_n$

\[ R_n(h_n) - \inf_{h} R_n(h) \tag{90} \]
\[ = E_n (h_n - z_{n+1})^2 - \inf_{h} E_n (h - z_{n+1})^2 \tag{91} \]
\[ = E_n (h_n - z_{n+1})^2 - E_n (E_n z_{n+1} - z_{n+1})^2 \tag{92} \]
\[ = (h_n - E_n z_{n+1})^2. \tag{93} \]

A minor modification of the proof of Theorem 1 of Györfi et al., 1998 gives that for every algorithm that produces a sequence $h_n$ of hypotheses, there is a stationary and ergodic process such that

\[ P\Big[ \limsup_{n \to \infty} (h_n - E_n z_{n+1})^2 > \frac{1}{16} \Big] \ge \frac{1}{8}, \tag{94} \]

which shows that no algorithm can be a limit learner for the class of all stationary and ergodic binary processes.

4 Connection to time series prediction

The goal of this section is to show the connection of our framework to existing theoretical approaches to time series prediction. In particular, we consider two frameworks that are close to conditional risk minimization. In both cases, we show that conditional risk minimization solves a harder problem, in the sense that its solutions can be used to solve these particular problems, but it requires more assumptions to be valid.

We start with the framework of time series prediction by statistical learning, considered for example in Alquier et al., 2013 and McDonald et al., 2012. Fixing some point in time $n$, we consider a hypothesis class $\mathcal{H} \subset \{h : \mathcal{Z}^{n} \to \mathcal{Z}\}$, where each hypothesis $h$ gives us a prediction of the next step by evaluating the whole history. For any loss function $l : \mathcal{Z} \times \mathcal{Z} \to [0, 1]$, we consider the following risk minimization problem:

\[ \min_{h \in \mathcal{H}} E\, l(h(z_{1:n}), z_{n+1}). \tag{95} \]

To set up the conditional risk minimization, we define a class of constant functions $\mathcal{H}' = \{h_z(z') = z, \ z \in \mathcal{Z}\}$. Then if the process belongs to a class learnable with $\mathcal{H}'$ and $l$, we can guarantee that there is an algorithm to choose a point $z_n$, such that with probability $1 - \delta$

\[ E[\, l(z_n, z_{n+1}) \mid z_{1:n} ] \le \inf_{z} E[\, l(z, z_{n+1}) \mid z_{1:n} ] + \varepsilon_n(\delta), \tag{96} \]

where $\varepsilon_n(\delta)$ is a sequence of errors guaranteed by the algorithm for a given confidence $\delta$ and $\varepsilon_n(\delta) \to 0$. Converting this to a bound on the expectation, we get

\[ E\, l(z_n, z_{n+1}) \le E \inf_{z} E[\, l(z, z_{n+1}) \mid z_{1:n} ] + \varepsilon_n(\delta) + \delta. \tag{97–98} \]

Notice that

\[ E \inf_{z} E[\, l(z, z_{n+1}) \mid z_{1:n} ] \tag{99} \]
\[ \le E \inf_{h \in \mathcal{H}} E[\, l(h(z_{1:n}), z_{n+1}) \mid z_{1:n} ] \tag{100} \]
\[ \le \inf_{h \in \mathcal{H}} E\, l(h(z_{1:n}), z_{n+1}). \tag{101} \]

Therefore, if the process is from a learnable class, there is an algorithm that always gives good predictions according to this framework as well.

The second setting, which was considered by Wintenberger, 2014, is very close to online sequence prediction. In order to reduce the notation and simplify the presentation, we assume that the learner has access to a (usually finite) hypothesis class $\mathcal{H}$ and at every step $t$ he should choose a distribution $\pi_t$ over $\mathcal{H}$ in a way that minimizes the regret:

\[ \sum_{t=1}^{n} E_{t-1}\, l(E_{\pi_t} h, z_t) - \min_{h \in \mathcal{H}} \sum_{t=1}^{n} E_{t-1}\, l(h, z_t). \tag{102} \]
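Again, if the process belongs to a learnable class with $\mathcal{H}$ and $l$, then there is an algorithm that produces a sequence $h_t$ satisfying, with probability $1 - \delta$,

\[ E_{t-1}\, l(h_t, z_t) \le \min_{h \in \mathcal{H}} E_{t-1}\, l(h, z_t) + \varepsilon_t(\delta/n) \tag{103} \]

for all $t$. Summing up over $t$, we get

\[ \sum_{t=1}^{n} E_{t-1}\, l(h_t, z_t) \tag{104} \]
\[ \le \sum_{t=1}^{n} \min_{h \in \mathcal{H}} E_{t-1}\, l(h, z_t) + \sum_{t=1}^{n} \varepsilon_t(\delta/n) \tag{105} \]
\[ \le \min_{h \in \mathcal{H}} \sum_{t=1}^{n} E_{t-1}\, l(h, z_t) + \sum_{t=1}^{n} \varepsilon_t(\delta/n), \tag{106} \]

which gives a bound of $\sum_{t=1}^{n} \varepsilon_t(\delta/n)$ on the regret with high probability. For nice sequences (like i.i.d.) $\varepsilon_t(\delta/n)$ is of order $O\big(\sqrt{\log n / t}\big)$, which gives a regret bound of order $O\big(\sqrt{n \log n}\big)$. On the downside, we can get guarantees only for a class of learnable processes, while the results of Wintenberger, 2014 hold for any stochastic process. The reason for this is that conditional risk minimization is an inherently more difficult problem, since it requires optimizing at every step and not in the cumulative sense.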
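The passage from per-step errors to a cumulative regret bound can be checked numerically. The sketch below assumes a hypothetical per-step rate $\varepsilon_t(\delta') = \sqrt{\log(1/\delta')/t}$ (the constants are an illustrative choice, not taken from the paper) and compares the summed errors with the closed form $2\sqrt{n \log(n/\delta)}$, which follows from $\sum_{t=1}^{n} t^{-1/2} \le 2\sqrt{n}$.

```python
import math


def eps_rate(t, delta):
    """Hypothetical per-step error guarantee of order sqrt(log(1/delta) / t)."""
    return math.sqrt(math.log(1.0 / delta) / t)


def summed_errors(n, delta):
    """Sum of eps_t(delta / n) over t = 1..n, i.e. the high-probability regret bound."""
    return sum(eps_rate(t, delta / n) for t in range(1, n + 1))


def closed_form(n, delta):
    """Upper bound 2 * sqrt(n * log(n / delta)) on the summed errors."""
    return 2.0 * math.sqrt(n * math.log(n / delta))


for n in (10, 100, 1000, 10000):
    print(n, round(summed_errors(n, 0.05), 2), round(closed_form(n, 0.05), 2))
```

Under this assumed rate the regret grows like $\sqrt{n \log n}$, so the average regret per step vanishes as $n$ grows.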
MATH 3 Midterm I(Sprig 05) Istructor: Xiaowei Wag Feb 3rd, :30pm-3:50pm, 05 Problem (0 poits). Test for covergece:.. 3.. p, p 0. (coverges for p < ad diverges for p by ratio test.). ( coverges, sice (log
More informationGeneralized Semi- Markov Processes (GSMP)
Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual
More informationJacobi symbols. p 1. Note: The Jacobi symbol does not necessarily distinguish between quadratic residues and nonresidues. That is, we could have ( a
Jacobi sybols efiitio Let be a odd positive iteger If 1, the Jacobi sybol : Z C is the costat fuctio 1 1 If > 1, it has a decopositio ( as ) a product of (ot ecessarily distict) pries p 1 p r The Jacobi
More informationSurveying the Variance Reduction Methods
Available olie at www.scizer.co Austria Joural of Matheatics ad Statistics, Vol 1, Issue 1, (2017): 10-15 ISSN 0000-0000 Surveyig the Variace Reductio Methods Arash Mirtorabi *1, Gholahossei Gholai 2 1.
More informationSeries III. Chapter Alternating Series
Chapter 9 Series III With the exceptio of the Null Sequece Test, all the tests for series covergece ad divergece that we have cosidered so far have dealt oly with series of oegative terms. Series with
More informationProbability and Random Processes
Probability ad Radom Processes Lecture 5 Probability ad radom variables The law of large umbers Mikael Skoglud, Probability ad radom processes 1/21 Why Measure Theoretic Probability? Stroger limit theorems
More information5.6 Absolute Convergence and The Ratio and Root Tests
5.6 Absolute Covergece ad The Ratio ad Root Tests Bria E. Veitch 5.6 Absolute Covergece ad The Ratio ad Root Tests Recall from our previous sectio that diverged but ( ) coverged. Both of these sequeces
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More informationREVIEW OF CALCULUS Herman J. Bierens Pennsylvania State University (January 28, 2004) x 2., or x 1. x j. ' ' n i'1 x i well.,y 2
REVIEW OF CALCULUS Hera J. Bieres Pesylvaia State Uiversity (Jauary 28, 2004) 1. Suatio Let x 1,x 2,...,x e a sequece of uers. The su of these uers is usually deoted y x 1 % x 2 %...% x ' j x j, or x 1
More informationBerry-Esseen bounds for self-normalized martingales
Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,
More informationLecture Notes for Analysis Class
Lecture Notes for Aalysis Class Topological Spaces A topology for a set X is a collectio T of subsets of X such that: (a) X ad the empty set are i T (b) Uios of elemets of T are i T (c) Fiite itersectios
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationDouble Derangement Permutations
Ope Joural of iscrete Matheatics, 206, 6, 99-04 Published Olie April 206 i SciRes http://wwwscirporg/joural/ojd http://dxdoiorg/04236/ojd2066200 ouble erageet Perutatios Pooya aeshad, Kayar Mirzavaziri
More informationNotes 19 : Martingale CLT
Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall
More informationAl Lehnen Madison Area Technical College 10/5/2014
The Correlatio of Two Rado Variables Page Preliiary: The Cauchy-Schwarz-Buyakovsky Iequality For ay two sequeces of real ubers { a } ad = { b } =, the followig iequality is always true. Furtherore, equality
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More information1.3 Convergence Theorems of Fourier Series. k k k k. N N k 1. With this in mind, we state (without proof) the convergence of Fourier series.
.3 Covergece Theorems of Fourier Series I this sectio, we preset the covergece of Fourier series. A ifiite sum is, by defiitio, a limit of partial sums, that is, a cos( kx) b si( kx) lim a cos( kx) b si(
More information2. F ; =(,1)F,1; +F,1;,1 is satised by thestirlig ubers of the rst kid ([1], p. 824). 3. F ; = F,1; + F,1;,1 is satised by the Stirlig ubers of the se
O First-Order Two-Diesioal Liear Hoogeeous Partial Dierece Equatios G. Neil Have y Ditri A. Gusev z Abstract Aalysis of algoriths occasioally requires solvig of rst-order two-diesioal liear hoogeeous partial
More information