Lecture 16: Achieving and Estimating the Fundamental Limit
EE378A Statistical Signal Processing, Lecture 16 (05/25/2017)
Lecturer: Jiantao Jiao. Scribe: William Clary.

In this lecture, we formally define the two distinct problems of achieving and estimating the fundamental limit, and show that under the logarithmic loss it is easier to estimate the fundamental limit than to achieve it.

1 The Bayes envelope

The Bayes envelope introduced in previous lectures can be viewed as the fundamental limit of prediction. Indeed, for a specified loss function $\Lambda(x,\hat x)$, the minimum average loss in predicting $X \sim P_X$ is given by the Bayes envelope:
$$U(P_X) \triangleq \min_{\hat x}\ \mathbb{E}_{X\sim P_X}\left[\Lambda(X,\hat x)\right]. \tag{1}$$

Throughout this lecture, we observe $X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} P_X$, where $X \in \mathcal X = \{1, 2, \ldots, S\}$; in other words, the alphabet size of $X$ is $|\mathcal X| = S$. We denote by $\mathcal M_S$ the space of probability measures on $\mathcal X$. In the sequel we take $\Lambda(x,\hat x)$ to be the logarithmic loss, that is,
$$\Lambda(x,\hat x) = \Lambda(x,\hat P) = \log\frac{1}{\hat P(x)} \tag{2}$$
for any $x \in \mathcal X$ and $\hat P \in \mathcal M_S$.

For non-negative sequences $a_\gamma, b_\gamma$, we write $a_\gamma \lesssim b_\gamma$ if there exists a universal constant $C$ such that $\sup_\gamma a_\gamma/b_\gamma \le C$, and $a_\gamma \gtrsim b_\gamma$ is equivalent to $b_\gamma \lesssim a_\gamma$. The notation $a_\gamma \asymp b_\gamma$ means that both $a_\gamma \lesssim b_\gamma$ and $b_\gamma \lesssim a_\gamma$. The notation $a_\gamma \gg b_\gamma$ means that $\liminf_\gamma a_\gamma/b_\gamma = \infty$, and $a_\gamma \ll b_\gamma$ is equivalent to $b_\gamma \gg a_\gamma$. We write $a \wedge b = \min\{a,b\}$ and $a \vee b = \max\{a,b\}$. Moreover, $\mathrm{poly}_K$ denotes the set of all polynomials of degree no more than $K$.

2 Achieving the fundamental limit

Given i.i.d. observations $X_1, X_2, \ldots, X_n \sim P_X$, we would like to construct a predictor $\hat P = \hat P(X_1, X_2, \ldots, X_n)$ to predict a fresh random variable $X \sim P_X$ that is independent of the training data $\{X_i\}_{i=1}^n$. The average risk of predicting $X$ with $\hat P$ under the logarithmic loss is
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\right], \tag{3}$$
where the expectation is over the randomness of $(X_1, X_2, \ldots, X_n, X) \sim P_X^{\otimes(n+1)}$.
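Under the logarithmic loss (2), the average loss of a fixed predictor $\hat P$ is the cross-entropy $\sum_x P_X(x)\log(1/\hat P(x)) = H(P_X) + D(P_X\|\hat P)$, so the minimum in (1) is attained at $\hat P = P_X$ and the Bayes envelope equals the Shannon entropy $H(P_X)$, the fundamental limit referred to throughout. The short sketch below checks this numerically in nats. The function names (`expected_log_loss`, `entropy`, `random_distribution`) are illustrative and not part of the lecture.

```python
import math
import random

def expected_log_loss(p, q):
    """Average log loss (2), in nats, of predicting X ~ p with the fixed
    distribution q, i.e. the cross-entropy sum_x p(x) log(1 / q(x))."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx <= 0:
                return math.inf  # q assigns zero mass to a symbol that occurs
            total -= px * math.log(qx)
    return total

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

def random_distribution(S, rng):
    """A random point of M_S (normalized uniform weights; illustrative only)."""
    w = [rng.random() for _ in range(S)]
    total = sum(w)
    return [x / total for x in w]

if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    # The minimum in (1) is attained at q = p, where the loss equals H(p) ...
    print(expected_log_loss(p, p), entropy(p))
    # ... and randomly drawn predictors never do better (Gibbs' inequality).
    rng = random.Random(0)
    print(all(expected_log_loss(p, random_distribution(4, rng)) >= entropy(p)
              for _ in range(1000)))
```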
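2.1 The inappropriate question of minimax risk

Since the distribution $P_X$ is unknown, we may take the minimax approach of decision theory and aim at solving for the minimax risk. In other words, we aim at solving
$$\inf_{\hat P}\ \sup_{P_X\in\mathcal M_S}\ \mathbb E\left[\log\frac{1}{\hat P(X)}\right]. \tag{4}$$
We now show that this question leads to a degenerate answer that may not be what we want.

Theorem 1. The minimax risk is given by
$$\inf_{\hat P}\ \sup_{P_X\in\mathcal M_S}\ \mathbb E\left[\log\frac{1}{\hat P(X)}\right] = \log S, \tag{5}$$
and the minimax-risk-achieving $\hat P$ can be taken to be $U = (1/S, 1/S, \ldots, 1/S)$, the uniform distribution on $\mathcal X$.

Proof. We first show that the minimax risk is at least $\log S$. Indeed, for any predictor $\hat P \in \mathcal M_S$, we have
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\,\middle|\,X_1, X_2, \ldots, X_n\right] = \sum_{x\in\mathcal X} P_X(x)\log\frac{1}{\hat P(x)} \tag{6}$$
$$\ge \sum_{x\in\mathcal X} P_X(x)\log\frac{1}{P_X(x)} \tag{7}$$
$$= H(P_X), \tag{8}$$
where we used the non-negativity of the KL divergence, and $H(P_X)$ is the Shannon entropy. Taking $P_X = U$, we have $H(P_X) = \log S$. Taking expectations on both sides with respect to $X_1, X_2, \ldots, X_n$, we obtain, under $P_X = U$,
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\right] \ge \log S \tag{9}$$
for any predictor $\hat P$, so the worst-case risk of every predictor is at least $\log S$. On the other hand, taking $\hat P \equiv U$, we have for every $P_X$
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\right] = \log S, \tag{10}$$
which proves that the minimax risk is at most $\log S$.

Theorem 1 shows that solving for the minimax risk in prediction may lead to inappropriate answers. Indeed, the minimax optimal solution turns out to be a degenerate answer that ignores all the training data. We show next that focusing on the minimax regret resolves this problem in a meaningful way.

2.2 Achieving the fundamental limit: minimax regret

As we argued in the proof of Theorem 1, for any predictor $\hat P = \hat P(X_1, X_2, \ldots, X_n)$, we have
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\right] \ge H(P_X). \tag{11}$$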
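This motivates us to define the minimax regret as follows:
$$\inf_{\hat P}\ \sup_{P_X\in\mathcal M_S}\left(\mathbb E\left[\log\frac{1}{\hat P(X)}\right] - H(P_X)\right). \tag{12}$$
We have the following algebraic manipulations for any predictor $\hat P$:
$$\mathbb E\left[\log\frac{1}{\hat P(X)}\right] - H(P_X) = \mathbb E\left[\sum_{x\in\mathcal X} P_X(x)\log\frac{1}{\hat P(x)} - \sum_{x\in\mathcal X} P_X(x)\log\frac{1}{P_X(x)}\right] \tag{13}$$
$$= \mathbb E\left[\sum_{x\in\mathcal X} P_X(x)\log\frac{P_X(x)}{\hat P(x)}\right] \tag{14}$$
$$= \mathbb E\left[D(P_X\,\|\,\hat P)\right], \tag{15}$$
where $D(P\|Q) = \sum_{x\in\mathcal X} P(x)\log\frac{P(x)}{Q(x)}$ is the KL divergence between $P$ and $Q$. In other words, solving for the minimax regret of predicting a fresh independent random variable $X$ from $n$ i.i.d. training samples $X_1, X_2, \ldots, X_n$ is equivalent to estimating the discrete distribution $P_X$ under the KL divergence loss. The minimax regret is characterized by the following theorem.

Theorem 2 [1, 2].
$$\inf_{\hat P}\ \sup_{P_X\in\mathcal M_S}\left(\mathbb E\left[\log\frac{1}{\hat P(X)}\right] - H(P_X)\right) = \begin{cases} (1+o(1))\,\dfrac{S-1}{2n}\log e & \text{if } n \gg S,\\[2mm] (1+o(1))\,\log\dfrac{S}{n} & \text{if } n \ll S.\end{cases} \tag{16}$$
Moreover, if $\limsup S/n = c \in (0,\infty)$, the minimax regret is bounded away from zero.

[1] Paninski, Liam. Variational minimax estimation of discrete distributions under KL loss. In NIPS.
[2] Braess, Dietrich, and Thomas Sauer. Bernstein polynomials and learning theory. Journal of Approximation Theory 128, no. 2 (2004).

The predictor $\hat P$ that achieves the performance above in the regime $n \gg S$ is
$$\hat P(x) = \frac{N(x) + \beta(N(x))}{n + \sum_{j=1}^S \beta(N(j))} \quad \text{for any } x \in \mathcal X, \tag{17}$$
where
$$N(x) = \sum_{i=1}^n \mathbb 1(X_i = x) \tag{18}$$
and $(X_1, X_2, \ldots, X_n)$ is the training data. Here
$$\beta(k) = \begin{cases} 1/2 & \text{if } k = 0,\\ 1 & \text{if } k = 1,\\ 3/4 & \text{otherwise.}\end{cases} \tag{19}$$
The predictor $\hat P$ that achieves the performance above in the regime $n \ll S$ is
$$\hat P(x) = \frac{N(x) + \frac{n}{S}\log\frac{S}{n}}{n + n\log\frac{S}{n}} \quad \text{for any } x \in \mathcal X. \tag{20}$$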
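Identity (15) makes the regret of any predictor computable by simulation once $P_X$ is fixed. The sketch below, a minimal illustration assuming natural logarithms and symbols labelled $1,\ldots,S$, implements predictor (17) to (19), predictor (20), and the data-ignoring uniform predictor of Theorem 1, and estimates $\mathbb E[D(P_X\|\hat P)]$ by Monte Carlo. The helper names, the test distribution, and the number of trials are assumptions made for illustration; the sketch evaluates a single $P_X$, not the supremum in (12).

```python
import math
import random
from collections import Counter

def counts(samples, S):
    """N(x) of (18) for x = 1, ..., S, as a list indexed 0..S-1."""
    c = Counter(samples)
    return [c[x] for x in range(1, S + 1)]

def beta(k):
    """Pseudo-count schedule (19)."""
    if k == 0:
        return 0.5
    if k == 1:
        return 1.0
    return 0.75

def add_beta_predictor(samples, S):
    """Predictor (17), intended for the regime n >> S."""
    N = counts(samples, S)
    denom = len(samples) + sum(beta(k) for k in N)
    return [(k + beta(k)) / denom for k in N]

def large_alphabet_predictor(samples, S):
    """Predictor (20), intended for n << S; requires S > n so that log(S/n) > 0."""
    n = len(samples)
    a = (n / S) * math.log(S / n)        # pseudo-count added to every symbol
    denom = n + n * math.log(S / n)      # equals n + S * a, so the output sums to 1
    return [(k + a) / denom for k in counts(samples, S)]

def uniform_predictor(samples, S):
    """Minimax-risk predictor of Theorem 1: ignores the training data."""
    return [1.0 / S] * S

def kl(p, q):
    """D(p || q) in nats."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def mc_regret(p, n, predictor, trials=200, seed=0):
    """Monte Carlo estimate of E[D(P_X || P_hat)], the regret (15) at P_X = p."""
    rng = random.Random(seed)
    S = len(p)
    symbols = range(1, S + 1)
    total = 0.0
    for _ in range(trials):
        samples = rng.choices(symbols, weights=p, k=n)
        total += kl(p, predictor(samples, S))
    return total / trials

if __name__ == "__main__":
    p = [0.3, 0.2] + [0.0625] * 8          # S = 10, sums to 1
    for n in (100, 1000):
        print(n,
              mc_regret(p, n, add_beta_predictor),   # shrinks roughly like (S-1)/(2n)
              mc_regret(p, n, uniform_predictor))    # stays at log S - H(p) for every n
```

For the uniform predictor the regret equals $D(P_X\|U) = \log S - H(P_X)$ and does not decrease with $n$, which is the degeneracy pointed out after Theorem 1; the add-$\beta$ predictor's regret shrinks roughly like $(S-1)/(2n)$ once every symbol is observed many times.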
Vanishing minimax regret implies that there exists a predictor $\hat P$ whose average prediction error $\mathbb E[\log(1/\hat P(X))]$ on the test set approaches the fundamental limit $H(P_X)$. Theorem 2 shows that achieving vanishing minimax regret requires $n \gg S$ samples. Intuitively, one needs to see every symbol at least once to construct a predictor whose performance approaches the fundamental limit.

The minimax regret definition reflects the traditional way of understanding the difficulty of machine learning tasks. In machine learning practice, we iteratively improve our training algorithm and use its prediction accuracy on the test set to measure its performance. The best performance achieved by existing schemes on the test set is usually understood as the limit of prediction for a specific dataset. In this context, Theorem 2 can be interpreted as follows: with $n \lesssim S$ samples, there is no prediction algorithm based on the $n$ training samples whose performance on the test set approaches the Bayes envelope in the worst case. As we show in the next section, there exist algorithms that can estimate the fundamental limit with $n \gg S/\ln S$ samples without explicitly constructing a prediction algorithm.

3 Estimating the fundamental limit

We define the problem of estimating the fundamental limit as solving the following minimax problem:
$$\inf_{\hat H}\ \sup_{P_X\in\mathcal M_S}\ \mathbb E\left|\hat H - H(P_X)\right|, \tag{21}$$
where the infimum is taken over all estimators $\hat H = \hat H(X_1, X_2, \ldots, X_n)$ that are functions of the training data. The material in this section is mainly taken from:
Jiao, Jiantao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61, no. 5 (2015).
Jiao, Jiantao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Maximum likelihood estimation of functionals of discrete distributions. arXiv preprint (2014).

3.1 The minimax rates

We have the following theorem.

Theorem 3 [3, 4, 5, 6]. Suppose $n \gtrsim S/\ln S$. Then
$$\inf_{\hat H}\ \sup_{P_X\in\mathcal M_S}\ \mathbb E\left|\hat H - H(P_X)\right| \asymp \frac{S}{n\ln n} + \frac{\ln S}{\sqrt n}. \tag{22}$$

[3] Valiant, Gregory, and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing. ACM.
[4] Valiant, Gregory, and Paul Valiant. The power of linear estimators. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on. IEEE.
[5] Wu, Yihong, and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62, no. 6 (2016).
[6] Jiao, Jiantao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61, no. 5 (2015).

Theorem 3 shows that $n \gg S/\ln S$ samples suffice to consistently estimate the fundamental limit $H(P_X)$. Strikingly, the number of samples required is sublinear in $S$: one can estimate the Shannon entropy uniformly over all $P_X \in \mathcal M_S$ even without having seen most of the symbols of the alphabet $\mathcal X$ in the empirical samples.
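To get a feel for what the $\ln n$ factor in (22) buys, one can drop the unspecified universal constants (an assumption: Theorems 3 and 4 hold only up to such constants) and ask how large $n$ must be before the first term falls below a tolerance $\varepsilon$, comparing $S/(n\ln n)$ with the $S/n$ term that appears for the empirical entropy in Theorem 4 below. The sketch does this by doubling search and bisection; `min_samples`, $S = 10^6$, and $\varepsilon = 0.1$ are illustrative choices.

```python
import math

def rate_optimal(S, n):
    """First term of (22) with constants dropped: S / (n ln n)."""
    return S / (n * math.log(n))

def rate_plugin(S, n):
    """The corresponding term without the ln n gain: S / n."""
    return S / n

def min_samples(rate, S, eps):
    """Smallest integer n >= 2 with rate(S, n) <= eps, for rate decreasing in n."""
    hi = 2
    while rate(S, hi) > eps:      # doubling search for a feasible n
        hi *= 2
    lo = max(2, hi // 2)
    while lo < hi:                # bisection; invariant: rate(S, hi) <= eps
        mid = (lo + hi) // 2
        if rate(S, mid) <= eps:
            hi = mid
        else:
            lo = mid + 1
    return lo

if __name__ == "__main__":
    S, eps = 10**6, 0.1
    n_plugin = min_samples(rate_plugin, S, eps)
    n_opt = min_samples(rate_optimal, S, eps)
    print(n_plugin, n_opt, n_opt < S < n_plugin)
```

With these constant-free proxies, $S/n \le 0.1$ requires $n = 10^7$, whereas $S/(n\ln n) \le 0.1$ already holds for $n$ of roughly $7.4\times 10^5$, fewer than the $S = 10^6$ symbols in the alphabet. This is the sublinear behavior described above, up to the constants the sketch ignores.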
3.2 Natural candidate: the empirical entropy

One of the most natural estimators of the Shannon entropy $H(P_X)$ given i.i.d. samples is the empirical entropy, defined as follows. Denote the empirical distribution by $\hat P_n = (\hat p_1, \hat p_2, \ldots, \hat p_S)$, where $\hat p_i = \frac{1}{n}\sum_{j=1}^n \mathbb 1(X_j = i)$ is the empirical frequency of symbol $i$ in the training set. The empirical entropy is $H(\hat P_n)$, which plugs the empirical distribution into the Shannon entropy functional. Intuitively, since the Shannon entropy is a continuous functional of finite-alphabet distributions and $\hat P_n$ converges to the true distribution $P_X$ as $n \to \infty$, the plug-in estimate $H(\hat P_n)$ should be a decent estimator of $H(P_X)$ when $S$ is fixed and $n \to \infty$. This is indeed true: it is only in high dimensions that the empirical entropy starts to behave poorly as an estimate of the Shannon entropy. The following theorem quantifies the performance of the empirical entropy in estimating $H(P_X)$.

Theorem 4 [7, 8]. Suppose $n \gtrsim S$. Then
$$\sup_{P_X\in\mathcal M_S}\ \mathbb E\left|H(\hat P_n) - H(P_X)\right| \asymp \frac{S}{n} + \frac{\ln S}{\sqrt n}. \tag{23}$$

[7] Jiao, Jiantao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Maximum likelihood estimation of functionals of discrete distributions. arXiv preprint (2014).
[8] Wu, Yihong, and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62, no. 6 (2016).

Comparing Theorems 4 and 3, the main difference is that the minimax rate-optimal entropy estimator improves the first term from $S/n$ to $S/(n\ln n)$, while the second term $\ln S/\sqrt n$ is unchanged. We now investigate where the two terms come from, and how one may construct minimax rate-optimal estimators starting from the empirical entropy.

3.3 Analysis of the empirical entropy

For any estimator $\hat H$, its performance in estimating $H(P_X)$ can be characterized via its bias, defined as $\mathbb E\hat H - H(P_X)$, and the concentration of $\hat H$ around its expectation $\mathbb E\hat H$. The concentration property may be partially characterized by the variance of the estimator, namely $\mathrm{Var}(\hat H) = \mathbb E(\hat H - \mathbb E\hat H)^2$. We now argue that in Theorem 4 the term $S/n$ comes from the bias and the term $\ln S/\sqrt n$ comes from the variance.

Introduce the concave function $f(x) = x\ln(1/x)$ on $[0,1]$, with $f(0) = 0$. Clearly,
$$H(\hat P_n) = \sum_{i=1}^S f(\hat p_i). \tag{24}$$
We have the following claim.

Claim 5. If $p_i \gtrsim \frac{1}{n}$, then
$$0 \le f(p_i) - \mathbb E f(\hat p_i) \lesssim \frac{1}{n}. \tag{25}$$
Moreover,
$$\mathrm{Var}\left(H(\hat P_n)\right) \le \frac{(\ln n)^2}{n} \wedge \frac{2(\ln S + 3)^2}{n}. \tag{26}$$

The results in Claim 5 are inspiring. They show that the variance of the empirical entropy can be bounded universally, regardless of the support size. Moreover, the biases contributed by the individual symbols add up linearly, contributing the $S/n$ term. In the regime of fixed $S$ and $n \to \infty$ the variance dominates, but in high dimensions the bias dominates. Hence, the key to improving the empirical entropy is to reduce the bias in high dimensions without incurring too much additional variance.
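3.4 How can we improve the empirical entropy?

It has been a long journey to find minimax rate-optimal estimators. Harris in 1975 proposed expanding $\mathbb E_P H(\hat P_n)$ using a Taylor expansion and obtained
$$\mathbb E_P H(\hat P_n) = H(P_X) - \frac{S-1}{2n} + \frac{1}{12n^2}\left(1 - \sum_{i=1}^S \frac{1}{p_i}\right) + O\!\left(\frac{1}{n^3}\right). \tag{27}$$
The Taylor expansion looks decent in the regime where the $p_i$'s are not too small. For very small $p_i$, however, the remainder term $\frac{1}{12n^2}\sum_{i=1}^S \frac{1}{p_i}$ may be much larger than the true entropy $H(P_X)$ itself. This intuition turns out to be correct: it suffices to do a first-order bias correction using the Taylor series in the regime of not-too-small $p_i$. In general, for $n\hat p \sim \mathsf B(n,p)$, we may write
$$\mathbb E[f(\hat p)] = f(p) + \frac{1}{2} f''(p)\,\frac{p(1-p)}{n} + O\!\left(\frac{1}{n^2}\right),$$
which motivates the bias correction
$$\hat f^{c} = f(\hat p) - \frac{1}{2} f''(\hat p)\,\frac{\hat p(1-\hat p)}{n}.$$
For the entropy, $f''(x) = -1/x$, so this correction adds $(1-\hat p)/(2n)$, which is at most $1/(2n)$. In the entropy estimation case, we follow the bias correction above and do the following [9].

Construction 6. If the true $p_i \gtrsim \frac{\ln n}{n}$, we use $f(\hat p_i) + \frac{1}{2n}$ instead of $f(\hat p_i)$ to estimate $f(p_i)$.

[9] Note that this bias-correction intuition does not easily generalize to higher-order corrections. For a systematic approach to higher-order bias correction with Taylor series, we refer the reader to Han, Yanjun, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of divergences between discrete distributions. arXiv preprint (2016).

Now the focus is on the small-$p_i$ regime. We need to understand precisely which term contributes to the bias. Assume for now that all $p_i \lesssim \frac{\ln n}{n}$. We have the following manipulations:
$$\mathbb E H(\hat P_n) - H(P_X) = \sum_{i=1}^S \left(\mathbb E f(\hat p_i) - f(p_i)\right) \tag{28}$$
$$= \sum_{i=1}^S \mathbb E\left(f(\hat p_i) - P_K(\hat p_i)\right) - \sum_{i=1}^S \left(f(p_i) - P_K(p_i)\right) + \sum_{i=1}^S \mathbb E\left(P_K(\hat p_i) - P_K(p_i)\right), \tag{29}$$
where $P_K(\cdot)$ is an arbitrary polynomial of degree no more than $K$. The following two observations are crucial for improving the empirical entropy.

Claim 7. If $p_i \lesssim \frac{\ln n}{n}$, then $\hat p_i \lesssim \frac{\ln n}{n}$ with probability at least $1 - \frac{1}{n^4}$.

Claim 8. Suppose $K \asymp \ln n$. Then for any constant $c > 0$,
$$\inf_{P_K \in \mathrm{poly}_K}\ \sup_{x\in[0,\, c\ln n/n]} \left|f(x) - P_K(x)\right| \asymp \frac{1}{n\ln n}. \tag{30}$$

Utilizing these two claims, and conditioning on the event that all $\hat p_i \le \frac{c\ln n}{n}$ and $p_i \le \frac{c\ln n}{n}$, we immediately obtain that, for the best approximating $P_K$ in (30),
$$\left|\sum_{i=1}^S \mathbb E\left(f(\hat p_i) - P_K(\hat p_i)\right)\right| \lesssim \frac{S}{n\ln n}, \tag{31}$$
$$\left|\sum_{i=1}^S \left(f(p_i) - P_K(p_i)\right)\right| \lesssim \frac{S}{n\ln n}, \tag{32}$$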
which implies that, in the worst case,
$$\left|\mathbb E\left[\sum_{i=1}^S \left(P_K(\hat p_i) - P_K(p_i)\right)\right]\right| \asymp \frac{S}{n}, \tag{33}$$
since we already know from Claim 5 that the bias $H(P_X) - \mathbb E H(\hat P_n)$ of the empirical entropy is of order $S/n$. Thus, we have identified the reason for the poor bias of the empirical entropy: the plug-in approach to estimating the polynomial $P_K$ incurs too much bias. Realizing this turns out to be the crucial step toward the minimax rate-optimal estimator: under the multinomial model there exist unbiased estimators for any polynomial $P_K$ of degree no more than $n$. Indeed, when $X \sim \mathsf B(n,p)$, for any integer $r \in \{1, \ldots, n\}$,
$$\mathbb E\left[\frac{X(X-1)\cdots(X-r+1)}{n(n-1)\cdots(n-r+1)}\right] = p^r.$$
We complete the construction of the minimax rate-optimal estimator as follows.

Construction 9. If the true $p_i \lesssim \frac{\ln n}{n}$, we use the unbiased estimator of the polynomial $P_K(p_i)$ to estimate $f(p_i)$. Here $P_K(\cdot)$ is the best approximation polynomial of $f$ over the interval $\left[0, \frac{c\ln n}{n}\right]$ introduced in Claim 8.

As for the last step, we use the Chernoff bound to show the following results on confidence intervals in the binomial model.

Claim 10. There exist positive real numbers $c_1, c_2, c_3, c_4$ such that:
- if $\hat p_i \in \left[0, c_1\frac{\log n}{n}\right]$, then $p_i \in \left[0, c_2\frac{\log n}{n}\right]$ with probability at least $1 - \frac{1}{n^4}$;
- if $\hat p_i \in \left[c_3\frac{\log n}{n}, 1\right]$, then $p_i \in \left[c_4\frac{\log n}{n}, 1\right]$ with probability at least $1 - \frac{1}{n^4}$.

There are other details needed to make the whole proof work: for example, one needs to argue that this approach does not increase the variance by too much, and also to prove minimax lower bounds. In practice one may also remove the constant term of $P_K(\cdot)$ so that symbols that never appeared in the training data are assigned zero. Thus, we have constructed a minimax rate-optimal estimator that does not require knowledge of the support size, yet behaves nearly as well as the exact minimax estimator that knows the support size.
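Constructions 6 and 9, together with the threshold rule suggested by Claim 10, can be combined into the following sketch of a two-regime entropy estimator. It is an illustration under explicit assumptions, not the exact estimator of the cited papers: the constants `c1` and `c2` are hypothetical and untuned; the polynomial is the Chebyshev interpolant of $f$ on $[0, c_2\ln n/n]$, swapped in as a near-best substitute for the best uniform approximation of Claim 8; the degree is $K = \lceil \ln n\rceil$; the regime of each symbol is chosen from $\hat p_i$ on the same sample used for estimation; and, following the practical remark above, the constant term of $P_K$ is dropped so that unseen symbols contribute zero and $S$ is never needed. The function `unbiased_power` implements the falling-factorial identity displayed above.

```python
import math
from collections import Counter

def f(x):
    """f(x) = x ln(1/x) on [0, 1], with f(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def chebyshev_fit(g, K):
    """Monomial coefficients b[0..K], in t on [0, 1], of the degree-K polynomial that
    interpolates g at the K + 1 Chebyshev nodes. Used here as a near-best stand-in
    for the best uniform approximation of Claim 8 (not the Remez polynomial)."""
    m = K + 1
    thetas = [math.pi * (j + 0.5) / m for j in range(m)]
    values = [g((math.cos(th) + 1) / 2) for th in thetas]   # node t = (u + 1) / 2
    # T_k(u) with u = 2t - 1, expanded in powers of t: T_{k+1} = 2u T_k - T_{k-1}
    T = [[1.0], [-1.0, 2.0]]
    while len(T) < m:
        cur, prev = T[-1], T[-2]
        nxt = [0.0] * (len(cur) + 1)
        for i, c in enumerate(cur):      # (4t - 2) * T_k
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):     # - T_{k-1}
            nxt[i] -= c
        T.append(nxt)
    coeffs = [0.0] * m
    for k in range(m):
        ck = (2.0 / m) * sum(v * math.cos(k * th) for v, th in zip(values, thetas))
        if k == 0:
            ck /= 2
        for i, c in enumerate(T[k]):
            coeffs[i] += ck * c
    return coeffs

def unbiased_power(N, n, r, delta):
    """Unbiased estimate of (p / delta)^r from N ~ Binomial(n, p), r <= n:
    N(N-1)...(N-r+1) / (n(n-1)...(n-r+1) * delta^r)."""
    est = 1.0
    for j in range(r):
        est *= (N - j) / ((n - j) * delta)
    return est

def two_regime_entropy(samples, c1=1.0, c2=4.0):
    """Sketch of Constructions 6 and 9 with hypothetical constants c1 < c2 (nats)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    log_n = math.log(n)
    K = math.ceil(log_n)                               # degree K of order ln n
    delta = c2 * log_n / n                             # approximation interval [0, delta]
    b = chebyshev_fit(lambda t: f(delta * t), K)       # P_K(p) = sum_r b[r] (p/delta)^r
    total = 0.0
    for N in Counter(samples).values():                # unseen symbols contribute 0
        if N > c1 * log_n:                             # p_hat > c1 ln n / n
            total += f(N / n) + 1.0 / (2 * n)          # Construction 6
        else:                                          # Construction 9, b[0] dropped
            total += sum(b[r] * unbiased_power(N, n, r, delta) for r in range(1, K + 1))
    return total

if __name__ == "__main__":
    import random
    rng = random.Random(1)
    S, n = 2000, 1000
    samples = [rng.randrange(S) for _ in range(n)]
    plug_in = sum(f(c / n) for c in Counter(samples).values())
    print("ln S:", math.log(S), "plug-in:", plug_in, "sketch:", two_regime_entropy(samples))
```

Because the constants are untuned, the printed numbers illustrate the mechanics of the estimator rather than the rates of Theorem 3; the approximation error in Claim 8 is only controlled up to a constant, so how much the sketch improves on the plug-in value depends on `c1`, `c2`, and $K$.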
Proceedigs of Machie Learig Research vol 75:1 33, 2018 31st Aual Coferece o Learig Theory Local momet matchig: A uified methodology for symmetric fuctioal estimatio ad distributio estimatio uder Wasserstei
More informationIt is often useful to approximate complicated functions using simpler ones. We consider the task of approximating a function by a polynomial.
Taylor Polyomials ad Taylor Series It is ofte useful to approximate complicated fuctios usig simpler oes We cosider the task of approximatig a fuctio by a polyomial If f is at least -times differetiable
More informationMaximum Likelihood Estimation of Functionals of Discrete Distributions
Maximum Likelihood Estimatio of Fuctioals of Discrete Distributios Jiatao Jiao, Studet Member, IEEE, Kartik Vekat, Studet Member, IEEE, Yaju Ha, Studet Member, IEEE, ad Tsachy Weissma, Fellow, IEEE arxiv:406.6959v7
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22
CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first
More informationLecture Chapter 6: Convergence of Random Sequences
ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationCS284A: Representations and Algorithms in Molecular Biology
CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by
More informationMath 61CM - Solutions to homework 3
Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig
More informationSolutions: Homework 3
Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe
More informationMath 113 Exam 3 Practice
Math Exam Practice Exam 4 will cover.-., 0. ad 0.. Note that eve though. was tested i exam, questios from that sectios may also be o this exam. For practice problems o., refer to the last review. This
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationINFINITE SEQUENCES AND SERIES
INFINITE SEQUENCES AND SERIES INFINITE SEQUENCES AND SERIES I geeral, it is difficult to fid the exact sum of a series. We were able to accomplish this for geometric series ad the series /[(+)]. This is
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationSince X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain
Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the
More informationDS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationLecture Notes 15 Hypothesis Testing (Chapter 10)
1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More information