Lecture 17: Minimax estimation of high-dimensional functionals
EE378A Statistical Signal Processing, Lecture 17 - 05/29/2017

Lecture 17: Minimax estimation of high-dimensional functionals

Lecturer: Jiantao Jiao. Scribe: Jonathan Lacotte.

1 Estimating the fundamental limit is easier than achieving it: other loss functions

We emphasize that it is a general phenomenon that estimating the fundamental limit is easier than achieving it. Recall the definitions of the two problems:

Achieving the fundamental limit: inf_{\hat{X}(X_1, ..., X_n)} sup_{P_X \in \mathcal{P}} ( E_P[\Lambda(X, \hat{X})] - U(P_X) )

Estimating the fundamental limit: inf_{\hat{U}(X_1, ..., X_n)} sup_{P_X \in \mathcal{P}} E_P |\hat{U} - U(P_X)|

Here we observe n i.i.d. samples X_1, X_2, ..., X_n \in \mathcal{X} with distribution P_X \in \mathcal{P}, where \mathcal{P} is a collection of probability distributions on \mathcal{X}. The quantity U(P_X) = min_{\hat{x}} E_P[\Lambda(X, \hat{x})] is the Bayes envelope. In the last lecture, we showed that in the case of the log loss \Lambda(X, \hat{x}) = \Lambda(X, \hat{P}) = log(1/\hat{P}(X)) and \mathcal{P} = \mathcal{M}(\mathcal{X}), the set of all distributions on \mathcal{X}, it takes n \gg |\mathcal{X}| samples to achieve the fundamental limit, but only n \gg |\mathcal{X}| / \ln |\mathcal{X}| samples to estimate it.

In this lecture we show that a similar phenomenon happens for another widely used loss function: the Hamming loss. Suppose X = (Y, Z), where Y \in \mathcal{Y} with |\mathcal{Y}| = k, and Z \in {0, 1}. One may interpret Y as the feature and Z as the label, casting this as a binary classification problem. Consider the Hamming loss \Lambda(x, t) = 1{t(Y) \neq Z}. The Bayes envelope in this case is given by

U(P_X) = E[min(\eta(Y), 1 - \eta(Y))],

where \eta(y) = P[Z = 1 | Y = y].

Theorem 1. Consider \mathcal{P} = {P_X : P_Z(0) = 1/2}. Then it requires n \gg k samples to achieve U(P_X), and only n \gg k / \log k samples to estimate U(P_X).

2 Boosting the Chow-Liu algorithm with improved mutual information estimates

Graphical models provide us with efficient computational tools to conduct inference on high-dimensional data with potential structure; cf. [1] and the references therein. Learning the structure and parameters of graphical models from empirical data is therefore the starting point for all these applications. It is known that exact learning of a general graphical model is NP-hard [2], but there exist tractable sub-classes, among which tree graphical models are the most famous.
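To make the Hamming-loss quantities concrete, here is a minimal sketch (our own, with hypothetical function names, not code from the lecture) that computes the Bayes envelope U(P_X) = E[min(\eta(Y), 1 - \eta(Y))] from a joint pmf of (Y, Z), together with the plug-in estimate obtained from samples:

```python
import numpy as np

def bayes_envelope_hamming(p_yz):
    """Bayes envelope U(P_X) = E[min(eta(Y), 1 - eta(Y))] under Hamming loss.

    p_yz: (k, 2) array, joint pmf of (Y, Z) with Y in {0..k-1}, Z in {0, 1}.
    """
    p_y = p_yz.sum(axis=1)                       # marginal of Y
    # eta(y) = P[Z = 1 | Y = y]; skip zero-probability symbols
    eta = np.divide(p_yz[:, 1], p_y, out=np.zeros_like(p_y), where=p_y > 0)
    return float(np.sum(p_y * np.minimum(eta, 1.0 - eta)))

def plug_in_envelope(ys, zs, k):
    """Plug-in estimate: the Bayes envelope of the empirical joint pmf."""
    counts = np.zeros((k, 2))
    for y, z in zip(ys, zs):
        counts[y, z] += 1
    return bayes_envelope_hamming(counts / counts.sum())
```

For instance, if Z is independent of Y with P(Z = 1) = 1/2, then eta(y) = 1/2 for every y and the envelope equals 1/2 (no feature helps); if Z is a deterministic function of Y, the envelope is 0.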
The seminal work of Chow and Liu [3] contributed an efficient algorithm to compute the Maximum Likelihood Estimator (MLE) of a tree-structured graphical model from empirical data, and constitutes one of the very few cases where the exact MLE can be computed efficiently. There are various approaches towards learning more complex structures, for which we refer the reader to [4] for a review.
Concretely, the Chow-Liu algorithm (CL) addresses the following question. Given n i.i.d. samples of a random vector X = (X_1, X_2, ..., X_d), where X_i \in \mathcal{X}, |\mathcal{X}| < \infty, we want to estimate the joint distribution of X. Chow and Liu [3] assumed that P_X can be factorized as

P_X = \prod_{i=1}^{d} P_{X_{m_i} | X_{m_{j(i)}}},   0 \le j(i) < i,    (1)

where (m_1, m_2, ..., m_d) is an unknown permutation of the integers 1, 2, ..., d, and P_{X_i | X_0} is by definition equal to P_{X_i}. Then, CL outputs the distribution P_X of this form that maximizes the likelihood of the observed data. Interestingly, this optimization problem can be solved efficiently after being transformed into a Maximum Weight Spanning Tree (MWST) problem, which can be solved using the Prim or Kruskal algorithm. In particular, they showed that the MLE of the tree structure boils down to the following expression:

E_{ML} = argmax_{E_Q : Q is a tree} \sum_{e \in E_Q} I(\hat{P}_e),    (2)

where I(\hat{P}_e) is the mutual information associated with the empirical distribution of the two nodes connected by edge e, and E_Q is the set of edges of a tree distribution Q (i.e., Q factors as a tree). In words, it suffices to first compute the empirical mutual information between any two nodes (in total (d choose 2) pairs); the maximum weight spanning tree is then the tree structure that maximizes the likelihood. To obtain estimates of the distributions on each edge, Chow and Liu [3] simply assigned the empirical distributions while picking an arbitrary node as the tree root.

We begin by asking the following natural question:

Question 2. Is the Chow-Liu algorithm optimal for learning tree graphical models?

Since the Chow-Liu algorithm exactly solves the MLE, and has been widely used in many applications, its optimality seems to be tacitly assumed in much of the literature. However, a closer inspection of the statistical theory [5, 6] reveals that the Chow-Liu algorithm is only known to perform essentially optimally when the number of samples grows to infinity while the alphabet size of the tree stays fixed.
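The pipeline in (2), namely empirical mutual information for all (d choose 2) pairs followed by a maximum-weight spanning tree via Prim's algorithm, can be sketched as follows (a minimal sketch with our own function names, not the original implementation):

```python
import numpy as np
from itertools import combinations

def empirical_mi(x1, x2, alphabet_size):
    """Mutual information (in nats) of the empirical joint pmf of two columns."""
    joint = np.zeros((alphabet_size, alphabet_size))
    for a, b in zip(x1, x2):
        joint[a, b] += 1
    joint /= joint.sum()
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0                                   # 0 log 0 = 0 convention
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(p1, p2)[nz])))

def chow_liu_edges(data, alphabet_size):
    """Chow-Liu structure: maximum-weight spanning tree over empirical MI weights.

    data: (n, d) integer array of n i.i.d. samples of X = (X_1, ..., X_d).
    Returns a set of d - 1 undirected edges (i, j) with i < j.
    """
    n, d = data.shape
    w = {(i, j): empirical_mi(data[:, i], data[:, j], alphabet_size)
         for i, j in combinations(range(d), 2)}
    # Prim's algorithm: repeatedly add the heaviest edge crossing the cut
    in_tree, edges = {0}, set()
    while len(in_tree) < d:
        best = max(((i, j) for (i, j) in w
                    if (i in in_tree) ^ (j in in_tree)), key=w.get)
        edges.add(best)
        in_tree.update(best)
    return edges
```

Swapping `empirical_mi` for a minimax rate-optimal mutual information estimator is exactly the modification discussed in the remainder of this section.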
Indeed, the modern theory of the maximum likelihood estimation paradigm [7] only justifies the asymptotic efficiency of the MLE, without general non-asymptotic guarantees when we have finitely many samples. In contrast, various modern data-analytic applications deal with datasets that do not have the luxury of many observations relative to the alphabet size.

To explain the insight underlying our improved algorithm, we revisit equation (2) and note that if we were to replace the empirical mutual information with the true mutual information, the output of the MWST would be the true edges of the tree. In this light, the CL algorithm can be viewed as a plug-in estimator that replaces the true mutual information with an estimate of it, namely the empirical mutual information. Naturally, then, it is to be expected that a better estimate of the mutual information leads to a smaller probability of error in identifying the tree. But how bad can the empirical mutual information be as an estimate of the true mutual information? The following theorem from [8] implies that it can be highly sub-optimal in high-dimensional regimes.

Theorem 3. Suppose we have two random variables X_1, X_2 \in \mathcal{X}, |\mathcal{X}| < \infty. The minimax sample complexity of estimating the mutual information I(X_1; X_2) under mean squared error is \Theta(|\mathcal{X}|^2 / \ln |\mathcal{X}|), while the sample complexity required for the empirical mutual information to be consistent is \Theta(|\mathcal{X}|^2).

In words, Theorem 3 implies that for the minimax rate-optimal estimator it suffices to take n \gg |\mathcal{X}|^2 / \ln |\mathcal{X}| samples to consistently estimate the mutual information I(X_1; X_2) for any underlying distribution, while unless n \gg |\mathcal{X}|^2, there exist distributions for which the error of the empirical mutual information is bounded away from zero. Furthermore, when |\mathcal{X}| is large, the approach of estimating the three entropy terms in

I(X_1; X_2) = H(X_1) + H(X_2) - H(X_1, X_2)    (3)
separately using the minimax rate-optimal entropy estimators is minimax rate-optimal for estimating the mutual information.

Now we apply the improved mutual information estimates to improve the Chow-Liu algorithm. In the following experiment, we fix d = 7 and |\mathcal{X}| = 300, construct a star tree (i.e., all random variables are conditionally independent given X_1), and generate a random joint distribution by assigning independent Beta(1/2, 1/2)-distributed random variables to each entry of the marginal distribution P_{X_1} and of the transition probabilities P_{X_k | X_1}, 2 \le k \le d (with normalization). Then, we increase the sample size n from 10^3 to beyond 4.8 x 10^4, and for each n we conduct 20 Monte Carlo simulations. Note that the true tree has d - 1 = 6 edges, and any estimated set of edges has at least one edge in common with these 6 edges because the true tree is a star graph. We define the wrong-edges-ratio in this case as the number of estimated edges that differ from the true set of edges, divided by d - 2 = 5. Thus, if the wrong-edges-ratio equals one, the estimated tree is maximally different from the true tree; in the other extreme, a ratio of zero corresponds to perfect reconstruction. We compute the expected wrong-edges-ratio over the 20 Monte Carlo simulations for each n, and the results are exhibited in Figure 1.

[Figure 1: The expected wrong-edges-ratio of the modified algorithm and the original CL algorithm, plotted against the number of samples (x-axis scale x 10^4).]

Figure 1 reveals intriguing phase transitions for both the modified and the original CL algorithm. When we have fewer than 6 x 10^3 samples, both algorithms yield a wrong-edges-ratio of 1, but soon after the sample size exceeds 6 x 10^3, the modified CL algorithm begins to reconstruct the network perfectly, while the original CL algorithm continues to fail maximally until the sample size exceeds 4.8 x 10^4, 8 times the sample size required by the new algorithm, which we temporarily call the Modified Chow-Liu algorithm.

Open Question 4. The exact sample complexity for learning tree graphical models is open.
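The data-generating process of the star-tree experiment above can be sketched as follows; this is a minimal sketch with our own function names, not the code used to produce Figure 1:

```python
import numpy as np

def random_star_tree(d, m, rng):
    """Random star-tree model: X_1 ~ p0, and X_k | X_1 ~ T[k-2][:, X_1], k >= 2.

    Entries are i.i.d. Beta(1/2, 1/2), then normalized, as in the experiment.
    Returns the root marginal p0 (shape (m,)) and transitions T (d-1, m, m).
    """
    p0 = rng.beta(0.5, 0.5, size=m)
    p0 /= p0.sum()
    T = rng.beta(0.5, 0.5, size=(d - 1, m, m))
    T /= T.sum(axis=1, keepdims=True)       # each column T[k, :, x1] sums to 1
    return p0, T

def sample_star(p0, T, n, rng):
    """Draw n i.i.d. samples of (X_1, ..., X_d) from the star model."""
    d_minus_1, m, _ = T.shape
    x1 = rng.choice(m, size=n, p=p0)
    cols = [x1]
    for k in range(d_minus_1):
        # sample X_{k+2} from the column of T[k] selected by X_1
        cols.append(np.array([rng.choice(m, p=T[k, :, v]) for v in x1]))
    return np.stack(cols, axis=1)

def wrong_edges_ratio(estimated, true_edges, d):
    """True edges missed by the estimate, normalized by d - 2 (star convention)."""
    return len(true_edges - set(estimated)) / (d - 2)
```

Running any structure learner on `sample_star(...)` output and scoring it with `wrong_edges_ratio` reproduces the kind of error curve shown in Figure 1.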
Although it is clear that using improved mutual information estimates improves the tree learning process, it is not clear that this is the optimal way to learn trees. One may start by asking the following easier question: for any jointly distributed random variables (X_1, X_2, X_3), what is the sample complexity of testing I(X_1; X_2) - I(X_1; X_3) \ge \epsilon_2 against I(X_1; X_2) - I(X_1; X_3) \le \epsilon_1? Here 0 < \epsilon_1 < \epsilon_2 are fixed constants.

3 Support size estimation

Given a discrete probability distribution P, we want to estimate its support size

S(P) = \sum_i 1{p_i > 0}

under the assumption that min_{i : p_i > 0} p_i \ge 1/k. Note that this assumption immediately implies that the support size of P can be at most k. The question is: how can we design the minimax rate-optimal estimator of the true support size? The material in this section comes from [9]. We demonstrate in this section that, using the approximation-based methodology of the last lecture, one can intuitively obtain the sample complexity in a systematic fashion.

It was shown in the last lecture that we can distinguish the cases p_i \lesssim (\log n)/n and p_i \gtrsim (\log n)/n with overwhelming probability. This motivates us to separate the problem into two regimes:

1. The case 1/k \gtrsim (\log n)/n: in this case, every nonzero p_i is at least of order (\log n)/n, so the functional we want to estimate is effectively a constant (1{p_i > 0} = 1 for each such symbol), and it suffices to use the plug-in approach. The resulting bias for each p_i is

E[1{\hat{p}_i = 0} 1{p_i > 0}] = (1 - p_i)^n \le e^{-n p_i} \le e^{-n/k}.    (4)

2. The case 1/k \lesssim (\log n)/n: in this case, there might be some p_i that fall in the interval [1/k, (\log n)/n], while others fall in the interval [(\log n)/n, 1]. Those p_i that fall into [(\log n)/n, 1] are handled with the MLE (plug-in), and it suffices to look at the p_i's that lie in [1/k, (\log n)/n].

Claim 5. Suppose q \in (0, 1). Then,

inf_{P_K \in poly_K, P_K(0) = 0} sup_{x \in [q, 1]} |P_K(x) - 1| \le ( (1 - \sqrt{q}) / (1 + \sqrt{q}) )^K \le e^{-K \sqrt{q}}.    (5)

Applying Claim 5 to the interval [1/k, (\log n)/n] for p_i, rescaled to [q, 1] with q = n/(k \log n) and K = c_1 \log n, we obtain that the bias is of order

e^{-K \sqrt{q}} = e^{-c_1 \log n \sqrt{n/(k \log n)}} = e^{-c_1 \sqrt{n \log n / k}},    (6)

which vanishes as long as n \gg k / \log k. Hence, we have (intuitively) shown that the bias of the minimax optimal approach should be of order

k e^{-\Theta(\sqrt{n \log n / k})}.    (7)
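The bound in Claim 5 can be checked numerically with the standard Chebyshev construction P_K(x) = 1 - T_K(u(x)) / T_K(u(0)), where u maps [q, 1] affinely onto [-1, 1]; this polynomial vanishes at 0 and achieves the bound in (5) up to a constant factor. A small sketch (our own; `approx_error` is a hypothetical name):

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def approx_error(q, K, grid=2000):
    """sup_{x in [q,1]} |P_K(x) - 1| for P_K(x) = 1 - T_K(u(x)) / T_K(u(0)).

    u(x) = (2x - 1 - q) / (1 - q) maps [q, 1] onto [-1, 1], so P_K(0) = 0 and
    the error on [q, 1] is |T_K(u(x))| / |T_K(u(0))| <= 1 / |T_K(u(0))|.
    """
    cK = [0.0] * K + [1.0]                  # Chebyshev coefficients of T_K
    u = lambda x: (2.0 * x - 1.0 - q) / (1.0 - q)
    xs = np.linspace(q, 1.0, grid)
    return float(np.max(np.abs(chebval(u(xs), cK) / chebval(u(0.0), cK))))
```

For instance, with q = 0.01 and K = 40 the achieved error is already well below the e^{-K sqrt(q)} bound of (5), and it decays geometrically in K, matching the e^{-c_1 sqrt(n log n / k)} rate in (6).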
References

[1] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1-305, 2008.
[2] D. Karger and N. Srebro, "Learning Markov networks: Maximum bounded tree-width graphs," in Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2001.
[3] C. Chow and C. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 462-467, 1968.
[4] Y. Zhou, "Structure learning of probabilistic graphical models: a comprehensive survey," arXiv preprint arXiv:1111.6925, 2011.
[5] C. Chow and T. Wagner, "Consistency of an estimate of tree-dependent probability distributions (corresp.)," IEEE Transactions on Information Theory, vol. 19, no. 3, 1973.
[6] V. Y. Tan, A. Anandkumar, L. Tong, and A. S. Willsky, "A large-deviation analysis of the maximum-likelihood learning of Markov tree structures," IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1714-1735, 2011.
[7] E. L. Lehmann and G. Casella, Theory of Point Estimation. Springer, 1998.
[8] J. Jiao, K. Venkat, Y. Han, and T. Weissman, "Minimax estimation of functionals of discrete distributions," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2835-2885, 2015.
[9] Y. Wu and P. Yang, "Chebyshev polynomials, moment matching, and optimal estimation of the unseen," arXiv preprint arXiv:1504.01227, 2015.
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationGoodness-Of-Fit For The Generalized Exponential Distribution. Abstract
Goodess-Of-Fit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationThis exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.
Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the
More informationLecture 10: Universal coding and prediction
0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationStat410 Probability and Statistics II (F16)
Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems
More informationAAEC/ECON 5126 FINAL EXAM: SOLUTIONS
AAEC/ECON 5126 FINAL EXAM: SOLUTIONS SPRING 2015 / INSTRUCTOR: KLAUS MOELTNER This exam is ope-book, ope-otes, but please work strictly o your ow. Please make sure your ame is o every sheet you re hadig
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationLecture 9: Boosting. Akshay Krishnamurthy October 3, 2017
Lecture 9: Boostig Akshay Krishamurthy akshay@csumassedu October 3, 07 Recap Last week we discussed some algorithmic aspects of machie learig We saw oe very powerful family of learig algorithms, amely
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More information