Lecture 10: Universal coding and prediction

Size: px
Start display at page:

Download "Lecture 10: Universal coding and prediction"

Transcription

1 0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for formal publicatios. They may be distributed outside this class oly with the permissio of the Istructor. 0. Uiversal codig ad predictio We wat a ecoder that works well for ay distributio p P. For example, to gzip a text file we would like a ecoder that works for several differet laguages, rather tha desigig a ew zip code for each sigle laguage. Shao/Huffma codes could achieve this, but their disadvatage is that geerally we have to wait for the whole sequece to arrive before begiig to ecode/decode. We would like to desig a code that ca be ecoded ad decoded immediately o a symbol per symbol basis. I this lecture we will focus o the duality betwee optimal codig ad optimal predictio. For example, it is atural to require a code C to have short legth L C (x) for the symbol x; aalogously i predictio we ofte wat to miimize a loss fuctio loss q (x) for a predictor q. Table 0. gives a geeral overview of several dual cocepts ad otios of a code C ad a predictor q that we will use i this (ad upcomig) lectures. 0.. Weak ad strog uiversality A code C is uiversal if R C = sup E p (R p,c ) 0. (0.) A predictor q is uiversally cosistet if R q = sup E p (R p,q ) 0. (0.) Just as stadard calculus has poitwise ad uiform covergece (of e.g. fuctios), we ca also differetiate betwee weak ad strog uiversality: weakly uiversal if the covergece rate depeds o p P, e.g. if R C = o( γp ) ad γ p chages for every p P. strog uiversal if the covergece rate is the same for all p P, e.g. R C = o( γ ) p P. By Kraft s iequality we kow that for prefix codes x X LC(x The rate eed ot ecessarily be polyomial; it just serves as a example. ), (0.3) 0-

2 0- Lecture 0: Uiversal codig ad predictio legth of symbol x usig code C code legth data x, x p P code C predictor q LC(x) lossq(x) = log q(x) Properties per symbol Average expected legth Ep ideal code legth LC(x ) - log q(x ) if x iid = i= log q(x i) ( ) LC(x) log p(x ) (Shao iformatio cotet) egative log-likelihood or self-iformatio loss empirical log-likelihood of the data E p log q(x ) E-loss = risk log p(x ) likelihood uder the true model p redudacy Rp,C = L C(x ) ( log p(x ) ) Rp,q = log q(x ) ( log p(x ) ) excess loss of predictor q wrt true p expected redudacy Ep (Rp,C) = Ep ( ) LC(x ) H(x ) 0 for uiquely decodable codes miimax expected redudacy a mic sup Ep (Rp,C) :=RC Ep (Rp,q) = E p log p(x ) q(x ) = D (p q) 0 excess risk = E excess loss miq sup D (p q) =:Rq miimax excess risk Table 0.: Uiversal codig ad predictio: per symbol properties of a code C ad a predictor q. a We wat that the code C works well for all p P.

3 Lecture 0: Uiversal codig ad predictio 0-3 ad we ca costruct the correspodig predictor by q(x L C(x ) ) = x = k X L C(x ) L C(x ) L c (x ) log q(x ). (0.4) Similarly, for ay distributio q(x ) we ca defie a correspodig prefix code that satisfies This must be true sice we kow that at least the Shao code For prefix codes (see (0.)) we have L c (x ) log q(x ) +. (0.5) R C mi sup q D (p q) arg mi q ( ) log q(x ) ca achieve this. =: R. (0.6) Let q be the predictor that achieves the above miimum. We ca the costruct a Shao code C from this predictor. By (0.5) we kow that R C will be withi / bit of R C (or withi bit for the etire sequece). 0.. Predictio problem We have data x from p P. How much loss do we suffer from usig q p istead of the true p? Here q ca be ay distributio; however, typically q is a estimate of p depedig o the data x. I geeral, we have to impose some restrictios o the class of distributios P to get uiversally cosistet codes C / predictors q, i.e. to achieve error rates 0. For example, we ofte assume i) iid, ii) Markov chais,... For a specific q cosider R q R. Typically we try to boud ad to get cotrol over the error rates. Note that q = arg mi q R q, where R q is the worst excess risk for a particular q. q is the model/estimate q that miimizes the expected worst case sceario. We will show later i the course that, i geeral, the optimal q is a mixture distributio over the class p P. I other words, for ay q, a mixture distributio p mix such that the excess risk of q is always greater or equal to the excess risk of p mix, i.e. D (p q) D (p p mix ). (0.7) 0..3 Maximum loss istead of expected loss Now istead of expected loss, cosider the maximum loss (maximum over all possible sequeces x ) R = mi q sup max x log p(x ) q(x ) (0.8)

4 0-4 Lecture 0: Uiversal codig ad predictio Let P ML (x ) = sup p(x ) (the MLE). Defie the ormalized ML as NML(x ) = P ML(x ) x P ML(x ) = q, (0.9) uder maximum-loss (istead of E-loss). The ormalized maximum likelihood distributio is the best uiversal predictor uder maximum loss. Theorem 0. For ay class P of processes with fiite alphabet q = NML(x ) ad R = log x P ML (x ). (0.0) For a proof, see pg 480 of Csiszar ad Shield s tutorial Problems of NML ad maximum loss for arithmetic codig For arithmetic codig we eed the coditioal distributio q(x x ). But the NML distributio is ot cosistet i the sese that q (x ) x + q (x + ) (0.) or equivaletly q (x,..., x ) x + q (x,..., x, x + ) (0.) Remark: The right had side i the above expressios yields a valid distributio, but it is ot the distributio of x,..., x uder q. This ca be see by recallig the defiitio of q. Thus it is ot possible to defie a correspodig arithmetic code (Shao ad Huffma codes are possible though). Thus we retur to cosider E-loss as i this case we kow that q is a mixture distributio - ad this is cosistet i the sese that q(x,..., x ) = x + q(x,..., x, x + ). (0.3) Examples of model classes ad their optimal codes/predictors. P is the class of iid processes with fiite alphabet X. It ca be show that the optimal predictor is give by q(x (x i x i ) + ) =, i= i + X (0.4) (x i x i ) = # of ocurreces of symbol x i i past x i. (0.5) The term (xi xi ) i is simply the frequecy of symbols observed before time t; the additioal smoothes out the ML estimate. Thus it avoids assigig 0 probability to symbols that have ot occurred yet (but may occur i the future). + + X

5 Lecture 0: Uiversal codig ad predictio 0-5 Let x be the umber of times the symbol x occurred i the etire legth sequece. The (0.4) ca be rewritte as (see ext lecture) q(x ) = ( x X ( x )( x 3 ) ) ( ) + X + X X π(p) p(x ), (0.6) where π(p) Dirichlet (,..., ). Oe ca show that (for proof see ext lecture) R q Rq X. P is the class of Markov processes of order. log best possible boud + costat. (0.7) Let i (k, j) be the cout of how may times the sequece (k, j) appeared i the first i symbols (x,..., x i ); also let i (k) = j i (k, j) be the total umber of times the symbol k occurred i the first i symbols. q(x ) = i= q(j x i ), q(j x i ) = i (k, j) +. (0.8) i (k) + X For a Markov process of order m = oe ca show (see ext lecture) R q Rq X ( X ) log best possible boud + costat. (0.9) 3. P is the class of Markov processes of order m, i.e. x i depeds o previous m steps. The q(j x i # times j occured preceeded by xi ) = # times x i X i m occured + i m +. (0.0) Here it holds R q Rq X m ( X ) log best possible boud + costat m. (0.) Agai, see the ext lecture for detailed derivatios.

Information Theory and Statistics Lecture 4: Lempel-Ziv code

Information Theory and Statistics Lecture 4: Lempel-Ziv code Iformatio Theory ad Statistics Lecture 4: Lempel-Ziv code Łukasz Dębowski ldebowsk@ipipa.waw.pl Ph. D. Programme 203/204 Etropy rate is the limitig compressio rate Theorem For a statioary process (X i)

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

10-704: Information Processing and Learning Spring Lecture 10: Feb 12

10-704: Information Processing and Learning Spring Lecture 10: Feb 12 10-704: Iformatio Processig ad Learig Sprig 2015 Lecture 10: Feb 12 Lecturer: Akshay Krishamurthy Scribe: Dea Asta, Kirthevasa Kadasamy Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Chapter 6 Infinite Series

Chapter 6 Infinite Series Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Lecture 6: Source coding, Typicality, and Noisy channels and capacity

Lecture 6: Source coding, Typicality, and Noisy channels and capacity 15-859: Iformatio Theory ad Applicatios i TCS CMU: Sprig 2013 Lecture 6: Source codig, Typicality, ad Noisy chaels ad capacity Jauary 31, 2013 Lecturer: Mahdi Cheraghchi Scribe: Togbo Huag 1 Recap Uiversal

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013 MTH431/531 Real analysis Section Notes Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters

More information

Shannon s noiseless coding theorem

Shannon s noiseless coding theorem 18.310 lecture otes May 4, 2015 Shao s oiseless codig theorem Lecturer: Michel Goemas I these otes we discuss Shao s oiseless codig theorem, which is oe of the foudig results of the field of iformatio

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Asymptotic Coupling and Its Applications in Information Theory

Asymptotic Coupling and Its Applications in Information Theory Asymptotic Couplig ad Its Applicatios i Iformatio Theory Vicet Y. F. Ta Joit Work with Lei Yu Departmet of Electrical ad Computer Egieerig, Departmet of Mathematics, Natioal Uiversity of Sigapore IMS-APRM

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

Lecture 11: Pseudorandom functions

Lecture 11: Pseudorandom functions COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes The Maximum-Lielihood Decodig Performace of Error-Correctig Codes Hery D. Pfister ECE Departmet Texas A&M Uiversity August 27th, 2007 (rev. 0) November 2st, 203 (rev. ) Performace of Codes. Notatio X,

More information

Lecture 11: Channel Coding Theorem: Converse Part

Lecture 11: Channel Coding Theorem: Converse Part EE376A/STATS376A Iformatio Theory Lecture - 02/3/208 Lecture : Chael Codig Theorem: Coverse Part Lecturer: Tsachy Weissma Scribe: Erdem Bıyık I this lecture, we will cotiue our discussio o chael codig

More information

Lesson 10: Limits and Continuity

Lesson 10: Limits and Continuity www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals

More information

subcaptionfont+=small,labelformat=parens,labelsep=space,skip=6pt,list=0,hypcap=0 subcaption ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, 2/16/2016

subcaptionfont+=small,labelformat=parens,labelsep=space,skip=6pt,list=0,hypcap=0 subcaption ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, 2/16/2016 subcaptiofot+=small,labelformat=pares,labelsep=space,skip=6pt,list=0,hypcap=0 subcaptio ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, /6/06. Self-cojugate Partitios Recall that, give a partitio λ, we may

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Ma 530 Introduction to Power Series

Ma 530 Introduction to Power Series Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena PhD short course Iformatio Theory ad Statistics Siea, 15-19 September, 2014 The method of types Mauro Bari Uiversity of Siea Outlie of the course Part 1: Iformatio theory i a utshell Part 2: The method

More information

On Modeling On Minimum Description Length Modeling. M-closed

On Modeling On Minimum Description Length Modeling. M-closed O Modelig O Miiu Descriptio Legth Modelig M M-closed M-ope Do you believe that the data geeratig echais really is i your odel class M? 7 73 Miiu Descriptio Legth Priciple o-m-closed predictive iferece

More information

Lecture 14: Graph Entropy

Lecture 14: Graph Entropy 15-859: Iformatio Theory ad Applicatios i TCS Sprig 2013 Lecture 14: Graph Etropy March 19, 2013 Lecturer: Mahdi Cheraghchi Scribe: Euiwoog Lee 1 Recap Bergma s boud o the permaet Shearer s Lemma Number

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Information Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame

Information Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for

More information

Lecture 5. Power properties of EL and EL for vectors

Lecture 5. Power properties of EL and EL for vectors Stats 34 Empirical Likelihood Oct.8 Lecture 5. Power properties of EL ad EL for vectors Istructor: Art B. Owe, Staford Uiversity. Scribe: Jigshu Wag Power properties of empirical likelihood Power of the

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Recurrence Relations

Recurrence Relations Recurrece Relatios Aalysis of recursive algorithms, such as: it factorial (it ) { if (==0) retur ; else retur ( * factorial(-)); } Let t be the umber of multiplicatios eeded to calculate factorial(). The

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Lecture 4. We also define the set of possible values for the random walk as the set of all x R d such that P(S n = x) > 0 for some n.

Lecture 4. We also define the set of possible values for the random walk as the set of all x R d such that P(S n = x) > 0 for some n. Radom Walks ad Browia Motio Tel Aviv Uiversity Sprig 20 Lecture date: Mar 2, 20 Lecture 4 Istructor: Ro Peled Scribe: Lira Rotem This lecture deals primarily with recurrece for geeral radom walks. We preset

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Lecture Chapter 6: Convergence of Random Sequences

Lecture Chapter 6: Convergence of Random Sequences ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

Glivenko-Cantelli Classes

Glivenko-Cantelli Classes CS28B/Stat24B (Sprig 2008 Statistical Learig Theory Lecture: 4 Gliveko-Catelli Classes Lecturer: Peter Bartlett Scribe: Michelle Besi Itroductio This lecture will cover Gliveko-Catelli (GC classes ad itroduce

More information

Lecture 11: Decision Trees

Lecture 11: Decision Trees ECE9 Sprig 7 Statistical Learig Theory Istructor: R. Nowak Lecture : Decisio Trees Miimum Complexity Pealized Fuctio Recall the basic results of the last lectures: let X ad Y deote the iput ad output spaces

More information

Generalized Semi- Markov Processes (GSMP)

Generalized Semi- Markov Processes (GSMP) Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual

More information

Entropies & Information Theory

Entropies & Information Theory Etropies & Iformatio Theory LECTURE I Nilajaa Datta Uiversity of Cambridge,U.K. For more details: see lecture otes (Lecture 1- Lecture 5) o http://www.qi.damtp.cam.ac.uk/ode/223 Quatum Iformatio Theory

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018) NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we

More information

Lecture 15: Strong, Conditional, & Joint Typicality

Lecture 15: Strong, Conditional, & Joint Typicality EE376A/STATS376A Iformatio Theory Lecture 15-02/27/2018 Lecture 15: Strog, Coditioal, & Joit Typicality Lecturer: Tsachy Weissma Scribe: Nimit Sohoi, William McCloskey, Halwest Mohammad I this lecture,

More information

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS 18th Feb, 016 Defiitio (Lipschitz fuctio). A fuctio f : R R is said to be Lipschitz if there exists a positive real umber c such that for ay x, y i the domai

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Lectures 12 and 13 - Complexity Penalized Maximum Likelihood Estimation

Lectures 12 and 13 - Complexity Penalized Maximum Likelihood Estimation Lectures ad 3 - Complexity Pealized Maximum Likelihood Estimatio Rui Castro May 5, 03 Itroductio As you leared i previous courses, if we have a statistical model we ca ofte estimate ukow parameters by

More information

Lecture 1: Basic problems of coding theory

Lecture 1: Basic problems of coding theory Lecture 1: Basic problems of codig theory Error-Correctig Codes (Sprig 016) Rutgers Uiversity Swastik Kopparty Scribes: Abhishek Bhrushudi & Aditya Potukuchi Admiistrivia was discussed at the begiig of

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n Review of Power Series, Power Series Solutios A power series i x - a is a ifiite series of the form c (x a) =c +c (x a)+(x a) +... We also call this a power series cetered at a. Ex. (x+) is cetered at

More information

Feedback in Iterative Algorithms

Feedback in Iterative Algorithms Feedback i Iterative Algorithms Charles Byre (Charles Byre@uml.edu), Departmet of Mathematical Scieces, Uiversity of Massachusetts Lowell, Lowell, MA 01854 October 17, 2005 Abstract Whe the oegative system

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Lecture 8: Convergence of transformations and law of large numbers

Lecture 8: Convergence of transformations and law of large numbers Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges

More information

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170 UC Berkeley CS 170: Efficiet Algorithms ad Itractable Problems Hadout 17 Lecturer: David Wager April 3, 2003 Notes 17 for CS 170 1 The Lempel-Ziv algorithm There is a sese i which the Huffma codig was

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

1 Approximating Integrals using Taylor Polynomials

1 Approximating Integrals using Taylor Polynomials Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.

More information

Point Estimation: properties of estimators 1 FINITE-SAMPLE PROPERTIES. finite-sample properties (CB 7.3) large-sample properties (CB 10.

Point Estimation: properties of estimators 1 FINITE-SAMPLE PROPERTIES. finite-sample properties (CB 7.3) large-sample properties (CB 10. Poit Estimatio: properties of estimators fiite-sample properties CB 7.3) large-sample properties CB 10.1) 1 FINITE-SAMPLE PROPERTIES How a estimator performs for fiite umber of observatios. Estimator:

More information