10-704: Information Processing and Learning, Spring 2015, Lecture 10: Feb 12
10-704: Information Processing and Learning                                Spring 2015
Lecture 10: Feb 12
Lecturer: Akshay Krishnamurthy          Scribes: Dean Asta, Kirthevasan Kandasamy

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

10.1 Codes

Codes are functions that convert strings over some alphabet into (typically shorter) strings over another alphabet. Recall the coding problem:

    X --> Encoder --> C(X) ∈ Σ*

Here Σ is the dictionary (e.g., for binary codes Σ = {0, 1}). Our goal is to have low expected code length with respect to the distribution p of X,

    l(C) = E_p[l(X)],

where l(x) is the length of C(x).

10.1.1 Taxonomy of codes

Let X be a random variable taking values in a set X. Let Σ* denote the Kleene closure of the dictionary, and let C : X → Σ* be the code. The extension of the code C is the code C : X* → Σ* defined by

    C(x_1 x_2 ... x_n) = C(x_1) C(x_2) ... C(x_n),    n = 0, 1, ...,   x_1, x_2, ..., x_n ∈ X.

Listed below are the types of codes we saw in class:

  Nonsingular:                            C is injective, i.e., x ≠ x' implies C(x) ≠ C(x').
  Uniquely decodable:                     The extension of the code is nonsingular.
  Prefix/Instantaneous/Self-punctuating:  No codeword prefixes another: for all distinct x, x' ∈ X, C(x') does not start with C(x).

We illustrate this below with the following example for the symbols {a, b, c, d}, taken from Chapter 5 in Cover and Thomas (the codeword entries follow Table 5.1 there):

    x | Singular | Nonsingular | Uniquely decodable | Instantaneous
    a |    0     |      0      |         10         |       0
    b |    0     |     010     |         00         |       10
    c |    0     |      01     |         11         |       110
    d |    0     |      10     |         110        |       111

We begin with the following important results.
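The defining properties in the taxonomy above can be checked mechanically. Below is a minimal sketch (not from the notes); the helper names and codeword tables are illustrative, in the spirit of the Cover and Thomas example.

```python
def is_nonsingular(code):
    """Nonsingular: the symbol-to-codeword map is injective."""
    return len(set(code.values())) == len(code)

def is_prefix_free(code):
    """Instantaneous: no codeword is a prefix of another codeword."""
    words = list(code.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

# Codeword tables in the spirit of the Cover and Thomas example
instantaneous = {"a": "0", "b": "10", "c": "110", "d": "111"}
nonsingular_only = {"a": "0", "b": "010", "c": "01", "d": "10"}

assert is_nonsingular(instantaneous) and is_prefix_free(instantaneous)
assert is_nonsingular(nonsingular_only) and not is_prefix_free(nonsingular_only)
```

Note that `nonsingular_only` is injective yet fails the prefix condition ("0" begins "010" and "01"), which is exactly why its extension is hard to decode.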
Theorem 10.1 (Kraft-McMillan Inequality) For any uniquely decodable code C : X → Σ*, where D = |Σ|,

    ∑_x D^{-l(x)} ≤ 1.                                                  (10.1)

Conversely, for any set {l(x)}_{x ∈ X} of numbers satisfying (10.1), there exists a prefix code C : X → {1, 2, ..., D}* such that l(x) is the length of C(x) for each x.

Theorem 10.2
1. H(X) ≤ l(C) for all uniquely decodable codes C.
2. For any ε > 0, there exists n large enough and a code C : X^n → Σ* such that (1/n) E[l(C(X_1^n))] ≤ H(X) + ε.

Proof: The proof of the first statement follows by solving the convex program

    min ∑_x p(x) l(x)   subject to   ∑_x D^{-l(x)} ≤ 1,

where ∑_x p(x) = 1. For the second, we can use the Shannon code in blocks. That is, for an n-length sequence x_1^n = x_1 ... x_n, use the code lengths l_n(x_1^n) = ⌈log_D 1/p(x_1^n)⌉, so that

    nH(X) = H(X_1^n) ≤ E[l_n(X_1^n)] ≤ H(X_1^n) + 1 = nH(X) + 1.

Proposition 10.3 The ideal codelengths for a prefix code with smallest expected codelength are

    l*(x) = log_D 1/p(x)        (Shannon information content).

Proof: In the last class, we showed that for all length functions l of prefix codes, E[l*(X)] = H_p(X) ≤ E[l(X)]. While Shannon information contents are not integer-valued and hence cannot be the lengths of codewords, the integers {⌈log_D 1/p(x)⌉}_{x ∈ X} satisfy the Kraft-McMillan inequality, and hence by Theorem 10.1 there exists some uniquely decodable code C for which

    H_p(X) ≤ E[l(X)] < H_p(X) + 1.                                      (10.2)

Such a code is called a Shannon code. Moreover, the lengths of codewords for such a code C achieve the entropy of X asymptotically if Shannon codes are constructed for strings of n symbols, where n → ∞, instead of individual symbols. Assuming X_1, X_2, ... form an iid process, for all n = 1, 2, ...

    nH(X) = H(X_1, X_2, ..., X_n) ≤ E[l(X_1, ..., X_n)] < H(X_1, X_2, ..., X_n) + 1 = nH(X) + 1

by (10.2), and hence E[l(X_1, ..., X_n)/n] → H(X). If X_1, X_2, ... form a stationary process, then a similar argument shows that E[l(X_1, ..., X_n)/n] → H(𝒳), where H(𝒳) is the entropy rate of the process.
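The Shannon code lengths ⌈log_D 1/p(x)⌉ and the sandwich (10.2) are easy to verify numerically. A minimal sketch, assuming a hypothetical four-symbol source (the function name and distribution are illustrative):

```python
import math

def shannon_lengths(p):
    """Binary Shannon code lengths l(x) = ceil(log2 1/p(x))."""
    return {x: math.ceil(-math.log2(px)) for x, px in p.items()}

# Hypothetical source distribution (not from the notes)
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
l = shannon_lengths(p)

H = -sum(px * math.log2(px) for px in p.values())
kraft_sum = sum(2 ** -lx for lx in l.values())
expected_len = sum(p[x] * l[x] for x in p)

assert kraft_sum <= 1             # Kraft-McMillan holds, so a prefix code exists
assert H <= expected_len < H + 1  # the sandwich (10.2)
```

Here the Kraft sum is strictly below 1, reflecting that rounding lengths up wastes some of the code space; the expected length still lands within one bit of the entropy.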
Theorem 10.4 (Shannon Source Coding Theorem) A collection of n iid random variables, each with entropy H(X), can be compressed into nH(X) bits on average with negligible loss as n → ∞. Conversely, no uniquely decodable code can compress them to less than nH(X) bits without loss of information.

10.1.2 Non-singular vs. uniquely decodable codes

Can we gain anything by giving up unique decodability and only requiring the code to be non-singular? First, the question is not really fair, because we cannot easily decode a sequence of symbols each encoded with a non-singular code. Second (as we argue below), non-singular codes only provide a small improvement in expected codelength over entropy.

Theorem 10.5 The lengths of a non-singular code satisfy ∑_x D^{-l(x)} ≤ l_max, and for any probability distribution p on X, the code has expected length

    E[l(X)] = ∑_x p(x) l(x) ≥ H_D(X) − log_D l_max.

Proof: Let a_l denote the number of unique codewords of length l. Then a_l ≤ D^l, since no codeword can be repeated due to non-singularity. Using this,

    ∑_x D^{-l(x)} = ∑_{l=1}^{l_max} a_l D^{-l} ≤ ∑_{l=1}^{l_max} D^l D^{-l} = l_max.

The expected codelength can be obtained by solving the following optimization problem:

    min ∑_x p(x) l(x)   subject to   ∑_x D^{-l(x)} ≤ l_max,

the convex non-singularity code constraint. Differentiating the Lagrangian ∑_x p_x l_x + λ ∑_x D^{-l_x} with respect to l_x and noting that at the global minimum (λ*, l*) it must be zero, we get

    p_x − λ* D^{-l*_x} ln D = 0,

which implies that D^{-l*_x} = p_x / (λ* ln D). Using complementary slackness, and noting that λ* > 0 for the above condition to make sense, we have

    ∑_x D^{-l*_x} = ∑_x p_x / (λ* ln D) = 1 / (λ* ln D) = l_max,

which implies λ* = 1/(l_max ln D), and hence D^{-l*_x} = p_x l_max, or the optimum length l*_x = −log_D(p_x l_max). This gives the expected minimum codelength for non-singular codes as

    ∑_x p_x l*_x = −∑_x p_x log_D(p_x l_max) = H_D(X) − log_D l_max.

In the last lecture, we saw an example of a non-singular code for a process which has expected length below entropy. However, this is only possible when encoding individual symbols. As a direct corollary of the above result, if symbol strings of length n are encoded using a non-singular code, then

    E[l(X_1^n)] ≥ H(X_1^n) − log_D(n l_max).
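Theorem 10.5 can be sanity-checked numerically: a natural non-singular code greedily assigns the shortest distinct nonempty binary strings to the most probable symbols, and its expected length beats the entropy while still respecting the H_D(X) − log_D l_max lower bound. The greedy scheme and the distribution below are illustrative, not from the notes.

```python
import math

def greedy_nonsingular_expected_length(p, D=2):
    """Assign the most probable symbols the shortest nonempty D-ary
    strings; non-singularity only requires codewords to be distinct."""
    probs = sorted(p, reverse=True)
    lengths, l, used = [], 1, 0
    for q in probs:
        if used == D ** l:          # all D^l strings of length l are taken
            l, used = l + 1, 0
        lengths.append(l)
        used += 1
    return sum(q * L for q, L in zip(probs, lengths)), max(lengths)

# Illustrative distribution (not from the notes)
p = [0.4, 0.2, 0.2, 0.1, 0.05, 0.05]
H = -sum(q * math.log2(q) for q in p)
EL, lmax = greedy_nonsingular_expected_length(p)

assert EL < H                       # below entropy, as claimed for single symbols
assert EL >= H - math.log2(lmax)    # Theorem 10.5's lower bound
```

Note how the code achieves an expected length below H(X), which no uniquely decodable code can do, but only by the small log_D l_max margin the theorem allows.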
Thus, the expected length per symbol cannot be much smaller than the entropy (for iid processes) or the entropy rate (for stationary processes) asymptotically, even for non-singular codes, since the second term divided by n is negligible. Thus, non-singular codes do not offer much improvement over uniquely decodable and prefix codes. In fact, any non-singular code can be converted into a prefix code while only increasing the codelength per symbol by an amount that is negligible asymptotically.

10.1.3 Huffman Coding

Is there a prefix code with expected length shorter than the Shannon code? The answer is yes. The optimal (shortest expected length) prefix code for a given distribution can be constructed by a simple algorithm due to Huffman. We introduce this optimal symbol code, called a Huffman code, which admits a simple algorithm for its implementation. We fix Σ = {0, 1} and hence consider binary codes, although the procedure described here readily adapts to more general Σ. Simply put, we define the Huffman code C : X → {0, 1}* as the coding scheme that builds a binary tree from the leaves up: it takes the two symbols having the least probabilities, assigns them equal lengths, merges them, and then reiterates the entire process. Formally, we describe the code as follows. Let X = {x_1, ..., x_N}, p_1 = p(x_1), p_2 = p(x_2), ..., p_N = p(x_N). The procedure Huff is defined as follows:

    Huff(p_1, ..., p_N):
        if N = 2 then
            C(1) ← 0, C(2) ← 1
        else
            sort so that p_1 ≥ p_2 ≥ ... ≥ p_N
            C' ← Huff(p_1, p_2, ..., p_{N−2}, p_{N−1} + p_N)
            for each i:
                if i ≤ N − 2 then C(i) ← C'(i)
                else if i = N − 1 then C(i) ← C'(N − 1) 0
                else C(i) ← C'(N − 1) 1
        return C

For example, consider a probability distribution on the symbols {a, b, c, d, e, f, g}. [The table of numeric probabilities p_i and the resulting Huffman codewords did not survive transcription.] The Huffman tree is built using the procedure described above. The two least probable symbols at the first iteration are a and f, so they are merged into one new symbol af with probability p(a) + p(f). At the second iteration, the two least probable symbols are af and g, which are then combined, and so on. The resulting Huffman tree is shown below.
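The recursive Huff procedure above can also be written iteratively with a min-heap, popping the two least probable subtrees and merging them. A minimal sketch follows; since the example's numeric table did not survive transcription, the probabilities here are made up.

```python
import heapq
import math

def huffman(p):
    """Binary Huffman code: repeatedly merge the two least probable
    subtrees, prepending a distinguishing bit to each side's codewords."""
    # heap entries: (probability, tie-breaker id, {symbol: codeword})
    heap = [(px, i, {x: ""}) for i, (x, px) in enumerate(p.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable subtree
        p1, _, c1 = heapq.heappop(heap)   # second least probable subtree
        merged = {x: "0" + w for x, w in c0.items()}
        merged.update({x: "1" + w for x, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, uid, merged))
        uid += 1
    return heap[0][2]

# Made-up probabilities (the original table's values were lost)
p = {"a": 0.05, "b": 0.25, "c": 0.1, "d": 0.2, "e": 0.25, "f": 0.05, "g": 0.1}
code = huffman(p)

expected_len = sum(p[x] * len(code[x]) for x in p)
entropy = -sum(px * math.log2(px) for px in p.values())
assert entropy <= expected_len < entropy + 1   # optimal prefix code bound
```

Prepending (rather than appending) the merge bit mirrors the bottom-up construction: bits attached in later merges sit closer to the root of the final tree.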
[Figure: the resulting Huffman tree on the leaves {a, b, c, d, e, f, g}.]

The Huffman code for a symbol x in the alphabet {a, b, c, d, e, f, g} can now be read by starting from the root of the tree and traversing down the tree until x is reached; each leftwards movement suffixes a 0 bit and each rightwards movement adds a trailing 1, resulting in the code shown above in the table.

Remark 1: If more than two symbols have the same probability at any iteration, then the Huffman coding may not be unique (depending on the order in which they are merged). However, all Huffman codings on that alphabet are optimal, in the sense that they will yield the same expected codelength.

Remark 2: One might think of an alternate procedure that assigns small codelengths by building a tree top-down instead, e.g., dividing the symbols into two sets with almost equal probabilities and repeating. While intuitively appealing, this procedure is suboptimal and leads to a larger expected codelength than the Huffman encoding. You should try this on the symbol distribution described above.

Remark 3: For a D-ary encoding, the procedure is similar, except the D least probable symbols are merged at each step. Since the total number of symbols may not be enough to allow D symbols to be merged at each step, we might need to add some dummy symbols with 0 probability before constructing the Huffman tree. How many dummy symbols need to be added? Since the first iteration merges D symbols and then each iteration combines D − 1 symbols with a merged symbol, if the procedure is to last for k (some integer number of) iterations, then the total number of source symbols needed is 1 + k(D − 1). So before beginning the Huffman procedure, we add enough dummy symbols so that the total number of symbols equals 1 + k(D − 1) for the smallest possible value of k.

Now we will show that the Huffman procedure is indeed optimal, i.e., it yields the smallest expected codelength of any prefix code. Since there can be many optimal codes (e.g.,
flipping bits in a code still leads to a code with the same codelength; also, exchanging source symbols with the same codelength still yields an optimal code) and Huffman coding only finds one of them, let us first characterize some properties of optimal codes. Assume the source symbols x_1, ..., x_N ∈ X are ordered so that p_1 ≥ p_2 ≥ ... ≥ p_N. For brevity, we write l_i for l(x_i) for each i = 1, ..., N. We first observe some properties of general optimal prefix codes.

Lemma 10.6 For any distribution, an optimal prefix code exists that satisfies:
1. If p_j > p_k, then l_j ≤ l_k.
2. The two longest codewords have the same length and correspond to the two least likely symbols.
3. The two longest codewords differ only in the last bit.

Proof: The collection of prefix codes is well-ordered under expected lengths of codewords. Hence there exists a (not necessarily unique) optimal prefix code. To see (1), suppose C is an optimal prefix code. Let C' be the code interchanging C(x_j) and C(x_k) for some j < k (so that p_j ≥ p_k). Then

    0 ≤ L(C') − L(C) = ∑_i p_i l'_i − ∑_i p_i l_i = p_j l_k + p_k l_j − p_j l_j − p_k l_k = (p_j − p_k)(l_k − l_j),
and hence l_k − l_j ≥ 0, or equivalently, l_j ≤ l_k. To see (2), note that if the two longest codewords had differing lengths, a bit could be removed from the end of the longest codeword while remaining a prefix code, giving strictly lower expected length. An application of (1) yields (2), since it tells us that the longest codewords correspond to the least likely symbols.

We claim that Huffman codes are optimal, at least among all prefix codes. Because our proof involves multiple codes, we avoid ambiguity by writing L(C) for the expected length of a codeword coded by C, for each code C.

Proposition 10.7 Huffman codes are optimal prefix codes.

Proof: Define a sequence {A_N}, N = 2, ..., |X|, of sets of source symbols, with associated probabilities P_N = {p_1, p_2, ..., p_{N−1}, p_N + p_{N+1} + ... + p_{|X|}}. Let C_N denote a Huffman encoding on the set of source symbols A_N with probabilities P_N. We induct on the size N of the alphabet.

1. For the base case N = 2, the Huffman code maps x_1 and x_2 to one bit each and is hence optimal.
2. Inductively assume that the Huffman code C_{N−1} is an optimal prefix code.
3. We will show that the Huffman code C_N is also an optimal prefix code.

Notice that the code C_{N−1} is formed by taking the common prefix of the two longest codewords (least likely symbols) in {x_1, ..., x_N} and allotting it to a merged symbol with probability p_{N−1} + p_N. In other words, the Huffman tree for the merged alphabet is the merge of the Huffman tree for the original alphabet. This is true simply by the definition of the Huffman procedure. Let l_i denote the length of the codeword for symbol x_i in C_N, and let l'_i denote the length of the codeword for symbol x_i in C_{N−1}. Then

    L(C_N) = ∑_{i=1}^{N−2} p_i l_i + p_{N−1} l_{N−1} + p_N l_N
           = ∑_{i=1}^{N−2} p_i l'_i + (p_{N−1} + p_N) l'_{N−1} + (p_{N−1} + p_N)
           = L(C_{N−1}) + (p_{N−1} + p_N),

the last line following from the Huffman construction. Suppose, to the contrary, that C_N were not optimal. Let C̃_N be optimal (existence is guaranteed by the previous Lemma).
We can take C̃_{N−1} to be obtained from C̃_N by merging the two least likely symbols, which have the same length by Lemma 10.6. But then

    L(C̃_N) = L(C̃_{N−1}) + (p_{N−1} + p_N) ≥ L(C_{N−1}) + (p_{N−1} + p_N) = L(C_N),

where the inequality holds since C_{N−1} is optimal. Hence, C_N had to be optimal.

Remark 10.8 The numbers p_1, p_2, ..., p_N need not be probabilities; they can be arbitrary non-negative weights {w_i}. Huffman encoding in this case results in a code minimizing ∑_i w_i l_i.

Remark 10.9 Since Huffman codes are optimal prefix codes, they satisfy H(X) ≤ E[l(X)] < H(X) + 1, the same as the Shannon code. However, the expected length of a Huffman code is never longer than that of a Shannon code, even though for any given individual symbol either the Shannon or the Huffman code may assign the shorter codeword.
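Remark 10.8 can be illustrated directly: run the merge procedure on raw non-negative weights with no normalization; the resulting codeword lengths (and hence the minimized ∑_i w_i l_i) are unchanged by rescaling the weights. A minimal sketch with made-up weights:

```python
import heapq

def huffman_lengths(w):
    """Codeword lengths from the Huffman merge, run directly on
    non-negative weights (no normalization needed)."""
    # heap entries: (weight, tie-breaker id, list of leaf indices below)
    heap = [(wi, i, [i]) for i, wi in enumerate(w)]
    heapq.heapify(heap)
    lengths = [0] * len(w)
    uid = len(w)
    while len(heap) > 1:
        w0, _, s0 = heapq.heappop(heap)
        w1, _, s1 = heapq.heappop(heap)
        for i in s0 + s1:        # every leaf under the merge gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (w0 + w1, uid, s0 + s1))
        uid += 1
    return lengths

weights = [13, 7, 5, 3, 1]       # arbitrary non-negative weights
lengths = huffman_lengths(weights)

# Rescaling to probabilities leaves the tree, and hence the lengths, unchanged.
total = sum(weights)
assert huffman_lengths([w / total for w in weights]) == lengths
```

This is why the same routine serves both the probabilistic coding problem and weighted problems such as merge-cost minimization.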
Remark 10.10 Huffman codes are often undesirable in practice because they cannot easily accommodate changing source distributions. We often desire codes that can incorporate refined information on the probability distributions of strings, not just the distributions of individual source symbols (e.g., the English language). The Huffman coding tree needs to be recomputed for different source distributions (e.g., English vs. French).

10.2 Connections to Machine Learning

Consider the classical statistical learning setup. We have data Z from some distribution p. We want to have good prediction on some learning task, which is often characterized via

    f* = argmin_{f ∈ F} E_{Z∼p}[L(Z, f(Z))],

where L is some loss function and R(f) = E_{Z∼p}[L(Z, f(Z))] is the risk. For example, in linear regression Z = (X, Y) and L(Z, f(Z)) = (Y − f(X))^2, where f(X) = β^T X; then the objective becomes

    β* = argmin_β E_{X,Y∼p}[(Y − β^T X)^2].

However, the challenge is that we do not see the true distribution p. Instead we have n samples Z_1^n = {Z_1, ..., Z_n}, where Z_i ∼ p. A natural idea is to minimize the empirical risk,

    f̂ = argmin_{f ∈ F} E_{Z∼p̂}[L(Z, f(Z))] = argmin_{f ∈ F} (1/n) ∑_{i=1}^n L(Z_i, f(Z_i)).

Denote the empirical risk by R_n(f) = E_{Z∼p̂}[L(Z, f(Z))]. This procedure is called Empirical Risk Minimization (ERM). But does ERM work? We will begin this discussion by considering bounded loss functions L ∈ [0, 1]; we shall extend this in the next lecture. For this, we wish to bound the excess risk, R(f̂) − R(f*). Noting that R_n(f̂) ≤ R_n(f*), we see that

    R(f̂) − R(f*) ≤ |R(f̂) − R_n(f̂)| + |R(f*) − R_n(f*)|.

Therefore it is sufficient to have a uniform deviation bound of the form: for all f ∈ F, |R_n(f) − R(f)| ≤ ε. Using Hoeffding's inequality, we see that for any fixed f ∈ F, with probability > 1 − δ,

    |R_n(f) − R(f)| ≤ sqrt( log(2/δ) / (2n) ).

Therefore, if F is finite, we see that with probability > 1 − δ,

    for all f ∈ F,  |R_n(f) − R(f)| ≤ sqrt( log(2|F|/δ) / (2n) ),       (10.3)

which gives us the desired uniform deviation bound. Note that we got the uniform deviation bound by assigning failure probability δ/|F| to each function and then using the union bound. But to apply the union bound it is sufficient that F be countable.
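The uniform deviation bound (10.3) can be checked by simulation. Below is a minimal Monte Carlo sketch (not from the notes): each hypothetical function f in a finite class has a Bernoulli loss with success probability q_f, so R(f) = q_f, and we count how often the empirical risks of all functions stay within the union-bound radius.

```python
import math
import random

random.seed(0)

def uniform_deviation_holds(q, n, delta):
    """One trial: does max_f |R_n(f) - R(f)| stay within the bound (10.3)?"""
    eps = math.sqrt(math.log(2 * len(q) / delta) / (2 * n))
    for qf in q:   # loss of f on Z_i is Bernoulli(qf), so R(f) = qf
        Rn = sum(random.random() < qf for _ in range(n)) / n
        if abs(Rn - qf) > eps:
            return False
    return True

q = [0.1, 0.3, 0.5, 0.7, 0.9]   # risks of a hypothetical finite class F
trials = 200
hits = sum(uniform_deviation_holds(q, n=100, delta=0.05) for _ in range(trials))
assert hits / trials >= 0.95     # holds with probability at least 1 - delta
```

In practice the bound holds in nearly every trial, since Hoeffding plus the union bound is conservative for independent Bernoulli losses.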
Suppose we have a distribution ζ over F. Then we can assign a failure probability of δ(f) = δζ(f) to each f ∈ F and apply the union bound. In particular, if we can assign a prefix code to the class, then we can use the distribution ζ(f) = 2^{−l(f)}; here ∑_f ζ(f) ≤ 1 by the Kraft inequality. This gives a deviation bound of the form

    |R_n(f) − R(f)| ≤ sqrt( (1 + l(f)) log(2/δ) / (2n) ) = ε(f).
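The Kraft-weighted union bound above can be computed directly: each function's deviation radius ε(f) grows with its description length l(f), so simpler (shorter-code) functions get tighter guarantees. A small sketch; the function names and description lengths are hypothetical.

```python
import math

def kraft_weighted_radii(lengths, n, delta):
    """Deviation radius eps(f) when f has a prefix code of l(f) bits,
    i.e., weights zeta(f) = 2^{-l(f)} in the union bound."""
    assert sum(2 ** -l for l in lengths.values()) <= 1   # Kraft inequality
    return {f: math.sqrt((1 + l) * math.log(2 / delta) / (2 * n))
            for f, l in lengths.items()}

# Hypothetical description lengths for three models
lengths = {"f1": 1, "f2": 2, "f3": 2}
eps = kraft_weighted_radii(lengths, n=1000, delta=0.05)

assert eps["f1"] < eps["f2"]   # shorter descriptions yield tighter bounds
```

This is the Occam's-razor flavor of the bound: the penalty is not log |F| but the code length of the particular f, so the class may even be countably infinite.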
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationLecture Notes for Analysis Class
Lecture Notes for Aalysis Class Topological Spaces A topology for a set X is a collectio T of subsets of X such that: (a) X ad the empty set are i T (b) Uios of elemets of T are i T (c) Fiite itersectios
More information6.895 Essential Coding Theory October 20, Lecture 11. This lecture is focused in comparisons of the following properties/parameters of a code:
6.895 Essetial Codig Theory October 0, 004 Lecture 11 Lecturer: Madhu Suda Scribe: Aastasios Sidiropoulos 1 Overview This lecture is focused i comparisos of the followig properties/parameters of a code:
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More information# fixed points of g. Tree to string. Repeatedly select the leaf with the smallest label, write down the label of its neighbour and remove the leaf.
Combiatorics Graph Theory Coutig labelled ad ulabelled graphs There are 2 ( 2) labelled graphs of order. The ulabelled graphs of order correspod to orbits of the actio of S o the set of labelled graphs.
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22
CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first
More informationHomework 9. (n + 1)! = 1 1
. Chapter : Questio 8 If N, the Homewor 9 Proof. We will prove this by usig iductio o. 2! + 2 3! + 3 4! + + +! +!. Base step: Whe the left had side is. Whe the right had side is 2! 2 +! 2 which proves
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationLecture 12: February 28
10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationSection 5.1 The Basics of Counting
1 Sectio 5.1 The Basics of Coutig Combiatorics, the study of arragemets of objects, is a importat part of discrete mathematics. I this chapter, we will lear basic techiques of coutig which has a lot of
More informationMath 299 Supplement: Real Analysis Nov 2013
Math 299 Supplemet: Real Aalysis Nov 203 Algebra Axioms. I Real Aalysis, we work withi the axiomatic system of real umbers: the set R alog with the additio ad multiplicatio operatios +,, ad the iequality
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationComputability and computational complexity
Computability ad computatioal complexity Lecture 4: Uiversal Turig machies. Udecidability Io Petre Computer Sciece, Åbo Akademi Uiversity Fall 2015 http://users.abo.fi/ipetre/computability/ 21. toukokuu
More informationLecture 11: Decision Trees
ECE9 Sprig 7 Statistical Learig Theory Istructor: R. Nowak Lecture : Decisio Trees Miimum Complexity Pealized Fuctio Recall the basic results of the last lectures: let X ad Y deote the iput ad output spaces
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationLecture 3 The Lebesgue Integral
Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified
More informationMATH1035: Workbook Four M. Daws, 2009
MATH1035: Workbook Four M. Daws, 2009 Roots of uity A importat result which ca be proved by iductio is: De Moivre s theorem atural umber case: Let θ R ad N. The cosθ + i siθ = cosθ + i siθ. Proof: The
More informationIntegrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number
MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios
More informationInformation Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame
Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for
More informationNotes for Lecture 11
U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with
More informationEntropy Rates and Asymptotic Equipartition
Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,
More information1 Approximating Integrals using Taylor Polynomials
Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................
More informationn outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More information