Minimax Redundancy for Large Alphabets by Analytic Methods
1 Minimax Redundancy for Large Alphabets by Analytic Methods
Wojciech Szpankowski
Purdue University, W. Lafayette, IN
March 15, 2012
NSF STC Center for Science of Information
CISS, Princeton, 2012
Joint work with M. Weinberger
2 Outline
1. Source Coding: The Redundancy Rate Problem
2. Universal Memoryless Sources
(a) Finite Alphabet
(b) Unbounded Alphabet
3. Universal Renewal Sources
3 Source Coding and Redundancy
Source coding aims at finding codes C : A^* -> {0,1}^* of the shortest length L(C, x), either on average or for individual sequences.
Known Source P: The pointwise and maximal redundancy are:
R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + \log P(x_1^n)
R^*(C_n, P) = \max_{x_1^n} [ L(C_n, x_1^n) + \log P(x_1^n) ]
where P(x_1^n) is the probability of x_1^n = x_1 ... x_n.
Unknown Source P: Following Davisson, the maximal minimax redundancy R_n^*(S) for a family of sources S is:
R_n^*(S) = \min_{C_n} \sup_{P \in S} \max_{x_1^n} [ L(C_n, x_1^n) + \log P(x_1^n) ].
4 Source Coding and Redundancy (slide 3 repeated, with Shtarkov's bound added)
Shtarkov's Bound:
d_n(S) := \log \sum_{x_1^n \in A^n} \sup_{P \in S} P(x_1^n) \le R_n^*(S) \le \log \sum_{x_1^n \in A^n} \sup_{P \in S} P(x_1^n) + 1,
where the sum under the logarithm is denoted D_n(S).
5 Maximal Minimax Redundancy R_n^*
For the maximal minimax redundancy define the maximum-likelihood distribution
Q^*(x_1^n) := \sup_{P \in S} P(x_1^n) / \sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n).
Observe that (Shtarkov, 1976):
R_n^*(S) = \min_{C_n} \sup_{P \in S} \max_{x_1^n} ( L(C_n, x_1^n) + \log P(x_1^n) )
= \min_{C_n} \max_{x_1^n} ( L(C_n, x_1^n) + \sup_{P \in S} \log P(x_1^n) )
= \min_{C_n} \max_{x_1^n} ( L(C_n, x_1^n) + \log Q^*(x_1^n) ) + \log \sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n)
= R_n^{GS}(Q^*) + \log \sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n)
= \log \sum_{y_1^n \in A^n} \sup_{P \in S} P(y_1^n) + O(1),
where R_n^{GS}(Q^*) is the redundancy of the optimal generalized Shannon code. Also,
d_n(S) = \log \sum_{x_1^n \in A^n} \sup_{P \in S} P(x_1^n) =: \log D_n(S).
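To make Shtarkov's sum concrete, here is a small Python sketch (an illustration added here, not part of the slides) that brute-forces D_n(S) and the maximum-likelihood distribution Q^* for the class of binary memoryless sources, where \sup_P P(x_1^n) = (k/n)^k ((n-k)/n)^{n-k} for a string with k ones:

```python
import itertools
import math

def sup_prob(x):
    """sup over Bernoulli(p) of P(x): attained at the ML estimate p = k/n."""
    n, k = len(x), sum(x)
    return (k / n) ** k * ((n - k) / n) ** (n - k)  # 0**0 == 1.0 in Python

def shtarkov_sum(n):
    """D_n(S): sum of the maximized likelihood over all 2^n binary strings."""
    return sum(sup_prob(x) for x in itertools.product((0, 1), repeat=n))

n = 8
D = shtarkov_sum(n)
# Q*(x) = sup_P P(x) / D_n(S) is a genuine probability distribution:
assert abs(sum(sup_prob(x) / D for x in itertools.product((0, 1), repeat=n)) - 1) < 1e-9
print(math.log2(D))  # d_n(S), the lower end of Shtarkov's bound
```

By Shtarkov's bound, the printed value brackets R_n^*(S) to within one bit.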
6 Learnable Information and Redundancy
1. S := M_k = {P_\theta : \theta \in \Theta} is a set of k-dimensional parameterized distributions. Let \hat{\theta}(x^n) = \arg\max_{\theta \in \Theta} P_\theta(x^n) be the ML estimator.
7 Learnable Information and Redundancy (slide 6 repeated)
[Figure: the parameter space \Theta partitioned into C_n(\Theta) balls of radius ~ 1/\sqrt{n} around indistinguishable models \theta, \theta', ..., \hat{\theta}; \log C_n(\Theta) useful bits.]
2. Two models, say P_\theta(x^n) and P_{\theta'}(x^n), are indistinguishable if the ML estimator \hat{\theta} with high probability declares both models to be the same.
3. The number of distinguishable distributions (i.e., \hat{\theta}'s), C_n(\Theta), summarizes the learnable information: I(\Theta) = \log_2 C_n(\Theta).
8 Learnable Information and Redundancy (slide 7 repeated)
4. Consider the following expansion of the Kullback-Leibler (KL) divergence:
D(P_{\hat{\theta}} || P_\theta) := E[\log P_{\hat{\theta}}(X^n)] - E[\log P_\theta(X^n)] \approx (1/2) (\theta - \hat{\theta})^T I(\hat{\theta}) (\theta - \hat{\theta}) =: d_I^2(\theta, \hat{\theta}),
where I(\theta) = {I_{ij}(\theta)}_{ij} is the Fisher information matrix and d_I(\theta, \hat{\theta}) is a rescaled Euclidean distance known as the Mahalanobis distance.
5. Balasubramanian proved that the number of distinguishable balls C_n(\Theta) of radius O(1/\sqrt{n}) is asymptotically equal to the minimax redundancy:
Learnable Information = \log C_n(\Theta) = \inf_{\theta \in \Theta} \max_{x^n} \log ( P_{\hat{\theta}}(x^n) / P_\theta(x^n) ) = R_n^*(M_k).
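The quadratic KL expansion in point 4 can be checked numerically in one dimension; a minimal sketch for a Bernoulli family (the choice of Bernoulli and the step \theta - \hat{\theta} = 0.01 are illustrative assumptions), where the Fisher information is I(\theta) = 1/(\theta(1-\theta)):

```python
import math

def kl_bernoulli(a, b):
    """KL divergence D(Ber(a) || Ber(b)) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def fisher_bernoulli(t):
    """Fisher information of a single Bernoulli(t) observation."""
    return 1.0 / (t * (1 - t))

theta_hat, theta = 0.4, 0.41
kl = kl_bernoulli(theta_hat, theta)
quad = 0.5 * (theta - theta_hat) ** 2 * fisher_bernoulli(theta_hat)
# the quadratic (Mahalanobis) form matches the KL divergence to leading order
assert abs(kl / quad - 1) < 0.05
```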
9 Outline Update
1. Source Coding: The Redundancy Rate Problem
2. Universal Memoryless Sources
(a) Finite Alphabet
(b) Unbounded Alphabet
3. Universal Renewal Sources
10 Maximal Minimax for Memoryless Sources
For a memoryless source over the alphabet A = {1, 2, ..., m} we have
P(x_1^n) = p_1^{k_1} ... p_m^{k_m}, k_1 + ... + k_m = n.
Then
D_n(M_0) := \sum_{x_1^n} \sup_P P(x_1^n)
= \sum_{k_1 + ... + k_m = n} \binom{n}{k_1, ..., k_m} \sup_{p_1, ..., p_m} p_1^{k_1} ... p_m^{k_m}
= \sum_{k_1 + ... + k_m = n} \binom{n}{k_1, ..., k_m} (k_1/n)^{k_1} ... (k_m/n)^{k_m},
since the (unnormalized) likelihood distribution is
\sup_P P(x_1^n) = \sup_{p_1, ..., p_m} p_1^{k_1} ... p_m^{k_m} = (k_1/n)^{k_1} ... (k_m/n)^{k_m}.
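A brute-force cross-check of this multinomial formula for small n and m (an illustration added here, not from the slides):

```python
import itertools
import math
from collections import Counter

def D_bruteforce(n, m):
    """Sum of sup_P P(x) over all m^n sequences, with ML weights p_i = k_i/n."""
    total = 0.0
    for x in itertools.product(range(m), repeat=n):
        prod = 1.0
        for k in Counter(x).values():
            prod *= (k / n) ** k
        total += prod
    return total

def D_multinomial(n, m):
    """Closed form: sum over k_1+...+k_m = n of multinomial * prod (k_i/n)^{k_i}."""
    total = 0.0
    for ks in itertools.product(range(n + 1), repeat=m):
        if sum(ks) != n:
            continue
        coef = math.factorial(n)
        term = 1.0
        for k in ks:
            coef //= math.factorial(k)
            term *= (k / n) ** k  # 0**0 == 1.0
        total += coef * term
    return total

assert abs(D_bruteforce(4, 3) - D_multinomial(4, 3)) < 1e-9
```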
11 Generating Function for D_n(M_0)
We write
D_n(M_0) = \sum_{k_1 + ... + k_m = n} \binom{n}{k_1, ..., k_m} (k_1/n)^{k_1} ... (k_m/n)^{k_m} = \frac{n!}{n^n} \sum_{k_1 + ... + k_m = n} \frac{k_1^{k_1}}{k_1!} ... \frac{k_m^{k_m}}{k_m!}.
Let us introduce a tree-generating function
B(z) = \sum_{k \ge 0} \frac{k^k}{k!} z^k = \frac{1}{1 - T(z)}, T(z) = \sum_{k \ge 1} \frac{k^{k-1}}{k!} z^k,
where T(z) = z e^{T(z)} (= -W(-z), Lambert's W-function) enumerates all rooted labeled trees. Let now
D_m(z) = \sum_{n \ge 0} z^n \frac{n^n}{n!} D_n(M_0).
Then by the convolution formula
D_m(z) = [B(z)]^m.
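The convolution identity D_m(z) = [B(z)]^m can be verified exactly in rational arithmetic for small n; a sketch (the names N, m, poly_mul are mine, not the slides'):

```python
from fractions import Fraction
from math import factorial

N = 6  # truncation order of the power series
m = 3  # alphabet size

# Coefficients of B(z) = sum_k (k^k / k!) z^k  (with 0^0 = 1)
b = [Fraction(k ** k, factorial(k)) for k in range(N + 1)]

def poly_mul(p, q):
    """Product of two series truncated at degree N."""
    r = [Fraction(0)] * (N + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= N:
                r[i + j] += pi * qj
    return r

Bm = [Fraction(1)] + [Fraction(0)] * N  # B(z)^0
for _ in range(m):
    Bm = poly_mul(Bm, b)

# Recover D_n(M_0) from D_m(z) = [B(z)]^m
n = 5
D_n = Fraction(factorial(n), n ** n) * Bm[n]

# Compare with the direct multinomial sum over k_1 + k_2 + k_3 = n
direct = Fraction(0)
for k1 in range(n + 1):
    for k2 in range(n + 1 - k1):
        k3 = n - k1 - k2
        coef = factorial(n) // (factorial(k1) * factorial(k2) * factorial(k3))
        direct += Fraction(coef * k1 ** k1 * k2 ** k2 * k3 ** k3, n ** n)
assert D_n == direct
print(D_n)  # exact rational value of D_5(M_0) for m = 3
```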
12 Asymptotics for FINITE m
The function B(z) has an algebraic singularity at z = e^{-1}, and
\beta(z) := B(z/e) = \frac{1}{\sqrt{2(1-z)}} + \frac{1}{3} + O(\sqrt{1-z}).
By Cauchy's coefficient formula
D_n(M_0) = \frac{n!}{n^n} [z^n] [B(z)]^m = \sqrt{2\pi n} \, (1 + O(1/n)) \frac{1}{2\pi i} \oint \frac{\beta(z)^m}{z^{n+1}} dz.
13 Asymptotics for FINITE m (cont'd)
For finite m, the singularity analysis of Flajolet and Odlyzko,
[z^n] (1-z)^{-\alpha} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)}, \alpha \notin \{0, -1, -2, ...\},
finally yields (cf. Clarke & Barron, 1990, W.S., 1998)
R_n^*(M_0) = \frac{m-1}{2} \log\frac{n}{2} + \log\frac{\sqrt{\pi}}{\Gamma(m/2)} + \frac{\Gamma(m/2)\, m}{3\, \Gamma(m/2 - 1/2)} \cdot \frac{\sqrt{2}}{\sqrt{n}} + \left( \frac{3 + m(m-2)(2m+1)}{36} - \frac{\Gamma^2(m/2)\, m^2}{9\, \Gamma^2(m/2 - 1/2)} \right) \frac{1}{n} + \cdots
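For m = 2 this expansion specializes to D_n = \sqrt{\pi n / 2} + 2/3 + O(1/\sqrt{n}), which a quick numerical check confirms (the stable log-space evaluation below is my implementation choice, not the slides'):

```python
import math

def D_exact(n):
    """Exact D_n for a binary alphabet, summed stably in log space."""
    total = 2.0  # the k = 0 and k = n terms both equal 1
    for k in range(1, n):
        lg = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
              + k * math.log(k / n) + (n - k) * math.log((n - k) / n))
        total += math.exp(lg)
    return total

n = 4000
approx = math.sqrt(math.pi * n / 2) + 2 / 3
assert abs(D_exact(n) - approx) < 0.05
# redundancy in bits: R*_n is about log2 D_n = (1/2) log2(n/2) + log2(sqrt(pi)) + ...
print(math.log2(D_exact(n)))
```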
14 Outline Update
1. Source Coding: The Redundancy Rate Problem
2. Universal Memoryless Sources
(a) Finite Alphabet
(b) Unbounded Alphabet
3. Universal Renewal Sources
15 Redundancy for LARGE m
Now assume that m is unbounded and may vary with n. Then
D_{n,m}(M_0) = \sqrt{2\pi n} \frac{1}{2\pi i} \oint \frac{\beta(z)^m}{z^{n+1}} dz = \sqrt{2\pi n} \frac{1}{2\pi i} \oint e^{g(z)} dz,
where g(z) = m \ln \beta(z) - (n+1) \ln z.
16 Redundancy for LARGE m (cont'd)
The saddle point z_0 is a solution of g'(z_0) = 0, so that
g(z) = g(z_0) + \frac{(z - z_0)^2}{2} g''(z_0) + O(g'''(z_0)(z - z_0)^3).
Under mild conditions satisfied by our g(z) (e.g., z_0 is real and unique), the saddle point method leads to
D_{n,m}(M_0) = \sqrt{2\pi n} \, \frac{e^{g(z_0)}}{\sqrt{2\pi g''(z_0)}} \left( 1 + O\!\left( \frac{g'''(z_0)}{(g''(z_0))^\rho} \right) \right)
for some \rho < 3/2.
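The mechanics of the saddle point method can be illustrated on the textbook example [z^n] e^z = 1/n!, where g(z) = z - (n+1) \ln z and the saddle point is z_0 = n + 1 (a generic illustration; the slides' g(z) involves \beta(z) and has no closed-form saddle point):

```python
import math

def saddle_coeff(n):
    """Saddle-point estimate of [z^n] e^z: e^{g(z0)} / sqrt(2 pi g''(z0))."""
    z0 = n + 1.0                  # root of g'(z) = 1 - (n+1)/z
    g = z0 - (n + 1) * math.log(z0)
    g2 = (n + 1) / z0 ** 2        # g''(z) = (n+1)/z^2
    return math.exp(g) / math.sqrt(2 * math.pi * g2)

n = 50
exact = 1 / math.factorial(n)
approx = saddle_coeff(n)
assert abs(approx / exact - 1) < 0.01  # relative error is O(1/n), as the slides state
```

Unwinding the formula reproduces Stirling's approximation for 1/n!, which is exactly what the method should give here.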
17 Saddle Points
[Figure: plots of the saddle point of g(z) in the three regimes m = o(n), m = n, and n = o(m).]
18 Main Results: Large Alphabet
Theorem 1 (Orlitsky and Santhanam, 2004, and W.S. and Weinberger, 2010). For memoryless sources M_0 over an m-ary alphabet, m -> \infty as n grows, we have:
(i) For m = o(n),
R_{n,m}^*(M_0) = \frac{m-1}{2} \log\frac{n}{m} + \frac{m}{2} \log e + \frac{m \log e}{3} \sqrt{\frac{m}{n}} - \frac{1}{2} + O\!\left(\sqrt{m/n}\right).
(ii) For m = \alpha n + l(n), where \alpha is a positive constant and l(n) = o(n),
R_{n,m}^*(M_0) = n \log B_\alpha + l(n) \log C_\alpha - \frac{1}{2} \log A_\alpha + O(l(n)^2 / n),
where C_\alpha := \frac{1}{2} + \frac{1}{2}\sqrt{1 + 4/\alpha}, A_\alpha := C_\alpha + 2/\alpha, and B_\alpha := \alpha C_\alpha^{\alpha + 2} e^{-1/C_\alpha}.
(iii) For n = o(m),
R_{n,m}^*(M_0) = n \log\frac{m}{n} + \frac{3}{2} \frac{n^2}{m} \log e - \frac{3}{2} \frac{n}{m} \log e + O\!\left( \frac{1}{n} + \frac{n^3}{m^2} \right).
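Part (ii) can be sanity-checked at \alpha = 1 (m = n), where C_1 = (1 + \sqrt{5})/2 is the golden ratio. The sketch below (an illustration; the e^{-k} rescaling and binary exponentiation are my implementation choices) computes the exact D_{n,n} by coefficient extraction from [B(z)]^m and compares \log_2 D_{n,n} with the leading term n \log_2 B_1:

```python
import math

def Dnm(n, m):
    """Exact D_{n,m}(M_0) = (n!/n^n) [z^n] B(z)^m, with the series rescaled
    by e^{-k} (b_k -> (k^k/k!) e^{-k}) so double precision never overflows."""
    b = [1.0] + [math.exp(k * math.log(k) - math.lgamma(k + 1) - k)
                 for k in range(1, n + 1)]
    def mul(p, q):
        r = [0.0] * (n + 1)
        for i, pi in enumerate(p):
            for j in range(n + 1 - i):
                r[i + j] += pi * q[j]
        return r
    res, base, e = [1.0] + [0.0] * n, b, m
    while e:  # binary exponentiation of the truncated power series
        if e & 1:
            res = mul(res, base)
        base, e = mul(base, base), e >> 1
    # n!/n^n times e^{+n} undoes the e^{-k} rescaling at total degree n
    return math.exp(math.lgamma(n + 1) - n * math.log(n) + n) * res[n]

C1 = 0.5 + 0.5 * math.sqrt(1 + 4 / 1.0)      # C_alpha at alpha = 1: the golden ratio
B1 = 1.0 * C1 ** (1 + 2) * math.exp(-1 / C1)  # B_alpha = alpha C_alpha^(alpha+2) e^(-1/C_alpha)
n = 64
assert abs(math.log2(Dnm(n, n)) / (n * math.log2(B1)) - 1) < 0.05
print(math.log2(Dnm(n, n)), n * math.log2(B1))
```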
19 Constrained Memoryless Sources
Consider the following generalized source \tilde{M}_0: the alphabet is A \cup B, where |A| = m and |B| = M; the probabilities p_1, ..., p_m of A are unknown, while the probabilities q_1, ..., q_M of B are fixed; define q = q_1 + ... + q_M and p = 1 - q.
Our goal is to compute the minimax redundancy \tilde{R}_{n,m,M}(\tilde{M}_0). Recall that \tilde{R}_{n,m,M}(\tilde{M}_0) = \log \tilde{D}_{n,m,M} + O(1), i.e., \tilde{D}_{n,m,M} = 2^{\tilde{d}_{n,m,M}}.
Lemma 1.
\tilde{D}_{n,m,M} = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} D_{k,m},
where \log D_{k,m} = R_{k,m}^*(M_0) + O(1) is the minimax quantity over A as presented in Theorem 1.
Proof. It basically follows from \tilde{D}_{n,m,M} = \sum_{x \in (A \cup B)^n} \sup_P P(x), splitting the sum according to the number k of positions that hold symbols of A.
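Lemma 1 can be brute-force checked on a tiny constrained alphabet (the sizes m = 2, M = 1 and the value q = 0.3 are illustrative assumptions, not from the slides):

```python
import itertools
import math
from collections import Counter

# A = {0, 1} with unknown probabilities summing to p; B = {2} with fixed q.
q, p, n = 0.3, 0.7, 5

def sup_prob(x):
    """sup over the constrained class: fixed q on B, ML weights p*k_i/k on A."""
    counts = Counter(x)
    k = counts[0] + counts[1]        # number of A-symbols in x
    val = q ** counts[2] * p ** k
    for s in (0, 1):
        if counts[s]:
            val *= (counts[s] / k) ** counts[s]
    return val

D_brute = sum(sup_prob(x) for x in itertools.product((0, 1, 2), repeat=n))

def D_binary(k):
    """D_{k,2} for the unconstrained binary sub-alphabet (D_0 = 1)."""
    if k == 0:
        return 1.0
    return sum(math.comb(k, j) * (j / k) ** j * ((k - j) / k) ** (k - j)
               for j in range(k + 1))

# Lemma 1: mixture of the unconstrained quantities under a Binomial(n, p) weight
D_mix = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * D_binary(k)
            for k in range(n + 1))
assert abs(D_brute - D_mix) < 1e-9
print(D_brute)
```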
20 Binomial Sums
Consider the following binomial sum:
S_f(n) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(k).
In our case, f(k) = D_{k, m_k}.
Case f(k) = O(n^a \log^b n) = o(e^n): S_f(n) = f(np)(1 + O(1/n)) (cf. Jacquet & W.S. (1999), and Flajolet (1999)).
Case f(k) = \alpha^k: S_f(n) = (p\alpha + 1 - p)^n.
Case f(k) = O(k^{\beta k}): S_f(n) \approx p^n f(n) (the sum is dominated by the k = n term).
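The three growth regimes can be observed numerically (illustrative parameter choices; note that the polynomial and exponential cases even have exact closed forms, E[k^2] = (np)^2 + np(1-p) and the binomial theorem, respectively):

```python
import math

def S(n, p, f):
    """Binomial sum S_f(n) = sum_k C(n,k) p^k (1-p)^(n-k) f(k)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * f(k)
               for k in range(n + 1))

n, p = 1000, 0.3
# Polynomial f: the sum concentrates at k = np.
assert abs(S(n, p, lambda k: float(k) ** 2) / (n * p) ** 2 - 1) < 0.01
# Exponential f(k) = alpha^k: exactly (p*alpha + 1 - p)^n.
assert abs(S(n, p, lambda k: 2.0 ** k) / (p * 2.0 + 1 - p) ** n - 1) < 1e-9
# Super-exponential f(k) = k^k: dominated (up to a constant) by the top term p^n f(n).
n2, p2 = 30, 0.5
ratio = S(n2, p2, lambda k: float(k) ** k) / (p2 ** n2 * float(n2) ** n2)
assert 1.0 < ratio < 2.0
```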
21 Main Results for the Constrained Model \tilde{M}_0
Theorem 2 (W.S. and Weinberger, 2010). Write m_n for m depending on n.
(i) Let m_n = o(n). Assume: (a) m(x) := m_x and its derivatives are continuous functions; (b) \Delta_n := m_{n+1} - m_n = O(m'(n)), m'(n) = O(m/n), m''(n) = O(m/n^2).
If m_n = o(\sqrt{n/\log n}), then
\tilde{R}_{n,m,M} = \frac{m_{np} - 1}{2} \log\frac{np}{m_{np}} + \frac{m_{np}}{2} \log e + O\!\left( \frac{1}{m_n} + \frac{m_n^2}{n} \log^2 n \right).
Otherwise,
\tilde{R}_{n,m,M} = \frac{m_{np}}{2} \log\frac{np}{m_{np}} + \frac{m_{np}}{2} \log e + O\!\left( \log n + \frac{m_n^2}{n} \log^2 n \right).
(ii) Let m_n = \alpha n + l(n), where \alpha is a positive constant and l(n) = o(n). Then
\tilde{R}_{n,m,M} = n \log(B_\alpha p + 1 - p) - \frac{1}{2} \log A_\alpha + O\!\left( \frac{l(n) + 1}{n} \right),
where A_\alpha and B_\alpha are defined in Theorem 1.
(iii) Let n = o(m_n) and assume m_k / k is a nondecreasing sequence. Then
\tilde{R}_{n,m,M} = n \log\frac{p\, m_n}{n} + O\!\left( \frac{n^2}{m_n} + \frac{1}{n} \right).
22 Outline Update
1. Source Coding: The Redundancy Rate Problem
2. Universal Memoryless Sources
3. Universal Renewal Sources
23 Renewal Sources (Virtual Large Alphabet)
The renewal process R_0 (introduced in 1996 by Csiszár and Shields) is defined as follows: Let T_1, T_2, ... be a sequence of i.i.d. positive-valued random variables with distribution Q(j) = Pr{T_i = j}. In a binary renewal sequence the positions of the 1s are at the renewal epochs T_0, T_0 + T_1, ..., with runs of zeros of lengths T_1 - 1, T_2 - 1, ....
24 Renewal Sources (cont'd)
For a sequence
x_0^n = 1 0^{\alpha_1} 1 0^{\alpha_2} \cdots 1 0^{\alpha_n} \underbrace{0 \cdots 0}_{k^*}
define k_m as the number of i such that \alpha_i = m. Then
P(x_1^n) = [Q(0)]^{k_0} [Q(1)]^{k_1} \cdots [Q(n-1)]^{k_{n-1}} \Pr\{T_1 > k^*\}.
25 Renewal Sources (cont'd)
Theorem 3 (Flajolet and W.S., 1998). Consider the class of renewal processes. Then
R_n^*(R_0) = \frac{2}{\log 2} \sqrt{cn} + O(\log n), where c = \frac{\pi^2}{6} - 1.
26 Maximal Minimax Redundancy
It can be proved that
r_{n+1} - 1 \le D_n(R_0) \le \sum_{m=0}^{n} r_m,
r_n = \sum_{k=0}^{n} r_{n,k}, r_{n,k} = \sum_{I(n,k)} \binom{k}{k_0, ..., k_{n-1}} \left(\frac{k_0}{k}\right)^{k_0} \left(\frac{k_1}{k}\right)^{k_1} \cdots \left(\frac{k_{n-1}}{k}\right)^{k_{n-1}},
where I(n,k) is the set of integer partitions of n into k terms, i.e., n = k_0 + 2k_1 + ... + n k_{n-1}, k = k_0 + ... + k_{n-1}.
But we shall study
s_n = \sum_{k=0}^{n} s_{n,k}, s_{n,k} = e^{-k} \sum_{I(n,k)} \frac{k_0^{k_0}}{k_0!} \cdots \frac{k_{n-1}^{k_{n-1}}}{k_{n-1}!},
since
S(z,u) = \sum_{k,n} s_{n,k} u^k z^n = \prod_{i=1}^{\infty} \beta(z^i u).
27 Maximal Minimax Redundancy (cont'd)
Near z = 1 we have S(z,1) \approx \exp\!\left( \frac{c}{1-z} + a \log\frac{1}{1-z} \right), so that
s_n = [z^n] S(z,1).
Theorem 4 (Flajolet and W.S., 1998). We have the following asymptotics:
s_n = \exp\!\left( 2\sqrt{cn} - \frac{7}{8} \log n + O(1) \right),
\log r_n = \frac{2}{\log 2} \sqrt{cn} - \frac{5}{8} \log n + \frac{1}{2} \log\log n + o(1).
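The product form S(z,u) = \prod_i \beta(z^i u) can be verified at u = 1 for small n by enumerating partitions directly (the enumeration code below is mine, written as a minimal sketch):

```python
import math

N = 8
# beta(w) = sum_k (k^k/k!) e^{-k} w^k, truncated at degree N
beta = [1.0] + [math.exp(k * math.log(k) - math.lgamma(k + 1) - k)
                for k in range(1, N + 1)]

# Truncated product over part sizes i: S(z, 1) = prod_i beta(z^i)
S = [1.0] + [0.0] * N
for i in range(1, N + 1):
    new = [0.0] * (N + 1)
    for d in range(N + 1):               # degree accumulated so far
        for j in range(N // i + 1):      # j parts of size i contribute z^{i*j}
            if d + i * j <= N:
                new[d + i * j] += S[d] * beta[j]
    S = new

def s_direct(n):
    """s_n by direct enumeration: sum over partitions with k_j parts of size j."""
    total = [0.0]
    def rec(size, remaining, weight):
        if remaining == 0:
            total[0] += weight
            return
        if size > remaining:
            return
        j = 0
        while j * size <= remaining:
            w = weight * (j ** j if j else 1) / math.factorial(j) * math.exp(-j)
            rec(size + 1, remaining - j * size, w)
            j += 1
    rec(1, n, 1.0)
    return total[0]

for n in range(1, N + 1):
    assert abs(S[n] - s_direct(n)) < 1e-9
```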
28 That's IT
MATH 722, COMPLEX ANALYSIS, SPRING 2009 PART 5.. The Arzela-Ascoli Theorem.. The Riemann mapping theorem Let X be a metric space, and let F be a family of continuous complex-valued functions on X. We have
More information1 Assignment 1: Nonlinear dynamics (due September
Assignment : Nonlinear dynamics (due September 4, 28). Consider the ordinary differential equation du/dt = cos(u). Sketch the equilibria and indicate by arrows the increase or decrease of the solutions.
More informationRecall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n
Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible
More informationLARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011
LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic
More informationThe number of inversions in permutations: A saddle point approach
The number of inversions in permutations: A saddle point approach Guy Louchard and Helmut Prodinger November 4, 22 Abstract Using the saddle point method, we obtain from the generating function of the
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationMinimax lower bounds I
Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationMathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector
On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean
More informationEntropy power inequality for a family of discrete random variables
20 IEEE International Symposium on Information Theory Proceedings Entropy power inequality for a family of discrete random variables Naresh Sharma, Smarajit Das and Siddharth Muthurishnan School of Technology
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationMethods of Estimation
Methods of Estimation MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Methods of Estimation I 1 Methods of Estimation I 2 X X, X P P = {P θ, θ Θ}. Problem: Finding a function θˆ(x ) which is close to θ.
More information16.1 Bounding Capacity with Covering Number
ECE598: Information-theoretic methods in high-dimensional statistics Spring 206 Lecture 6: Upper Bounds for Density Estimation Lecturer: Yihong Wu Scribe: Yang Zhang, Apr, 206 So far we have been mostly
More informationA Representation of Approximate Self- Overlapping Word and Its Application
Purdue University Purdue e-pubs Computer Science Technical Reports Department of Computer Science 1994 A Representation of Approximate Self- Overlapping Word and Its Application Wojciech Szpankowski Purdue
More informationReal Analysis Problems
Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.
More information21.1 Lower bounds on minimax risk for functional estimation
ECE598: Information-theoretic methods in high-dimensional statistics Spring 016 Lecture 1: Functional estimation & testing Lecturer: Yihong Wu Scribe: Ashok Vardhan, Apr 14, 016 In this chapter, we will
More informationA new converse in rate-distortion theory
A new converse in rate-distortion theory Victoria Kostina, Sergio Verdú Dept. of Electrical Engineering, Princeton University, NJ, 08544, USA Abstract This paper shows new finite-blocklength converse bounds
More informationPaul-Eugène Parent. March 12th, Department of Mathematics and Statistics University of Ottawa. MAT 3121: Complex Analysis I
Paul-Eugène Parent Department of Mathematics and Statistics University of Ottawa March 12th, 2014 Outline 1 Holomorphic power Series Proposition Let f (z) = a n (z z o ) n be the holomorphic function defined
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More information