Necessary and Sufficient Conditions for High-Dimensional Salient Feature Subset Recovery
Necessary and Sufficient Conditions for High-Dimensional Salient Feature Subset Recovery
Vincent Tan, Matt Johnson, Alan S. Willsky
Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
ISIT (Jun 14, 2010)
Vincent Tan (SSG, LIDS, MIT), Salient Feature Subset Recovery, ISIT 2010
Motivation: A Real-Life Example
The Manchester Asthma and Allergy Study (MAAS) is a birth-cohort study of more than n ≈ 1000 children.
The number of variables d is large, but only k ≈ 30 are salient for assessing susceptibility to asthma.
Identification of these salient features is important.
Some Natural Questions
What is the meaning of saliency? How can it be defined information-theoretically?
How many samples n are required to identify the k salient features out of the d features? What are the fundamental limits for recovery of the salient set?
Are there efficient algorithms for special classes of features?
Main Contributions
We define the salient set by appealing to error exponents in hypothesis testing.
Sufficiency: We show that if, for all sufficiently large n,
    n > max{ C_1 k log((d − k)/k), exp(c_2 k) },
then the error probability is arbitrarily small.
Necessity: Under certain conditions, for λ ∈ (0, 1), if
    n < λ C_3 log(d/k),
then the error probability is at least 1 − λ.
System Model
Let the alphabet for each variable be X, a finite set.
High-dimensional setting: d and k grow with n.
Two sequences of unknown d-dimensional distributions P^(d), Q^(d) ∈ P(X^d), d ∈ N.
IID samples (x^n, y^n) := ({x^(l)}_{l=1}^n, {y^(l)}_{l=1}^n) drawn from P^(d) and Q^(d) respectively. Each pair of samples x^(l), y^(l) ∈ X^d.
Definition of Saliency
Motivated by the asymptotics of binary hypothesis testing:
    H_0 : z^n ~ P^(d)   vs.   H_1 : z^n ~ Q^(d).
Chernoff-Stein Lemma: if Pr(Ĥ_1 | H_0) < α, then
    Pr(Ĥ_0 | H_1) ≐ exp(−n D(P^(d) || Q^(d))).
Chernoff Information: the Bayesian error probability satisfies
    Pr(err) = π_0 Pr(Ĥ_1 | H_0) + π_1 Pr(Ĥ_0 | H_1) ≐ exp(−n C(P^(d), Q^(d))),
where C(P, Q) := −min_{t ∈ [0,1]} log Σ_z P(z)^t Q(z)^(1−t).
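Both quantities are straightforward to evaluate numerically for small alphabets. A minimal sketch (not from the paper; the grid search over t is an assumption, any one-dimensional minimizer would do):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in nats for pmfs on a common finite alphabet."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # 0 * log 0 = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chernoff_information(p, q, grid=1001):
    """C(p, q) = -min_{t in [0,1]} log sum_z p(z)^t q(z)^(1-t),
    evaluated here by a simple grid search over t."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    ts = np.linspace(0.0, 1.0, grid)
    vals = [np.log(np.sum(p ** t * q ** (1.0 - t))) for t in ts]
    return float(-min(vals))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.2, 0.3, 0.5])
print(kl_divergence(p, q), chernoff_information(p, q))
```

As a sanity check, C(P, Q) is symmetric in its arguments and never exceeds either of the two KL divergences.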
Definition of Saliency
P^(d)_A is the marginal of P^(d) on the subset A ⊆ {1, ..., d}.
Definition: A set S_d ⊆ {1, ..., d} is KL-divergence-salient if
    D(P^(d) || Q^(d)) = D(P^(d)_{S_d} || Q^(d)_{S_d}).
Definition: A set S_d ⊆ {1, ..., d} is Chernoff-information-salient if
    C(P^(d), Q^(d)) = C(P^(d)_{S_d}, Q^(d)_{S_d}).
Characterization of Saliency
Lemma: The following are equivalent:
(i) S_d is KL-divergence-salient.
(ii) S_d is Chernoff-information-salient.
(iii) The conditionals are identical:
    P^(d) = P^(d)_{S_d} W_{S_d^c | S_d}   and   Q^(d) = Q^(d)_{S_d} W_{S_d^c | S_d}.
What are the scaling laws on (n, d, k) so that the error probability can be made arbitrarily small?
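The product factorization in (iii) makes the lemma easy to check numerically: the shared conditional W contributes nothing to the divergence. A toy sketch, assuming binary variables, d = 2, and S_d = {1} (all numerical values below are made up):

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats, flattening the joint pmfs first."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Salient coordinate S_d = {1}; shared conditional W(x2 | x1).
P_S = np.array([0.8, 0.2])
Q_S = np.array([0.3, 0.7])
W = np.array([[0.6, 0.4],   # W(x2 | x1 = 0)
              [0.1, 0.9]])  # W(x2 | x1 = 1)

P = P_S[:, None] * W        # P(x1, x2) = P_S(x1) W(x2 | x1)
Q = Q_S[:, None] * W        # Q(x1, x2) = Q_S(x1) W(x2 | x1)

# Full divergence equals the divergence of the salient marginals:
print(kl(P, Q), kl(P_S, Q_S))
```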
Definition of Achievability
A decoder is a set-valued function that maps samples to subsets of size k, i.e.,
    ψ_n : (X^d)^n × (X^d)^n → ({1, ..., d} choose k).
The decoder is given the true value of k, the number of salient features. We are working on relaxing this assumption.
Definition: The tuple of model parameters (n, d, k) is achievable for {P^(d), Q^(d)}_{d ∈ N} if there exists a sequence of decoders {ψ_n} such that
    q_n(ψ_n) := Pr(ψ_n(x^n, y^n) ≠ S_d) < ε   for all n > N_ε.
Three Assumptions on the Distributions P^(d), Q^(d)
Saliency: For every P^(d), Q^(d) there exists a salient set S_d of known size k, i.e.,
    D(P^(d) || Q^(d)) = D(P^(d)_{S_d} || Q^(d)_{S_d}).
η-Distinguishability: There exists a constant η > 0 such that
    D(P^(d)_{S_d} || Q^(d)_{S_d}) − D(P^(d)_{T_d} || Q^(d)_{T_d}) ≥ η
for all T_d ≠ S_d such that |T_d| = k.
L-Boundedness: There exists a constant L ∈ (0, ∞) such that
    log[ P^(d)_{S_d}(z_{S_d}) / Q^(d)_{S_d}(z_{S_d}) ] ∈ [−L, L]
for all states z_{S_d} ∈ X^k.
Achievability Result
Theorem: If there exists a δ > 0 such that for some B > 0
    n > max{ B k log((d − k)/k), exp( 2k log|X| / (1 − δ) ) },
then there exists a sequence of decoders ψ_n* satisfying
    q_n(ψ_n*) = O(exp(−nE))
for some exponent E > 0.
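To see what the bound implies, note that the first term grows only logarithmically in d while the second does not depend on d at all. A numeric sketch with placeholder constants (B, δ, and the alphabet size are not specified numerically in the theorem, so the values below are assumptions for illustration):

```python
import math

def achievability_terms(d, k, B=1.0, delta=0.5, alphabet_size=2):
    """The two terms of the sufficiency bound; n must exceed their maximum.
    B and delta are placeholder values for the theorem's constants."""
    log_term = B * k * math.log((d - k) / k)
    exp_term = math.exp(2 * k * math.log(alphabet_size) / (1 - delta))
    return log_term, exp_term

# For fixed k, the d-dependent term grows only logarithmically in d:
for d in (10**3, 10**4, 10**5, 10**6):
    log_term, _ = achievability_terms(d, k=30)
    print(d, round(log_term, 1))
```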
Proof Idea and a Corollary
Use the exhaustive-search decoder: search for the size-k set with the largest empirical KL divergence.
Large-deviation bounds, e.g., Sanov's theorem; positivity of the error exponent under the specified conditions.
Similar to the main result in [Ng, UAI 1998], but we focus on exact subset recovery, not generalization error.
Corollary: Let k = k_0 be a constant and R ∈ (0, B/k_0). Then if
    n > (log d)/R,
we have q_n(ψ_n*) = O(exp(−nE)).
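The exhaustive-search decoder can be sketched directly. Below is a toy Python version (not the paper's code): binary features, plug-in empirical marginals, and a brute-force scan over all size-k subsets; the smoothing constant eps and the toy parameters are assumptions.

```python
import itertools
import numpy as np

def empirical_dist(samples):
    """Plug-in pmf over joint states of the selected columns."""
    values, counts = np.unique(samples, axis=0, return_counts=True)
    n = len(samples)
    return {tuple(v): c / n for v, c in zip(values, counts)}

def empirical_kl(p_hat, q_hat, eps=1e-12):
    """KL between plug-in pmfs; states unseen under q are floored at eps."""
    return sum(p * np.log(p / q_hat.get(z, eps)) for z, p in p_hat.items())

def exhaustive_decoder(x, y, k):
    """Return the size-k feature subset with the largest empirical KL."""
    d = x.shape[1]
    score = lambda S: empirical_kl(empirical_dist(x[:, list(S)]),
                                   empirical_dist(y[:, list(S)]))
    return set(max(itertools.combinations(range(d), k), key=score))

# Toy data: features {0, 1} are salient (Bernoulli(0.9) vs Bernoulli(0.1));
# the remaining features are Bernoulli(0.5) under both hypotheses.
rng = np.random.default_rng(0)
bias_p = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.5])
bias_q = np.array([0.1, 0.1, 0.5, 0.5, 0.5, 0.5])
n = 2000
x = (rng.random((n, 6)) < bias_p).astype(int)
y = (rng.random((n, 6)) < bias_q).astype(int)
print(exhaustive_decoder(x, y, k=2))  # recovers the salient set
```

The scan over all (d choose k) subsets is what makes this decoder purely information-theoretic rather than practical; the tree-structured algorithm mentioned in the conclusions avoids it.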
Converse Result
We assume that the salient set S_d is chosen uniformly at random over all subsets of size k.
Theorem: If, for some λ ∈ (0, 1),
    n < λ k log(d/k) / ( H(P^(d)) + H(Q^(d)) ),
then q_n(ψ_n) ≥ 1 − λ for all decoders ψ_n.
Remarks
The proof is a consequence of Fano's inequality.
The bound is never satisfied if the variables in S_d^c are uniform and independent of those in S_d:
    n < [ λ / ( H(P^(d)) + H(Q^(d)) ) ] k log(d/k),   with H(P^(d)) + H(Q^(d)) = O(d) in that case.
However, the converse is interesting if the distributions have additional structure on their entropies.
Assume most of the features are processed or redundant. Example: there could be two features, BMI and isObese, where one is a processed version of the other.
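The Fano step behind the theorem can be sketched as follows (a reconstruction from the stated theorem, with constants and minor conditioning details elided):

```latex
% S_d is uniform over the \binom{d}{k} subsets of size k, so
H(S_d) = \log \binom{d}{k} \ge k \log \frac{d}{k}.
% Fano's inequality, for any decoder \psi_n with error probability q_n:
H(S_d \mid \mathbf{x}^n, \mathbf{y}^n) \le 1 + q_n \log \binom{d}{k}.
% The samples can reveal at most their own entropy:
I(S_d ; \mathbf{x}^n, \mathbf{y}^n)
  \le H(\mathbf{x}^n, \mathbf{y}^n)
  \le n \bigl( H(P^{(d)}) + H(Q^{(d)}) \bigr).
% Combining the three displays,
q_n \ge 1 - \frac{n \bigl( H(P^{(d)}) + H(Q^{(d)}) \bigr) + 1}{k \log(d/k)},
% which exceeds 1 - \lambda (up to the additive 1) whenever
n < \frac{\lambda \, k \log(d/k)}{H(P^{(d)}) + H(Q^{(d)})}.
```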
Converse Result: A Corollary
Corollary: Assume that there exists an M < ∞ such that the conditional entropies satisfy
    max{ H(P^(d)_{S_d^c | S_d}), H(Q^(d)_{S_d^c | S_d}) } ≤ M k.
If the number of samples satisfies
    n < [ λ / ( 2(M + log|X|) ) ] log(d/k),
then q_n(ψ_n) ≥ 1 − λ.
Comparison to Achievability Result
Assume d = exp(nR) and k is constant.
There is a rate R_ach such that if R < R_ach, then (n, d, k) is achievable. Conversely, there is another rate R_conv such that if R > R_conv, then recovery is not possible.
(Figure: rate axis from 0, marking R_ach and then R_conv.)
Conclusions
Provided an information-theoretic definition of saliency motivated by error exponents in hypothesis testing.
Provided necessary and sufficient conditions for salient-set recovery. The number of samples n can be much smaller than d, the total number of variables.
In the paper, we provide a computationally efficient and consistent algorithm to search for S_d when P^(d) and Q^(d) are Markov on trees.
More informationOn the Necessity of Binning for the Distributed Hypothesis Testing Problem
On the Necessity of Binning for the Distributed Hypothesis Testing Problem Gil Katz, Pablo Piantanida, Romain Couillet, Merouane Debbah To cite this version: Gil Katz, Pablo Piantanida, Romain Couillet,
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationIntroduction to Machine Learning Lecture 14. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 14 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Density Estimation Maxent Models 2 Entropy Definition: the entropy of a random variable
More information. Then V l on K l, and so. e e 1.
Sanov s Theorem Let E be a Polish space, and define L n : E n M E to be the empirical measure given by L n x = n n m= δ x m for x = x,..., x n E n. Given a µ M E, denote by µ n the distribution of L n
More informationInformation Recovery from Pairwise Measurements: A Shannon-Theoretic Approach
Information Recovery from Pairwise Measurements: A Shannon-Theoretic Approach Yuxin Chen Statistics Stanford University Email: yxchen@stanfordedu Changho Suh EE KAIST Email: chsuh@kaistackr Andrea J Goldsmith
More informationRemote Source Coding with Two-Sided Information
Remote Source Coding with Two-Sided Information Basak Guler Ebrahim MolavianJazi Aylin Yener Wireless Communications and Networking Laboratory Department of Electrical Engineering The Pennsylvania State
More informationChapter 11. Information Theory and Statistics
Chapter 11 Information Theory and Statistics Peng-Hua Wang Graduate Inst. of Comm. Engineering National Taipei University Chapter Outline Chap. 11 Information Theory and Statistics 11.1 Method of Types
More informationThe Method of Types and Its Application to Information Hiding
The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,
More informationInformation Theory and Feature Selection (Joint Informativeness and Tractability)
Information Theory and Feature Selection (Joint Informativeness and Tractability) Leonidas Lefakis Zalando Research Labs 1 / 66 Dimensionality Reduction Feature Construction Construction X 1,..., X D f
More informationInformation Theory with Applications, Math6397 Lecture Notes from September 30, 2014 taken by Ilknur Telkes
Information Theory with Applications, Math6397 Lecture Notes from September 3, 24 taken by Ilknur Telkes Last Time Kraft inequality (sep.or) prefix code Shannon Fano code Bound for average code-word length
More informationSHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe
SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/40 Acknowledgement Praneeth Boda Himanshu Tyagi Shun Watanabe 3/40 Outline Two-terminal model: Mutual
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationLecture 10: Broadcast Channel and Superposition Coding
Lecture 10: Broadcast Channel and Superposition Coding Scribed by: Zhe Yao 1 Broadcast channel M 0M 1M P{y 1 y x} M M 01 1 M M 0 The capacity of the broadcast channel depends only on the marginal conditional
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationStrong Converse and Stein s Lemma in the Quantum Hypothesis Testing
Strong Converse and Stein s Lemma in the Quantum Hypothesis Testing arxiv:uant-ph/9906090 v 24 Jun 999 Tomohiro Ogawa and Hiroshi Nagaoka Abstract The hypothesis testing problem of two uantum states is
More informationLecture 21: Minimax Theory
Lecture : Minimax Theory Akshay Krishnamurthy akshay@cs.umass.edu November 8, 07 Recap In the first part of the course, we spent the majority of our time studying risk minimization. We found many ways
More informationIntermittent Communication
Intermittent Communication Mostafa Khoshnevisan, Student Member, IEEE, and J. Nicholas Laneman, Senior Member, IEEE arxiv:32.42v2 [cs.it] 7 Mar 207 Abstract We formulate a model for intermittent communication
More informationA Formula for the Capacity of the General Gel fand-pinsker Channel
A Formula for the Capacity of the General Gel fand-pinsker Channel Vincent Y. F. Tan Institute for Infocomm Research (I 2 R, A*STAR, Email: tanyfv@i2r.a-star.edu.sg ECE Dept., National University of Singapore
More information10-704: Information Processing and Learning Fall Lecture 21: Nov 14. sup
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: Nov 4 Note: hese notes are based on scribed notes from Spring5 offering of this course LaeX template courtesy of UC
More informationQuiz 2 Date: Monday, November 21, 2016
10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,
More informationUniversal Estimation of Divergence for Continuous Distributions via Data-Dependent Partitions
Universal Estimation of for Continuous Distributions via Data-Dependent Partitions Qing Wang, Sanjeev R. Kulkarni, Sergio Verdú Department of Electrical Engineering Princeton University Princeton, NJ 8544
More informationAnonymous Heterogeneous Distributed Detection: Optimal Decision Rules, Error Exponents, and the Price of Anonymity
Anonymous Heterogeneous Distributed Detection: Optimal Decision Rules, Error Exponents, and the Price of Anonymity Wei-Ning Chen and I-Hsiang Wang arxiv:805.03554v2 [cs.it] 29 Jul 208 Abstract We explore
More informationCapacity of the Discrete Memoryless Energy Harvesting Channel with Side Information
204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener
More informationGraph Coloring and Conditional Graph Entropy
Graph Coloring and Conditional Graph Entropy Vishal Doshi, Devavrat Shah, Muriel Médard, Sidharth Jaggi Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge,
More informationONE of Shannon s key discoveries was that for quite
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998 2505 The Method of Types Imre Csiszár, Fellow, IEEE (Invited Paper) Abstract The method of types is one of the key technical tools
More informationAn Achievable Error Exponent for the Mismatched Multiple-Access Channel
An Achievable Error Exponent for the Mismatched Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@camacuk Albert Guillén i Fàbregas ICREA & Universitat Pompeu Fabra University of
More information(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute
ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 6 (draft; 9/6/03. Error exponents for Discrete Memoryless Channels http://www.enee.umd.edu/ abarg/enee739c/course.html
More informationPassword Cracking: The Effect of Bias on the Average Guesswork of Hash Functions
Password Cracking: The Effect of Bias on the Average Guesswork of Hash Functions Yair Yona, and Suhas Diggavi, Fellow, IEEE Abstract arxiv:608.0232v4 [cs.cr] Jan 207 In this work we analyze the average
More informationFinding the best mismatched detector for channel coding and hypothesis testing
Finding the best mismatched detector for channel coding and hypothesis testing Sean Meyn Department of Electrical and Computer Engineering University of Illinois and the Coordinated Science Laboratory
More informationEstimation of signal information content for classification
Estimation of signal information content for classification The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationLecture 3 : Algorithms for source coding. September 30, 2016
Lecture 3 : Algorithms for source coding September 30, 2016 Outline 1. Huffman code ; proof of optimality ; 2. Coding with intervals : Shannon-Fano-Elias code and Shannon code ; 3. Arithmetic coding. 1/39
More informationLecture 22: Final Review
Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information
More informationInformation-theoretic Secrecy A Cryptographic Perspective
Information-theoretic Secrecy A Cryptographic Perspective Stefano Tessaro UC Santa Barbara WCS 2017 April 30, 2017 based on joint works with M. Bellare and A. Vardy Cryptography Computational assumptions
More information