Information Theory and Hypothesis Testing

1 Summer School on Game Theory and Telecommunications, Campione, 7-12 September 2014. Information Theory and Hypothesis Testing. Mauro Barni, University of Siena.

2 September 8: Review of some basic results linking Information Theory and Statistics (Hypothesis Testing). September 9: Application of game theory and IT to adversarial signal processing: the Hypothesis Testing game.

3 Outline: Review of basic concepts (hypothesis testing, Information Theory); The method of types in Information Theory; Some classical applications: law of large numbers, Sanov's theorem and large deviation theory, hypothesis testing.

4 Hypothesis testing

5 Hypothesis testing: problem definition. Under $H_0$ the observed sequence $x^n$ is drawn from $P_0$, under $H_1$ from $P_1$, and the detector must output 0 or 1. Goal: decide in favor of $H_0$ or $H_1$ according to a predefined optimality criterion. Decision rule: $\phi(x^n) \in \{0,1\}$. Acceptance region: $\Lambda_0 = \{x^n : \phi(x^n) = 0\}$.

6 Hypothesis testing: problem definition. Type I error probability (false alarm, false positive): $P_{1/0} = P_f = \Pr\{\phi(X^n) = 1 \mid H_0\} = P_0(X^n \in \Lambda_1)$. Type II error probability (missed detection, false negative): $P_{0/1} = P_m = \Pr\{\phi(X^n) = 0 \mid H_1\} = P_1(X^n \in \Lambda_0)$. A priori probabilities: $P_{H_0}$, $P_{H_1}$. Costs (losses, risk): $L_{1/0}$, $L_{0/1}$.

7 Bayes criterion. Choose $\phi(x^n)$ so that the Bayes risk $L = P_{H_0} P_{1/0} L_{1/0} + P_{H_1} P_{0/1} L_{0/1}$ is minimum. Optimum acceptance region: $\Lambda_0 = \left\{ x^n : \frac{P_0(x^n)}{P_1(x^n)} \ge \frac{P_{H_1} L_{0/1}}{P_{H_0} L_{1/0}} \right\}$, i.e., a test on the likelihood ratio.

8 Neyman-Pearson criterion. Problems with the Bayes approach: the a priori probabilities are needed, and the costs are difficult to define, especially for rare events. Neyman-Pearson approach: minimize $P_m$ subject to a constraint on $P_f$. Acceptance region: $\Lambda_0 = \left\{ x^n : \frac{P_0(x^n)}{P_1(x^n)} \ge T \right\}$, again a likelihood ratio test, with the threshold $T$ determined by letting $P_f = \lambda$.

9 An example: Gaussian observables (1/2). Let $P_0 = N(\mu_0,\sigma^2)$ and $P_1 = N(\mu_1,\sigma^2)$ with $\mu_1 > \mu_0$. The likelihood ratio is $L(x^n) = \exp\!\left(-\sum_i \frac{(x_i-\mu_0)^2}{2\sigma^2}\right) \Big/ \exp\!\left(-\sum_i \frac{(x_i-\mu_1)^2}{2\sigma^2}\right)$. Taking the log and after some algebra, the test reduces to comparing the sample mean $\bar{X} = \frac{1}{n}\sum_i x_i$ against a threshold $\tau$. Since $P(\bar{X}\mid H_0) = N(\mu_0, \sigma^2/n)$ and $P(\bar{X}\mid H_1) = N(\mu_1, \sigma^2/n)$, $\tau$ is determined so that the tail of $N(\mu_0,\sigma^2/n)$ above $\tau$ equals $\lambda$.
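
A minimal numerical sketch of this construction, assuming illustrative values for $\mu_0$, $\mu_1$, $\sigma$, $n$ and the false-alarm constraint (none of these numbers come from the slides):

```python
# Neyman-Pearson test for Gaussian observables: pick the threshold on the
# sample mean so that the false-alarm probability equals the target lambda.
from scipy.stats import norm

# Illustrative parameters (not from the slides).
mu0, mu1, sigma = 0.0, 1.0, 2.0      # P0 = N(mu0, sigma^2), P1 = N(mu1, sigma^2)
n = 50                               # number of observations
lam = 1e-3                           # constraint on P_f

scale = sigma / n ** 0.5             # std of the sample mean under both hypotheses

tau = norm.isf(lam, loc=mu0, scale=scale)   # Pr{mean > tau | H0} = lam
pm = norm.cdf(tau, loc=mu1, scale=scale)    # resulting Pr{mean <= tau | H1}

print(f"tau = {tau:.4f}, P_f = {lam:.1e}, P_m = {pm:.3e}")
```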

10 An example: Gaussian observables (2/2). [Figure: the pdfs $N(\mu_0,\sigma^2/n)$ and $N(\mu_1,\sigma^2/n)$; $P_f$ is the area of the right tail of the first pdf above $\tau$, $P_m$ the area of the left tail of the second pdf below $\tau$.]

11 ROC curve. [Figure: $P_m$ versus $P_f$ on log-log axes, both ranging from $10^{-13}$ to $10^0$.]
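
A ROC curve of this kind can be traced by sweeping the false-alarm constraint; the sketch below reuses the illustrative Gaussian parameters introduced above (not taken from the slides):

```python
# Trace P_m as a function of P_f for the Gaussian example.
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n = 0.0, 1.0, 2.0, 50    # illustrative parameters, as above
scale = sigma / np.sqrt(n)

pf = np.logspace(-13, 0, 14)                  # sweep of false-alarm probabilities
tau = norm.isf(pf, loc=mu0, scale=scale)      # threshold achieving each P_f
pm = norm.cdf(tau, loc=mu1, scale=scale)      # corresponding missed-detection prob.

for f, m in zip(pf, pm):
    print(f"P_f = {f:.0e}  ->  P_m = {m:.3e}")
```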

12 Information Theory

13 Measuring information: Shannon's approach. Model a source of information as a random variable. Information is related to ignorance and unpredictability: it depends only on the probability of the events, not on their values. No attempt is made to model the importance of information in a given context, and there is no link with the perceived level of information, but the other important aspects are retained.

14 Axiomatic definition of Entropy. Given a discrete source (random variable) $X$ with alphabet $\mathcal{X}$, look for a measure of information with the following properties: $H_2(p, 1-p)$ is continuous in $p$; $H_2\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1$ (normalization, bit); $H_m(p_1, p_2, \dots, p_m) = H_{m-1}(p_1+p_2, p_3, \dots, p_m) + (p_1+p_2)\, H_2\!\left(\tfrac{p_1}{p_1+p_2}, \tfrac{p_2}{p_1+p_2}\right)$ (grouping property); $H_m(p_1, p_2, \dots, p_m) = H_m(\sigma(p_1, p_2, \dots, p_m))$ for any permutation $\sigma$ (independence from permutation). With these axioms we necessarily have: Source Entropy $= H(X) = -\sum_{i=1}^{|\mathcal{X}|} p_i \log_2 p_i = -\sum_{x\in\mathcal{X}} p(x)\log_2 p(x)$.
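
A minimal sketch of the resulting entropy formula (base-2 logs, so the result is in bits); the example pmf's are arbitrary:

```python
import numpy as np

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), with the convention 0 log 0 = 0."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary example pmf's.
print(entropy([0.5, 0.5]))   # 1 bit (the normalization axiom)
print(entropy([0.8, 0.2]))   # ~0.722 bits, the h(0.8) reused in the Sanov example below
```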

15 Source coding. The source coding theorem clarifies the meaning of the entropy. Source Coding Theorem (Shannon 1948): given a discrete memoryless source $X$, lossless coding is possible at any rate $R$ larger than the entropy ($R > H$) and impossible at any rate lower than it. The entropy thus captures the essence of information, in the sense that it gives the minimum number of bits necessary to describe the output of a source.

16 Other quantities. We can define several other quantities capturing different aspects of the information measure when two or more sources are involved: joint entropy $H(X,Y) = -\sum_x\sum_y p(x,y)\log p(x,y)$; conditional entropy $H(X\mid Y) = -\sum_x\sum_y p(x,y)\log p(x\mid y)$; mutual information $I(X;Y) = \sum_x\sum_y p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$.
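
A short sketch of these quantities for a small, purely illustrative joint pmf; for brevity it uses the chain rule $H(X\mid Y) = H(X,Y) - H(Y)$ and the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pxy = np.array([[0.30, 0.10],       # illustrative joint pmf p(x, y), rows indexed by x
                [0.05, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_xy = H(pxy)                       # H(X, Y)
H_x_given_y = H_xy - H(py)          # H(X|Y) via the chain rule
I_xy = H(px) + H(py) - H_xy         # I(X;Y)

print(H_xy, H_x_given_y, I_xy)
```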

17 Relative entropy. In the rest of the talk we will make extensive use of the following quantity (also called divergence, or Kullback-Leibler distance): $D(P\|Q) = \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)}$. The divergence can be interpreted as a kind of distance between pmf's: $D(P\|Q) \ge 0$, with $D(P\|Q) = 0$ if and only if $P = Q$.
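
A minimal sketch of the divergence, with arbitrary example pmf's:

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) = sum_x P(x) log2 P(x)/Q(x); assumes Q(x) > 0 wherever P(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Arbitrary example pmf's.
print(kl_divergence([0.8, 0.2], [0.5, 0.5]))   # ~0.278 bits, reused in the Sanov example
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0, since P = Q
```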

18 The method of types

19 Type or empirical probability. The type, or empirical probability, of a sequence $x^n$ is $P_{x^n}(a) = \frac{N(a\mid x^n)}{n}$, $a\in\mathcal{X}$, where $N(a\mid x^n)$ is the number of occurrences of $a$ in $x^n$. The set of all the types with denominator $n$ is denoted by $\mathcal{P}_n$. For example, if $\mathcal{X}=\{0,1\}$: $\mathcal{P}_5 = \left\{(0,1), \left(\tfrac{1}{5},\tfrac{4}{5}\right), \left(\tfrac{2}{5},\tfrac{3}{5}\right), \left(\tfrac{3}{5},\tfrac{2}{5}\right), \left(\tfrac{4}{5},\tfrac{1}{5}\right), (1,0)\right\}$.

20 Type class. The type class of $P$ contains all the sequences having type $P$: $T(P) = \{x^n \in \mathcal{X}^n : P_{x^n} = P\}$. Example: $x^5 = 01100$ has type $P_{x^5} = \left(\tfrac{3}{5},\tfrac{2}{5}\right)$, and $T(P_{x^5}) = \{11000, 10100, 10010, 10001, 01100, 01010, 01001, 00110, 00101, 00011\}$.
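
A small sketch that computes the type of the slide's example sequence and enumerates its type class by brute force:

```python
from collections import Counter
from itertools import permutations

def type_of(seq, alphabet):
    """Empirical probability of each alphabet symbol in seq."""
    counts = Counter(seq)
    return tuple(counts[a] / len(seq) for a in alphabet)

x = "01100"
print(type_of(x, "01"))                       # (0.6, 0.4), i.e. (3/5, 2/5)

# T(P): all distinct rearrangements of x, i.e. all sequences with the same counts.
type_class = sorted(set("".join(p) for p in permutations(x)))
print(len(type_class), type_class)            # 10 sequences
```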

21 Number of types. The number of types grows polynomially with $n$. Theorem: the number of types with denominator $n$ is upper bounded by $|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}$. Proof: obvious, since each of the $|\mathcal{X}|$ components of a type can take at most $n+1$ values.

22 Probability of a sequence. Theorem: the probability that a sequence $x = x^n$ is emitted by a DMS with pmf $Q$ is $Q(x) = 2^{-n\left(H(P_x) + D(P_x\|Q)\right)}$; if $P_x = Q$, then $Q(x) = 2^{-nH(P_x)} = 2^{-nH(Q)}$. Remember: the larger the KL distance between the type of $x$ and $Q$, the lower the probability.

23 Probability of a sequence. Proof: $Q(x) = \prod_i Q(x_i) = \prod_{a\in\mathcal{X}} Q(a)^{N(a\mid x)} = \prod_{a\in\mathcal{X}} Q(a)^{nP_x(a)} = 2^{n\sum_{a\in\mathcal{X}} P_x(a)\log Q(a)} = 2^{n\sum_{a\in\mathcal{X}}\left[P_x(a)\log Q(a) - P_x(a)\log P_x(a) + P_x(a)\log P_x(a)\right]} = 2^{-n\sum_{a\in\mathcal{X}}\left[P_x(a)\log\frac{P_x(a)}{Q(a)} - P_x(a)\log P_x(a)\right]} = 2^{-n\left[H(P_x) + D(P_x\|Q)\right]}$.
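
A quick numerical check of this identity, with an arbitrary binary pmf $Q$ and an arbitrary sequence $x$ (both illustrative):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float); p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

Q = np.array([0.7, 0.3])                       # illustrative DMS pmf over {0, 1}
x = [0, 1, 0, 0, 1, 1, 0]                      # an arbitrary emitted sequence
n = len(x)
Px = np.bincount(x, minlength=2) / n           # type of x

direct = np.prod(Q[x])                         # prod_i Q(x_i)
via_types = 2 ** (-n * (H(Px) + D(Px, Q)))     # 2^{-n(H(P_x) + D(P_x||Q))}
print(direct, via_types)                       # the two values coincide
```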

24 Size of a type class. Theorem: the size of a type class $T(P)$ can be bounded as follows: $\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)}$. Remember: the size of a type class grows exponentially, with growth rate equal to the entropy of the type.

25 Size of a type class. Proof (upper bound): given $P\in\mathcal{P}_n$, consider the probability that a source with pmf $P$ emits a sequence in $T(P)$. We have $1 \ge \sum_{x\in T(P)} P(x) = \sum_{x\in T(P)} 2^{-nH(P)} = |T(P)|\, 2^{-nH(P)}$, hence $|T(P)| \le 2^{nH(P)}$.

26 Size of a type class. Proof (lower bound): $|T(P)| = \binom{n}{nP(a_1)\,\cdots\,nP(a_{|\mathcal{X}|})} = \frac{n!}{n_1!\, n_2! \cdots n_{|\mathcal{X}|}!}$. Applying the Stirling approximation $n! \approx \left(\frac{n}{e}\right)^n$ to each factorial and after some algebra, one obtains $|T(P)| \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{nH(P)}$.

27 Probability of a type class. Theorem: the probability that a DMS with pmf $Q$ emits a sequence belonging to $T(P)$ can be bounded as follows: $\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P\|Q)} \le Q(T(P)) \le 2^{-nD(P\|Q)}$. Remember: the larger the KL distance between $P$ and $Q$, the smaller the probability; when $P = Q$ the exponent vanishes, and, as the law of large numbers below makes precise, essentially all the probability concentrates on types close to $Q$.

28 Probability of a type class. Proof: $Q(T(P)) = \sum_{x\in T(P)} Q(x) = \sum_{x\in T(P)} 2^{-n\left(H(P)+D(P\|Q)\right)} = |T(P)|\, 2^{-n\left(H(P)+D(P\|Q)\right)}$. By recalling the bounds on the size of $T(P)$: $\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P\|Q)} \le Q(T(P)) \le 2^{-nD(P\|Q)}$.
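
A numerical check of these bounds for a binary alphabet and a small $n$ (all values illustrative): for $\mathcal{X}=\{0,1\}$, $Q(T(P))$ can be computed exactly as a binomial probability.

```python
import numpy as np
from math import comb

def D(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

n, k = 20, 6                        # illustrative: type P assigns k/n to symbol 1
Q = np.array([0.5, 0.5])            # illustrative source pmf
P = np.array([1 - k / n, k / n])

exact = comb(n, k) * Q[1] ** k * Q[0] ** (n - k)   # Q(T(P)) = |T(P)| * Q(one sequence)
upper = 2.0 ** (-n * D(P, Q))                      # 2^{-n D(P||Q)}
lower = upper / (n + 1) ** 2                       # divided by (n+1)^{|X|}, with |X| = 2

print(lower <= exact <= upper, lower, exact, upper)
```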

29 In summary: $|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}$; $Q(x) = 2^{-n\left[D(P_x\|Q)+H(P_x)\right]}$; $|T(P)| \doteq 2^{nH(P)}$; $Q(T(P)) \doteq 2^{-nD(P\|Q)}$ (where $\doteq$ denotes equality to the first order in the exponent).

30 Information Theory and Statistics

31 Law of large numbers. Since $Q(T(P)) \le 2^{-nD(P\|Q)}$, when $n$ grows the only type classes with non-negligible probability are those close to $Q$. Theorem (law of large numbers): let $T_Q^\varepsilon = \{x^n : D(P_{x^n}\|Q) \le \varepsilon\}$. Then $\Pr\{X^n \notin T_Q^\varepsilon\} = \sum_{P: D(P\|Q) > \varepsilon} Q(T(P)) \le \sum_{P: D(P\|Q) > \varepsilon} 2^{-nD(P\|Q)} \le (n+1)^{|\mathcal{X}|}\, 2^{-n\varepsilon} = 2^{-n\left(\varepsilon - |\mathcal{X}|\frac{\log(n+1)}{n}\right)}$, which tends to 0 as $n$ tends to infinity.
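
A quick Monte Carlo illustration of this concentration, with an arbitrary three-symbol source and illustrative values of $\varepsilon$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(p, q):
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

Q = np.array([0.5, 0.3, 0.2])       # illustrative source pmf
eps, trials = 0.05, 2000            # illustrative tolerance and number of trials

for n in (20, 100, 500):
    misses = 0
    for _ in range(trials):
        x = rng.choice(len(Q), size=n, p=Q)
        Px = np.bincount(x, minlength=len(Q)) / n      # empirical type of the sequence
        misses += D(Px, Q) > eps
    print(f"n = {n:4d}   Pr{{D(P_x||Q) > {eps}}} ~ {misses / trials:.3f}")
```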

32 Large deviation theory. LDT studies the probability of rare events, i.e., events not covered by the law of large numbers. Examples: what is the probability that in 1000 fair coin tosses heads appears 800 times? Compute the probability that the mean value of a sequence (emitted by a DMS $X$) is larger than $T$, with $T$ much larger than $E[X]$. Rare events in statistical physics or economics.

33 Large deviation theory. More formally: let $S$ be a subset of pmf's and let $Q$ be a source. We want to compute the probability that $Q$ emits a sequence whose type belongs to $S$: $Q(S) = \sum_{x: P_x \in S} Q(x)$. Example: what is the probability that the average value of a sequence drawn from $Q$ is larger than 4? This is the above problem with $S = \{\text{pmf's } P : E_P[X] > 4\}$.

34 Large deviation theory. If $S$ contains a KL neighborhood of $Q$, then $Q(S) \to 1$. If $S$ does not contain $Q$ or a KL neighborhood of $Q$, then $Q(S) \to 0$; the question is: how fast? [Figure: the two cases, $Q$ inside $S$ and $Q$ outside $S$.]

35 Sanov's theorem. Theorem (Sanov): let $S$ be a regular set of pmf's (i.e., $\mathrm{cl}(\mathrm{int}(S)) = S$). Then $Q(S) \doteq 2^{-nD(P^*\|Q)}$, where $P^* = \arg\min_{P\in S} D(P\|Q)$. [Figure: $P^*$ is the point of $S$ closest to $Q$ in divergence.]

36 Sanov's theorem. Proof (upper bound): $Q(S) = \sum_{P\in S\cap\mathcal{P}_n} Q(T(P)) \le \sum_{P\in S\cap\mathcal{P}_n} 2^{-nD(P\|Q)} \le \sum_{P\in S\cap\mathcal{P}_n} 2^{-n\min_{P\in S} D(P\|Q)} = \sum_{P\in S\cap\mathcal{P}_n} 2^{-nD(P^*\|Q)} \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^*\|Q)}$.

37 Sanov's theorem. Proof (lower bound): due to the regularity of $S$ and the density of the types in the set of all pmf's, we can find a sequence of types $P_n \in S\cap\mathcal{P}_n$ such that $P_n \to P^*$ and hence $D(P_n\|Q) \to D(P^*\|Q)$. Then, for large $n$, we can write: $Q(S) = \sum_{P\in S\cap\mathcal{P}_n} Q(T(P)) \ge Q(T(P_n)) \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P_n\|Q)} \approx \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P^*\|Q)}$.

38 Example. Compute the probability that in 1000 fair coin tosses heads shows more than 800 times. Here $S = \{B(p, 1-p) : p \ge 0.8\}$, $Q = B(0.5, 0.5)$ and $P^* = B(0.8, 0.2)$, so $D(P^*\|Q) = 1 - H(P^*) = 1 - h(0.8) \approx 0.28$ and $P(S) \doteq 2^{-nD(P^*\|Q)} \approx 2^{-278}$!
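
A sketch comparing Sanov's estimate with the exact binomial tail for this example:

```python
import numpy as np
from scipy.stats import binom

n, p_star, q = 1000, 0.8, 0.5     # the numbers of the slide's example

def h(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

D = 1 - h(p_star)                 # D(P*||Q) for a fair coin, ~0.278 bits
sanov = 2.0 ** (-n * D)           # 2^{-n D(P*||Q)}
exact = binom.sf(800, n, q)       # Pr{more than 800 heads in 1000 tosses}

print(f"D(P*||Q) = {D:.4f} bits")
print(f"Sanov estimate = 2^{-n * D:.0f} = {sanov:.3e}")
print(f"Exact tail probability = {exact:.3e}")
```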

39 Hypothesis testing. Consider two hypotheses $H_0$ and $H_1$ and a sequence of observations $x^n$: under $H_0$, $X \sim P_0$; under $H_1$, $X \sim P_1$. Neyman-Pearson criterion: minimize $P_{0/1}$ for a fixed $P_{1/0}$. Decide for $H_0$ if $\frac{P_0(x^n)}{P_1(x^n)} \ge T$, a likelihood ratio test where $T$ depends on $P_{1/0}$.

40 Hypothesis testing. Let us pass to the log-likelihood ratio and assume a DMS: $\log\frac{P_0(x^n)}{P_1(x^n)} = \sum_i \log\frac{P_0(x_i)}{P_1(x_i)} = \sum_{a\in\mathcal{X}} nP_x(a)\log\frac{P_0(a)}{P_1(a)} = \sum_{a\in\mathcal{X}} nP_x(a)\log\frac{P_0(a)\,P_x(a)}{P_1(a)\,P_x(a)} = n\left[D(P_x\|P_1) - D(P_x\|P_0)\right]$. The N-P criterion thus boils down to $D(P_x\|P_1) - D(P_x\|P_0) \gtrless \frac{\log T}{n} = \tau$.
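
A quick numerical check of the identity $\log\frac{P_0(x^n)}{P_1(x^n)} = n\left[D(P_x\|P_1) - D(P_x\|P_0)\right]$, with an arbitrary pair of binary sources and an arbitrary observed sequence:

```python
import numpy as np

def D(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

P0, P1 = np.array([0.7, 0.3]), np.array([0.4, 0.6])   # illustrative sources
x = np.array([0, 1, 0, 0, 1, 0, 1, 0])          # arbitrary observed sequence
n = len(x)
Px = np.bincount(x, minlength=2) / n            # its type

llr_direct = np.sum(np.log2(P0[x] / P1[x]))     # sum_i log P0(x_i)/P1(x_i)
llr_types = n * (D(Px, P1) - D(Px, P0))         # n [D(P_x||P1) - D(P_x||P0)]
print(llr_direct, llr_types)                    # the two values coincide
```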

41 Error exponents in HT and Sanov's theorem. Define the error exponents $\lambda = -\lim_{n\to\infty}\frac{1}{n}\log P_{1/0}$ and $\varepsilon = -\lim_{n\to\infty}\frac{1}{n}\log P_{0/1}$. In the space of types the acceptance region is $\Lambda_0 = \{P : D(P\|P_0) - D(P\|P_1) < \tau\}$. By Sanov's theorem, $\lambda = D(P_0^*\|P_0)$ with $P_0^* = \arg\min_{P\notin\Lambda_0} D(P\|P_0)$, and $\varepsilon = D(P_1^*\|P_1)$ with $P_1^* = \arg\min_{P\in\Lambda_0} D(P\|P_1)$. It can be proven that $P_0^* = P_1^* = P^*$, a point on the boundary of $\Lambda_0$, so that $\lambda = D(P^*\|P_0)$ and $\varepsilon = D(P^*\|P_1)$. [Figure: $P^*$ on the boundary of $\Lambda_0$, between $P_0$ and $P_1$.]

42 Best achievable error exponent: Stein's lemma. If it is enough that $P_{1/0}$ tends to zero exponentially, regardless of its error exponent, we can fix the threshold so that $\lambda$ is arbitrarily small (yet positive). As $\lambda \to 0$, $P^* \to P_0$ and $\varepsilon \to D(P_0\|P_1)$, which is the best achievable error exponent (for $P_{0/1}$) for the test. [Figure: as the boundary of $\Lambda_0$ moves toward $P_0$, $P^*$ approaches $P_0$.]

43 Other links between IT and Statistics: the Chernoff bound; estimation theory (Cramér-Rao bound); ...

44 Further readings. 1. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley. 2. I. Csiszár, "The method of types," IEEE Trans. Inf. Theory, vol. 44, no. 6, Oct. 1998. 3. I. Csiszár and P. C. Shields, "Information Theory and Statistics: A Tutorial," Foundations and Trends in Communications and Information Theory, NOW Publishers, 2004.
