IT and large deviation theory

1 PhD short course Information Theory and Statistics Siena, September, 2014 IT and large deviation theory Mauro Barni University of Siena

2 Outline of the short course. Part 1: Information theory in a nutshell. Part 2: The method of types and its relationship with statistics. Part 3: Information theory and large deviation theory. Part 4: Information theory and hypothesis testing. Part 5: Application to adversarial signal processing.

3 Outline of Part 3. Large deviation theory. Sanov's theorem. Conditional limit theorem. Examples.

4 Large deviation theory. Large deviation theory (LDT) studies the probability of rare events, i.e. events not covered by the law of large numbers. Examples: What is the probability that head appears 800 times in 1000 tosses of a fair coin? What is the probability that the mean value of a sequence emitted by a discrete memoryless source (DMS) X is larger than T, with T much larger than E[X]? Rare events in statistical physics or economics.

5 Large deviation theory. More formally: let S be a subset of pmf's and let Q be a source. We want to compute the probability that Q emits a sequence $x^n$ whose type $P_{x^n}$ belongs to S: $Q(S) = \sum_{x^n : P_{x^n} \in S} Q(x^n)$. Example: what is the probability that the average value of a sequence drawn from Q is larger than 4? This is the above problem with $S = \{P : E_P[X] > 4\}$, the set of pmf's whose mean is larger than 4.

6 Large deviation theory. If S contains a KL neighborhood of Q, then $Q(S) \to 1$. If S does not contain Q or a KL neighborhood of Q, then $Q(S) \to 0$. The question is: how fast? [Figure: sketches of the set S and the source Q.]

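7 More formally: Sanov's theorem. Theorem (Sanov). Let S be a regular set of pmf's, i.e. $\mathrm{cl}(\mathrm{int}(S)) = S$. Then $Q(S) \doteq 2^{-nD(P^*\|Q)}$, where $P^* = \arg\min_{P \in S} D(P\|Q)$ and $\doteq$ denotes equality to first order in the exponent. [Figure: the set S, the minimizer P*, and the source Q.]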

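8 Sanov's theorem. Proof (upper bound). Let $\mathcal{P}_n$ denote the set of types with denominator n and $T(P)$ the type class of P. Then $Q(S) = \sum_{P \in S \cap \mathcal{P}_n} Q(T(P)) \le \sum_{P \in S \cap \mathcal{P}_n} 2^{-nD(P\|Q)} \le \sum_{P \in S \cap \mathcal{P}_n} 2^{-n \min_{P \in S \cap \mathcal{P}_n} D(P\|Q)} \le \sum_{P \in S \cap \mathcal{P}_n} 2^{-n \min_{P \in S} D(P\|Q)} = \sum_{P \in S \cap \mathcal{P}_n} 2^{-nD(P^*\|Q)} \le (n+1)^{|\mathcal{X}|} 2^{-nD(P^*\|Q)}$.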

9 Sanov's theorem. Proof (lower bound). Due to the regularity of S and the density of $\mathcal{P}_n$ in the set of all pmf's, we can find a sequence $P_n \in S \cap \mathcal{P}_n$ such that $P_n \to P^*$ and hence $D(P_n\|Q) \to D(P^*\|Q)$. Then for large n we can write: $Q(S) = \sum_{P \in S \cap \mathcal{P}_n} Q(T(P)) \ge Q(T(P_n)) \ge \frac{1}{(n+1)^{|\mathcal{X}|}} 2^{-nD(P_n\|Q)} \approx \frac{1}{(n+1)^{|\mathcal{X}|}} 2^{-nD(P^*\|Q)}$.
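
The two bounds can be combined to read off the exponent. The following is a short worked step using only the inequalities of the proof, with logarithms taken in base 2; it is a sketch of how the polynomial factors disappear, not a replacement for the proof.

```latex
% Take (1/n) log of the upper and lower bounds on Q(S):
-\frac{|\mathcal{X}|\log(n+1)}{n} - D(P_n\|Q)
\;\le\; \frac{1}{n}\log Q(S)
\;\le\; \frac{|\mathcal{X}|\log(n+1)}{n} - D(P^*\|Q).
% Since |X| log(n+1) / n -> 0 and D(P_n||Q) -> D(P^*||Q):
\lim_{n\to\infty} \frac{1}{n}\log Q(S) = -D(P^*\|Q).
```

The factor $(n+1)^{|\mathcal{X}|}$ is only polynomial in n, so it does not affect the exponential decay rate.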

10 Example. Compute the probability that in 1000 tosses of a fair coin, head shows at least 800 times. Here $S = \{B(p, 1-p) : p \ge 0.8\}$, $Q = B(0.5, 0.5)$ and $P^* = B(0.8, 0.2)$, so $D(P^*\|Q) = 1 - H(P^*) = 1 - h(0.8) \approx 0.278 \approx 0.3$ bits. Hence $Q(S) \doteq 2^{-nD(P^*\|Q)} \approx 2^{-300}$!
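
The coin example can be checked numerically. The sketch below compares the exact binomial tail with the Sanov exponent; the function names `kl_bernoulli` and `exact_tail_log2` are illustrative choices, not part of the slides, and the event is taken as "at least 800 heads".

```python
from math import comb, log2


def kl_bernoulli(p, q):
    """Divergence D(B(p,1-p) || B(q,1-q)) in bits, for 0 < p, q < 1."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))


def exact_tail_log2(n, k_min):
    """log2 of Pr{at least k_min heads in n fair coin tosses}, computed exactly."""
    tail = sum(comb(n, k) for k in range(k_min, n + 1))  # exact integer count
    return log2(tail) - n  # divide by 2**n in the log domain


if __name__ == "__main__":
    n = 1000
    d = kl_bernoulli(0.8, 0.5)
    print("D(P*||Q) in bits:      ", d)
    print("Sanov exponent -n*D:   ", -n * d)
    print("exact log2 probability:", exact_tail_log2(n, 800))
```

The exact value is somewhat smaller than $2^{-nD}$ because of the polynomial factor that Sanov's theorem ignores; the exponent itself agrees.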

11 A more general example. We may want to compute $\Pr\left\{ \frac{1}{n}\sum_{i=1}^{n} g_j(X_i) \ge \alpha_j,\ j = 1, \dots, k \right\}$. This is Sanov's theorem with $S = \left\{ P : \sum_{x \in \mathcal{X}} P(x) g_j(x) \ge \alpha_j,\ j = 1, \dots, k \right\}$. We can use Lagrange multipliers to minimize $D(P\|Q)$ subject to P in S.

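12 A more general example. Unconstrained minimization of $L(P) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)} + \sum_{j=1}^{k} \lambda_j \left( \sum_{x} P(x) g_j(x) - \alpha_j \right) + \beta \left( \sum_{x} P(x) - 1 \right)$, yielding (after some algebra, with signs and constants absorbed into the multipliers): $P^*(x) = \frac{1}{K} Q(x) e^{\sum_j \lambda_j g_j(x)}$ with $K = \sum_{x \in \mathcal{X}} Q(x) e^{\sum_j \lambda_j g_j(x)}$.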
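
The "some algebra" can be sketched as follows, assuming natural logarithms and treating each $P(x)$ as a free variable. The renaming of the multipliers in the last step is a convention adopted here so that the result matches the exponential form of $P^*$.

```latex
% Stationarity of L with respect to P(x):
\frac{\partial L}{\partial P(x)}
  = \log\frac{P(x)}{Q(x)} + 1 + \sum_{j=1}^{k} \lambda_j g_j(x) + \beta = 0
\;\Longrightarrow\;
P(x) = Q(x)\, e^{-(1+\beta)}\, e^{-\sum_j \lambda_j g_j(x)}.
% Rename \lambda_j \leftarrow -\lambda_j and impose \sum_x P(x) = 1:
e^{-(1+\beta)} = \frac{1}{K},
\qquad
K = \sum_{x \in \mathcal{X}} Q(x)\, e^{\sum_j \lambda_j g_j(x)}.
```

The multipliers $\lambda_j$ are then fixed by requiring that the constraints $\sum_x P^*(x) g_j(x) = \alpha_j$ hold for the active constraints.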

13 A numerical example. Compute the probability that the average of n tosses of a fair die is at least 4 (instead of 3.5). Here $S = \left\{ P : \sum_{x=1}^{6} x P(x) \ge 4 \right\}$. From the previous result we have $P^*(x) = \frac{2^{\lambda x}}{\sum_{i=1}^{6} 2^{\lambda i}}$, with $\lambda$ chosen in such a way that $\sum_{x=1}^{6} x P^*(x) = 4$, which can be solved numerically (Matlab).
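
The slides solve for $\lambda$ in Matlab; the sketch below does the same in Python with a plain bisection, which works because the mean of the tilted pmf increases with $\lambda$. The helper names and the bracketing interval [0, 10] are assumptions made for this illustration.

```python
from math import log2

FACES = range(1, 7)


def tilted_pmf(lam):
    """P*(x) proportional to 2**(lam * x) on the faces 1..6 (Q uniform cancels)."""
    weights = [2.0 ** (lam * x) for x in FACES]
    total = sum(weights)
    return [w / total for w in weights]


def mean(pmf):
    """Expected face value under pmf."""
    return sum(x * p for x, p in zip(FACES, pmf))


def solve_lambda(target, lo=0.0, hi=10.0, tol=1e-12):
    """Bisection on lam so that the tilted pmf has the requested mean."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(tilted_pmf(mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def kl_to_uniform(pmf):
    """D(P || uniform on 6 faces) in bits."""
    return sum(p * log2(6 * p) for p in pmf if p > 0)


if __name__ == "__main__":
    lam = solve_lambda(4.0)
    p_star = tilted_pmf(lam)
    print("lambda:", lam)
    print("P*:", [round(p, 4) for p in p_star])
    print("D(P*||Q) in bits:", kl_to_uniform(p_star))
```

With the resulting divergence D, the probability that the average of n tosses is at least 4 behaves as $2^{-nD}$ to first order in the exponent.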

14 Homework: how lucky do you need to be? Is it better to bet that head will show up in 3/5 of the tosses of a fair coin, or that face 6 will show up in 5/18 of the tosses of a fair die?

15 Conditional limit theorem. Not only is the probability of S determined by $P^*$, but $P^*$ also determines the probabilities of the elements of $x^n$ given that the type of $x^n$ belongs to S. Theorem. Let S be a closed convex set of pmf's. Let $X_i$ be a sequence of i.i.d. random variables generated by Q, with $Q \notin S$. Let $P^*$ be defined as in Sanov's theorem. Then $\Pr_Q\{X_1 = a \mid P_{x^n} \in S\} \to P^*(a)$ as $n \to \infty$, for all $a \in \mathcal{X}$.

16 Conditional limit theorem (extension). Theorem. Let S be a closed convex set of pmf's. Let $X_i$ be a sequence of i.i.d. random variables generated by Q. Let $P^*$ be defined as in Sanov's theorem. Let m be fixed. Then $\Pr_Q\{X_1 = a_1, X_2 = a_2, \dots, X_m = a_m \mid P_{x^n} \in S\} \to \prod_{i=1}^{m} P^*(a_i)$ as $n \to \infty$. Remark: the theorem holds for any fixed m, but not for m = n.
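
The conditional limit theorem can be illustrated on the fair coin, where the conditioning event "at least 80% heads" gives $P^* = B(0.8, 0.2)$. The sketch below computes $\Pr\{X_1 = \text{head} \mid \text{at least } k_{\min} \text{ heads}\}$ exactly; it relies on exchangeability, which makes this probability equal to the conditional expectation of the fraction of heads. The function name is an illustrative choice.

```python
from math import comb


def prob_first_head_given_rare(n, k_min):
    """Pr{X_1 = head | at least k_min heads in n fair tosses}, computed exactly.

    By exchangeability this equals E[K / n | K >= k_min], where K is the
    number of heads; for a fair coin every sequence has the same probability,
    so only the binomial counts matter.
    """
    ks = range(k_min, n + 1)
    counts = [comb(n, k) for k in ks]
    numer = sum(k * c for k, c in zip(ks, counts))
    denom = n * sum(counts)
    return numer / denom


if __name__ == "__main__":
    for n in (10, 100, 1000):
        k_min = (8 * n) // 10  # at least 80% heads
        print(n, prob_first_head_given_rare(n, k_min))
```

As n grows the conditional probability approaches $P^*(\text{head}) = 0.8$ from above, even though the unconditional probability of a head is 0.5.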

17 Homework: a lucky friend. You are told that your friend was so lucky that, in a whole night spent tossing a die, face 6 showed up ¼ of the time. Estimate the probability that face 1 never showed up in the first 10 tosses. Do the same for the first 100 tosses (assuming that over the whole night your friend tossed the die many more than 100 times).


Information Theory and Hypothesis Testing

Information Theory and Hypothesis Testing Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking

More information

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena

The method of types. PhD short course Information Theory and Statistics Siena, September, Mauro Barni University of Siena PhD short course Iformatio Theory ad Statistics Siea, 15-19 September, 2014 The method of types Mauro Bari Uiversity of Siea Outlie of the course Part 1: Iformatio theory i a utshell Part 2: The method

More information

INFORMATION THEORY AND STATISTICS

INFORMATION THEORY AND STATISTICS CHAPTER INFORMATION THEORY AND STATISTICS We now explore the relationship between information theory and statistics. We begin by describing the method of types, which is a powerful technique in large deviation

More information

Information Theory and Statistics, Part I

Information Theory and Statistics, Part I Information Theory and Statistics, Part I Information Theory 2013 Lecture 6 George Mathai May 16, 2013 Outline This lecture will cover Method of Types. Law of Large Numbers. Universal Source Coding. Large

More information

Computing and Communications 2. Information Theory -Entropy

Computing and Communications 2. Information Theory -Entropy 1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy

More information

What is a random variable

What is a random variable OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr

More information

STAT 430/510 Probability Lecture 7: Random Variable and Expectation

STAT 430/510 Probability Lecture 7: Random Variable and Expectation STAT 430/510 Probability Lecture 7: Random Variable and Expectation Pengyuan (Penelope) Wang June 2, 2011 Review Properties of Probability Conditional Probability The Law of Total Probability Bayes Formula

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

M378K In-Class Assignment #1

M378K In-Class Assignment #1 The following problems are a review of M6K. M7K In-Class Assignment # Problem.. Complete the definition of mutual exclusivity of events below: Events A, B Ω are said to be mutually exclusive if A B =.

More information

Probability and Statisitcs

Probability and Statisitcs Probability and Statistics Random Variables De La Salle University Francis Joseph Campena, Ph.D. January 25, 2017 Francis Joseph Campena, Ph.D. () Probability and Statisitcs January 25, 2017 1 / 17 Outline

More information

On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method

On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel ETH, Zurich,

More information

Math 151. Rumbos Fall Solutions to Review Problems for Exam 2. Pr(X = 1) = ) = Pr(X = 2) = Pr(X = 3) = p X. (k) =

Math 151. Rumbos Fall Solutions to Review Problems for Exam 2. Pr(X = 1) = ) = Pr(X = 2) = Pr(X = 3) = p X. (k) = Math 5. Rumbos Fall 07 Solutions to Review Problems for Exam. A bowl contains 5 chips of the same size and shape. Two chips are red and the other three are blue. Draw three chips from the bowl at random,

More information

Mathematical Foundations of Computer Science Lecture Outline October 18, 2018

Mathematical Foundations of Computer Science Lecture Outline October 18, 2018 Mathematical Foundations of Computer Science Lecture Outline October 18, 2018 The Total Probability Theorem. Consider events E and F. Consider a sample point ω E. Observe that ω belongs to either F or

More information

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

More information

Random Variables. Statistics 110. Summer Copyright c 2006 by Mark E. Irwin

Random Variables. Statistics 110. Summer Copyright c 2006 by Mark E. Irwin Random Variables Statistics 110 Summer 2006 Copyright c 2006 by Mark E. Irwin Random Variables A Random Variable (RV) is a response of a random phenomenon which is numeric. Examples: 1. Roll a die twice

More information

Chapter 11. Information Theory and Statistics

Chapter 11. Information Theory and Statistics Chapter 11 Information Theory and Statistics Peng-Hua Wang Graduate Inst. of Comm. Engineering National Taipei University Chapter Outline Chap. 11 Information Theory and Statistics 11.1 Method of Types

More information

EE514A Information Theory I Fall 2013

EE514A Information Theory I Fall 2013 EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/

More information

MATH 3670 First Midterm February 17, No books or notes. No cellphone or wireless devices. Write clearly and show your work for every answer.

MATH 3670 First Midterm February 17, No books or notes. No cellphone or wireless devices. Write clearly and show your work for every answer. No books or notes. No cellphone or wireless devices. Write clearly and show your work for every answer. Name: Question: 1 2 3 4 Total Points: 30 20 20 40 110 Score: 1. The following numbers x i, i = 1,...,

More information

Fundamental Tools - Probability Theory II

Fundamental Tools - Probability Theory II Fundamental Tools - Probability Theory II MSc Financial Mathematics The University of Warwick September 29, 2015 MSc Financial Mathematics Fundamental Tools - Probability Theory II 1 / 22 Measurable random

More information

Lecture 22: Error exponents in hypothesis testing, GLRT

Lecture 22: Error exponents in hypothesis testing, GLRT 10-704: Information Processing and Learning Spring 2012 Lecture 22: Error exponents in hypothesis testing, GLRT Lecturer: Aarti Singh Scribe: Aarti Singh Disclaimer: These notes have not been subjected

More information

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16 EE539R: Problem Set 3 Assigned: 24/08/6, Due: 3/08/6. Cover and Thomas: Problem 2.30 (Maimum Entropy): Solution: We are required to maimize H(P X ) over all distributions P X on the non-negative integers

More information

Random variables (discrete)

Random variables (discrete) Random variables (discrete) Saad Mneimneh 1 Introducing random variables A random variable is a mapping from the sample space to the real line. We usually denote the random variable by X, and a value that

More information

Discrete Probability Refresher

Discrete Probability Refresher ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

Lecture 23. Random walks

Lecture 23. Random walks 18.175: Lecture 23 Random walks Scott Sheffield MIT 1 Outline Random walks Stopping times Arcsin law, other SRW stories 2 Outline Random walks Stopping times Arcsin law, other SRW stories 3 Exchangeable

More information

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2 COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.

More information

Solutionbank S1 Edexcel AS and A Level Modular Mathematics

Solutionbank S1 Edexcel AS and A Level Modular Mathematics Heinemann Solutionbank: Statistics S Page of Solutionbank S Exercise A, Question Write down whether or not each of the following is a discrete random variable. Give a reason for your answer. a The average

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

If the objects are replaced there are n choices each time yielding n r ways. n C r and in the textbook by g(n, r).

If the objects are replaced there are n choices each time yielding n r ways. n C r and in the textbook by g(n, r). Caveat: Not proof read. Corrections appreciated. Combinatorics In the following, n, n 1, r, etc. will denote non-negative integers. Rule 1 The number of ways of ordering n distinguishable objects (also

More information

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

More information

6.041/6.431 Spring 2009 Quiz 1 Wednesday, March 11, 7:30-9:30 PM. SOLUTIONS

6.041/6.431 Spring 2009 Quiz 1 Wednesday, March 11, 7:30-9:30 PM. SOLUTIONS 6.0/6.3 Spring 009 Quiz Wednesday, March, 7:30-9:30 PM. SOLUTIONS Name: Recitation Instructor: Question Part Score Out of 0 all 0 a 5 b c 5 d 5 e 5 f 5 3 a b c d 5 e 5 f 5 g 5 h 5 Total 00 Write your solutions

More information

n(1 p i ) n 1 p i = 1 3 i=1 E(X i p = p i )P(p = p i ) = 1 3 p i = n 3 (p 1 + p 2 + p 3 ). p i i=1 P(X i = 1 p = p i )P(p = p i ) = p1+p2+p3

n(1 p i ) n 1 p i = 1 3 i=1 E(X i p = p i )P(p = p i ) = 1 3 p i = n 3 (p 1 + p 2 + p 3 ). p i i=1 P(X i = 1 p = p i )P(p = p i ) = p1+p2+p3 Introduction to Probability Due:August 8th, 211 Solutions of Final Exam Solve all the problems 1. (15 points) You have three coins, showing Head with probabilities p 1, p 2 and p 3. You perform two different

More information

Lecture 10. Variance and standard deviation

Lecture 10. Variance and standard deviation 18.440: Lecture 10 Variance and standard deviation Scott Sheffield MIT 1 Outline Defining variance Examples Properties Decomposition trick 2 Outline Defining variance Examples Properties Decomposition

More information

1 Bernoulli Distribution: Single Coin Flip

1 Bernoulli Distribution: Single Coin Flip STAT 350 - An Introduction to Statistics Named Discrete Distributions Jeremy Troisi Bernoulli Distribution: Single Coin Flip trial of an experiment that yields either a success or failure. X Bern(p),X

More information

Lecture 3. Discrete Random Variables

Lecture 3. Discrete Random Variables Math 408 - Mathematical Statistics Lecture 3. Discrete Random Variables January 23, 2013 Konstantin Zuev (USC) Math 408, Lecture 3 January 23, 2013 1 / 14 Agenda Random Variable: Motivation and Definition

More information

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1 Problem Sheet 1 1. Let Ω = {1, 2, 3}. Let F = {, {1}, {2, 3}, {1, 2, 3}}, F = {, {2}, {1, 3}, {1, 2, 3}}. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X :

More information

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2 IEOR 316: Introduction to Operations Research: Stochastic Models Professor Whitt SOLUTIONS to Homework Assignment 2 More Probability Review: In the Ross textbook, Introduction to Probability Models, read

More information

MA : Introductory Probability

MA : Introductory Probability MA 320-001: Introductory Probability David Murrugarra Department of Mathematics, University of Kentucky http://www.math.uky.edu/~dmu228/ma320/ Spring 2017 David Murrugarra (University of Kentucky) MA 320:

More information

Probability Theory Review

Probability Theory Review Cogsci 118A: Natural Computation I Lecture 2 (01/07/10) Lecturer: Angela Yu Probability Theory Review Scribe: Joseph Schilz Lecture Summary 1. Set theory: terms and operators In this section, we provide

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Econ 325: Introduction to Empirical Economics Lecture 2 Probability Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 3-1 3.1 Definition Random Experiment a process leading to an uncertain

More information

Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008

Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 1 Review We saw some basic metrics that helped us characterize

More information

MAT 135B Midterm 1 Solutions

MAT 135B Midterm 1 Solutions MAT 35B Midterm Solutions Last Name (PRINT): First Name (PRINT): Student ID #: Section: Instructions:. Do not open your test until you are told to begin. 2. Use a pen to print your name in the spaces above.

More information

Dynamic Programming Lecture #4

Dynamic Programming Lecture #4 Dynamic Programming Lecture #4 Outline: Probability Review Probability space Conditional probability Total probability Bayes rule Independent events Conditional independence Mutual independence Probability

More information

On the Chi square and higher-order Chi distances for approximating f-divergences

On the Chi square and higher-order Chi distances for approximating f-divergences c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/17 On the Chi square and higher-order Chi distances for approximating f-divergences Frank Nielsen 1 Richard Nock 2 www.informationgeometry.org

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

STAT 345 Spring 2018 Homework 4 - Discrete Probability Distributions

STAT 345 Spring 2018 Homework 4 - Discrete Probability Distributions STAT 345 Spring 2018 Homework 4 - Discrete Probability Distributions Name: Please adhere to the homework rules as given in the Syllabus. 1. Coin Flipping. Timothy and Jimothy are playing a betting game.

More information

Evaluating Classifiers. Lecture 2 Instructor: Max Welling

Evaluating Classifiers. Lecture 2 Instructor: Max Welling Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?

More information

Chapter 2. Probability

Chapter 2. Probability 2-1 Chapter 2 Probability 2-2 Section 2.1: Basic Ideas Definition: An experiment is a process that results in an outcome that cannot be predicted in advance with certainty. Examples: rolling a die tossing

More information

Notes 6 Autumn Example (One die: part 1) One fair six-sided die is thrown. X is the number showing.

Notes 6 Autumn Example (One die: part 1) One fair six-sided die is thrown. X is the number showing. MAS 08 Probability I Notes Autumn 005 Random variables A probability space is a sample space S together with a probability function P which satisfies Kolmogorov s aioms. The Holy Roman Empire was, in the

More information

1 Variance of a Random Variable

1 Variance of a Random Variable Indian Institute of Technology Bombay Department of Electrical Engineering Handout 14 EE 325 Probability and Random Processes Lecture Notes 9 August 28, 2014 1 Variance of a Random Variable The expectation

More information

Lecture 9. Expectations of discrete random variables

Lecture 9. Expectations of discrete random variables 18.440: Lecture 9 Expectations of discrete random variables Scott Sheffield MIT 1 Outline Defining expectation Functions of random variables Motivation 2 Outline Defining expectation Functions of random

More information

Chap 4 Probability p227 The probability of any outcome in a random phenomenon is the proportion of times the outcome would occur in a long series of

Chap 4 Probability p227 The probability of any outcome in a random phenomenon is the proportion of times the outcome would occur in a long series of Chap 4 Probability p227 The probability of any outcome in a random phenomenon is the proportion of times the outcome would occur in a long series of repetitions. (p229) That is, probability is a long-term

More information

Probabilistic Systems Analysis Spring 2018 Lecture 6. Random Variables: Probability Mass Function and Expectation

Probabilistic Systems Analysis Spring 2018 Lecture 6. Random Variables: Probability Mass Function and Expectation EE 178 Probabilistic Systems Analysis Spring 2018 Lecture 6 Random Variables: Probability Mass Function and Expectation Probability Mass Function When we introduce the basic probability model in Note 1,

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix

More information

Introduction to Machine Learning Lecture 14. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 14. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 14 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Density Estimation Maxent Models 2 Entropy Definition: the entropy of a random variable

More information

Speech Recognition Lecture 7: Maximum Entropy Models. Mehryar Mohri Courant Institute and Google Research

Speech Recognition Lecture 7: Maximum Entropy Models. Mehryar Mohri Courant Institute and Google Research Speech Recognition Lecture 7: Maximum Entropy Models Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Information theory basics Maximum entropy models Duality theorem

More information

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses Testing Hypotheses MIT 18.443 Dr. Kempthorne Spring 2015 1 Outline Hypothesis Testing 1 Hypothesis Testing 2 Hypothesis Testing: Statistical Decision Problem Two coins: Coin 0 and Coin 1 P(Head Coin 0)

More information

Necessary and Sufficient Conditions for High-Dimensional Salient Feature Subset Recovery

Necessary and Sufficient Conditions for High-Dimensional Salient Feature Subset Recovery Necessary and Sufficient Conditions for High-Dimensional Salient Feature Subset Recovery Vincent Tan, Matt Johnson, Alan S. Willsky Stochastic Systems Group, Laboratory for Information and Decision Systems,

More information

Lecture 5 - Information theory

Lecture 5 - Information theory Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information

More information

Lecture 1. ABC of Probability

Lecture 1. ABC of Probability Math 408 - Mathematical Statistics Lecture 1. ABC of Probability January 16, 2013 Konstantin Zuev (USC) Math 408, Lecture 1 January 16, 2013 1 / 9 Agenda Sample Spaces Realizations, Events Axioms of Probability

More information

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18 Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable

More information

4 Branching Processes

4 Branching Processes 4 Branching Processes Organise by generations: Discrete time. If P(no offspring) 0 there is a probability that the process will die out. Let X = number of offspring of an individual p(x) = P(X = x) = offspring

More information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information 4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Hidden Markov Models. Hosein Mohimani GHC7717

Hidden Markov Models. Hosein Mohimani GHC7717 Hidden Markov Models Hosein Mohimani GHC7717 hoseinm@andrew.cmu.edu Fair et Casino Problem Dealer flips a coin and player bets on outcome Dealer use either a fair coin (head and tail equally likely) or

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

MAT 271E Probability and Statistics

MAT 271E Probability and Statistics MAT 71E Probability and Statistics Spring 013 Instructor : Class Meets : Office Hours : Textbook : Supp. Text : İlker Bayram EEB 1103 ibayram@itu.edu.tr 13.30 1.30, Wednesday EEB 5303 10.00 1.00, Wednesday

More information

18.175: Lecture 14 Infinite divisibility and so forth

18.175: Lecture 14 Infinite divisibility and so forth 18.175 Lecture 14 18.175: Lecture 14 Infinite divisibility and so forth Scott Sheffield MIT 18.175 Lecture 14 Outline Infinite divisibility Higher dimensional CFs and CLTs Random walks Stopping times Arcsin

More information

18.440: Lecture 26 Conditional expectation

18.440: Lecture 26 Conditional expectation 18.440: Lecture 26 Conditional expectation Scott Sheffield MIT 1 Outline Conditional probability distributions Conditional expectation Interpretation and examples 2 Outline Conditional probability distributions

More information

Information. = more information was provided by the outcome in #2

Information. = more information was provided by the outcome in #2 Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information channels and coding will not discuss those here.. Information 2. Entropy 3. Mutual

More information

A Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University

A Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Mellon University Mellon Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

Lecture 6: Entropy Rate

Lecture 6: Entropy Rate Lecture 6: Entropy Rate Entropy rate H(X) Random walk on graph Dr. Yao Xie, ECE587, Information Theory, Duke University Coin tossing versus poker Toss a fair coin and see and sequence Head, Tail, Tail,

More information

HIDDEN MARKOV MODELS

HIDDEN MARKOV MODELS HIDDEN MARKOV MODELS Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms  Hidden Markov Models Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training

More information
