Asymptotic Equipartition Property - Seminar 3, part 1

October 22, 2013

Problem 1 (Calculation of typical set) To clarify the notion of a typical set A_ε^(n) and the smallest set of high probability B^(n), we will calculate these sets for a simple example. Consider a sequence of i.i.d. binary random variables X_1, X_2, ..., X_n, where the probability that X_i = 1 is 0.6 (and therefore the probability that X_i = 0 is 0.4).

(a) Calculate H(X).

(b) With n = 25 and ε = 0.1, which sequences fall in the typical set A_ε^(n)? What is the probability of the typical set? How many elements are there in the typical set? (This involves computing a table of probabilities for sequences with k 1's, 0 ≤ k ≤ 25, and finding those sequences that are in the typical set.)

(c) How many elements are there in the smallest set that has probability 0.9?

(d) How many elements are there in the intersection of the sets in parts (b) and (c)? What is the probability of this intersection?

Solution. X takes the values 0 and 1 with probabilities 0.4 and 0.6, respectively.

(a) H(X) = -0.4 log2 0.4 - 0.6 log2 0.6 = 0.97095 bits.

(b) With ε = 0.1 we have [H(X) - ε, H(X) + ε] = [0.87095, 1.07095] and

    A_ε^(n) = { x^n ∈ {0,1}^n : H(X) - ε ≤ -(1/n) log2 p(x^n) ≤ H(X) + ε }.

We generate a table of probabilities using the following MATLAB code:

% Bernoulli AEP: for each k, tabulate the number of length-25 sequences
% with k ones, the binomial CDF F(k), and -(1/n) log2 p(x^n).
p = 0.6; epsilon = 0.1;
q = 1 - p; n = 25; k = (0:n)';
Hx = -p*log2(p) - q*log2(q);     % binary entropy H(X)
li = Hx - epsilon;               % lower edge of the typical range
ls = Hx + epsilon;               % upper edge of the typical range
combi = zeros(n+1, 1);
for i = 0:n
    combi(i+1) = nchoosek(n, i); % number of sequences with i ones
end
M = [k, combi, binocdf(k,n,p), -1/n*log2(p.^k .* q.^(n-k))];

The output table is

 k      C(25,k)       F(k)          -(1/n) log2 p(x^n)
 0      1             1.1259e-010   1.3219
 1      25            4.3347e-009   1.2985
 2      300           8.0333e-008   1.2751
 3      2300          9.5431e-007   1.2517
 4      12650         8.1646e-006   1.2283
 5      53130         5.359e-005    1.2049
 6      1.771e+005    0.00028072    1.1815
 7      4.807e+005    0.0012054     1.1581
 8      1.0816e+006   0.0043264     1.1347
 9      2.043e+006    0.013169      1.1113
10      3.2688e+006   0.034392      1.0879
11      4.4574e+006   0.077801      1.0645
12      5.2003e+006   0.15377       1.0411
13      5.2003e+006   0.26772       1.0177
14      4.4574e+006   0.41423       0.99435
15      3.2688e+006   0.57538       0.97095
16      2.043e+006    0.72647       0.94755
17      1.0816e+006   0.84645       0.92415
18      4.807e+005    0.92643       0.90076
19      1.771e+005    0.97064       0.87736
20      53130         0.99053       0.85396
21      12650         0.99763       0.83056
22      2300          0.99957       0.80716
23      300           0.99995       0.78376
24      25            1             0.76036
25      1             1             0.73697

The fourth column contains -(1/n) log2 p(x^n). The values within the range [0.87095, 1.07095] occur for 11 ≤ k ≤ 19, which we extract with

i = find(M(:,4) >= li & M(:,4) <= ls);
M(i,:)

with output

11      4.4574e+006   0.077801      1.0645
12      5.2003e+006   0.15377       1.0411
13      5.2003e+006   0.26772       1.0177
14      4.4574e+006   0.41423       0.99435
15      3.2688e+006   0.57538       0.97095
16      2.043e+006    0.72647       0.94755
17      1.0816e+006   0.84645       0.92415
18      4.807e+005    0.92643       0.90076
19      1.771e+005    0.97064       0.87736

The probability that the number of 1's lies between 11 and 19 is F(19) - F(10) = 0.970638 - 0.034392 = 0.936246. Note that this is greater than 1 - ε, i.e., n is large enough for the probability of the typical set to exceed 1 - ε. The number of elements in the typical set is found by summing the second column:

    |A_ε^(n)| = ∑_{k=11}^{19} C(25,k) = 26,366,510.

(c) To find the smallest set B^(n) of probability 0.9, we can imagine that we are filling a bag with pieces so as to reach a certain weight with the minimum number of pieces. To minimize the number of pieces, we should use the largest possible pieces; here that corresponds to using the sequences with the highest probability. Thus we keep putting the highest-probability sequences into the set until we reach a total probability of 0.9. Looking at the fourth column of the table, the probability of an individual sequence increases monotonically with k, so the set consists of sequences with k = 25, 24, ..., until the total probability reaches 0.9. Using the cumulative probability column, it follows that B^(n) consists of all sequences with k ≥ 13 together with some sequences with k = 12. The sequences with k ≥ 13 contribute a total probability of 1 - 0.153768 = 0.846232. The remaining probability of 0.9 - 0.846232 = 0.053768 must come from sequences with k = 12, each of which has probability p(x^n) = (0.6)^12 (0.4)^13 = 1.460813 × 10^-8. The number of such sequences needed is 0.053768 / (1.460813 × 10^-8) = 3,680,690.1, which we round up to 3,680,691. Since there are 16,777,216 sequences with k ≥ 13 (by symmetry, exactly half of the 2^25 = 33,554,432 sequences in all), the smallest set with probability 0.9 has 16,777,216 + 3,680,691 = 20,457,907 sequences. Note that B^(n) is not uniquely defined: it could include any 3,680,691 sequences with k = 12. The size of the smallest set, however, is a well-defined number. A quick MATLAB check of this construction follows.
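The sketch below rebuilds B^(n) greedily under the same p = 0.6, n = 25 setup, taking the most probable sequences first; the variable names target, need, and Bsize are ours:

% Build the smallest set of probability 0.9 by adding sequences in
% order of decreasing probability, i.e., from k = 25 downward.
p = 0.6; q = 1 - p; n = 25; target = 0.9;
k = (n:-1:0)';                           % most probable k-levels first
pseq = p.^k .* q.^(n-k);                 % probability of one sequence with k ones
count = arrayfun(@(kk) nchoosek(n,kk), k);
mass = cumsum(count .* pseq);            % cumulative probability so far
j = find(mass >= target, 1);             % first k-level that crosses 0.9
need = ceil((target - mass(j-1)) / pseq(j));  % extra sequences at level k(j)
Bsize = sum(count(1:j-1)) + need;
fprintf('k >= %d plus %d sequences at k = %d: |B| = %d\n', ...
        k(j-1), need, k(j), Bsize);

Running this reproduces k ≥ 13, the 3,680,691 extra sequences at k = 12, and |B^(n)| = 20,457,907.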

(d) The intersection of the sets A_ε^(n) and B^(n) from parts (b) and (c) consists of all sequences with k between 13 and 19, together with 3,680,691 sequences with k = 12. The probability of this intersection is 0.970638 - 0.153768 + 0.053768 = 0.870638, and its size is 16,708,810 + 3,680,691 = 20,389,501, where 16,708,810 = ∑_{k=13}^{19} C(25,k).

Problem 2 (Markov's inequality and Chebyshev's inequality)

(a) (Markov's inequality.) For any non-negative random variable X and any t > 0, show that

    P(X ≥ t) ≤ E(X)/t.

Exhibit a random variable that achieves this inequality with equality.

(b) (Chebyshev's inequality.) Let Y be a random variable with mean μ and variance σ^2. By letting X = (Y - μ)^2, show that for any ε > 0,

    P(|Y - μ| > ε) ≤ σ^2/ε^2.

Solution.

(a) Keeping only the part of the range where x ≥ t,

    E(X) = ∫_0^∞ x dF(x) ≥ ∫_t^∞ x dF(x) ≥ ∫_t^∞ t dF(x) = t P(X ≥ t).

Rearranging sides and dividing by t we get P(X ≥ t) ≤ E(X)/t. Equality is achieved by the two-point random variable

    X = t with probability μ/t,  X = 0 with probability 1 - μ/t,  where μ ≤ t,

for which E(X) = μ and P(X ≥ t) = μ/t = E(X)/t.

(b) In Markov's inequality, take X = (Y - μ)^2 and t = ε^2:

    P((Y - μ)^2 > ε^2) ≤ E(Y - μ)^2 / ε^2 = σ^2/ε^2,

and noticing that P((Y - μ)^2 > ε^2) = P(|Y - μ| > ε) we get Chebyshev's inequality.
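The equality case in part (a) is easy to see numerically. A minimal sketch, assuming the illustrative values μ = 0.2 and t = 2 (any μ ≤ t works):

% Two-point distribution achieving Markov's inequality with equality:
% X = t with probability mu/t, X = 0 otherwise.
mu = 0.2; t = 2; N = 1e6;
X = t * (rand(N,1) < mu/t);             % i.i.d. samples of X
fprintf('P(X >= t) ~ %.4f, E(X)/t ~ %.4f\n', mean(X >= t), mean(X)/t);
% both estimates should be close to mu/t = 0.1, up to sampling noise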

Problem 3 (AEP and mutual information) Let (X_i, Y_i) be i.i.d. ~ p(x, y). We form the log likelihood ratio of the hypothesis that X and Y are independent vs. the hypothesis that X and Y are dependent. What is the limit of

    (1/n) log [ p(X^n) p(Y^n) / p(X^n, Y^n) ] ?

Solution. Since the pairs are i.i.d., by the law of large numbers

    (1/n) log [ p(X^n) p(Y^n) / p(X^n, Y^n) ]
        = (1/n) ∑_{i=1}^{n} log [ p(X_i) p(Y_i) / p(X_i, Y_i) ]
        → E( log [ p(X) p(Y) / p(X, Y) ] ) = -I(X; Y).

Thus p(X^n) p(Y^n) / p(X^n, Y^n) ≈ 2^{-n I(X;Y)}, which will converge to 1 if X and Y are indeed independent, so that I(X; Y) = 0.

Problem 4 (Piece of cake) A cake is sliced roughly in half, the largest piece being chosen each time, the other pieces discarded. We will assume that a random cut creates pieces of proportions

    X = (2/3, 1/3) with probability 3/4,
    X = (2/5, 3/5) with probability 1/4.

Thus, for example, the first cut (and choice of largest piece) may result in a piece of size 3/5. Cutting and choosing from this piece might reduce it to size (3/5)(2/3) at time 2, and so on. How large, to first order in the exponent, is the piece of cake after n cuts?

Solution. Let C_i be the fraction of the cake kept at the i-th cut (so C_i = 2/3 with probability 3/4 and C_i = 3/5 with probability 1/4), and let T_n be the fraction of cake left after n cuts. Then

    T_n = C_1 C_2 ... C_n = ∏_{i=1}^{n} C_i.

Hence, by the strong law of large numbers,

    lim (1/n) log T_n = lim (1/n) ∑_{i=1}^{n} log C_i = E(log C_1) = (3/4) log(2/3) + (1/4) log(3/5),

so to first order in the exponent T_n ≈ 2^{n E(log C_1)}.
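A quick simulation confirms the exponent in Problem 4 (a sketch; base-2 logarithms are assumed, so the exact value is (3/4) log2(2/3) + (1/4) log2(3/5) ≈ -0.6229 bits per cut):

% Monte Carlo check: after n cuts T_n = prod C_i, and by the SLLN
% (1/n) log2 T_n converges to E(log2 C_1).
n = 1e5;
keep = 2/3 + (3/5 - 2/3)*(rand(n,1) < 1/4);  % C_i = 2/3 w.p. 3/4, 3/5 w.p. 1/4
est = mean(log2(keep));                       % empirical (1/n) log2 T_n
exact = 3/4*log2(2/3) + 1/4*log2(3/5);
fprintf('empirical %.4f vs exact %.4f bits per cut\n', est, exact);

So the surviving piece is roughly 2^{-0.62 n} of the cake.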

(a) Assuming that all codewords are the same length, find the minimum length required to provide codewords for all sequences with three or fewer ones.

(b) Calculate the probability of observing a source sequence for which no codeword has been assigned.

(c) Use Chebyshev's inequality to bound the probability of observing a source sequence for which no codeword has been assigned. Compare this bound with the actual probability computed in part (b).

Solution.

(a) The number of 100-bit binary sequences with three or fewer ones is

    C(100,0) + C(100,1) + C(100,2) + C(100,3) = 1 + 100 + 4950 + 161700 = 166,751.

The required codeword length is ceil(log2 166751) = 18. (Note that H(0.005) = 0.0454, so 18 is quite a bit larger than the 100 H(0.005) ≈ 4.5 bits of entropy of the whole block.)

(b) The probability that a 100-bit sequence has three or fewer ones is

    ∑_{i=0}^{3} C(100,i) (0.005)^i (0.995)^{100-i} = 0.99833.

Thus the probability that the generated sequence cannot be encoded is 1 - 0.99833 = 0.00167.

(c) For a random variable S_n that is the sum of n i.i.d. random variables X_1, X_2, ..., X_n, Chebyshev's inequality states that

    P(|S_n - nμ| ≥ ε) ≤ n σ^2 / ε^2,

where μ and σ^2 are the mean and variance of X_i, respectively (so E(S_n) = nμ and Var(S_n) = n σ^2). In this problem n = 100, μ = 0.005, and σ^2 = (0.005)(0.995). Note that S_100 ≥ 4 if and only if |S_100 - 100(0.005)| ≥ 3.5, so we choose ε = 3.5. Then

    P(S_100 ≥ 4) ≤ 100 (0.005)(0.995) / (3.5)^2 ≈ 4.0612 × 10^-2.

This bound is much larger than the actual probability 0.00167.
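The numbers in parts (a)-(c) can be reproduced in a few lines of MATLAB (a sketch; binopdf, like binocdf in Problem 1, comes from the Statistics Toolbox):

% Problem 5 sanity checks: codeword count and length, probability of
% an unencodable block, and the Chebyshev bound from part (c).
n = 100; p1 = 0.005;
nwords = sum(arrayfun(@(i) nchoosek(n,i), 0:3));  % sequences with <= 3 ones
len = ceil(log2(nwords));                          % fixed codeword length
pfail = 1 - sum(binopdf(0:3, n, p1));              % P(no codeword assigned)
bound = n*p1*(1-p1) / 3.5^2;                       % Chebyshev with eps = 3.5
fprintf('%d codewords, length %d, P(fail) = %.5f, bound = %.4f\n', ...
        nwords, len, pfail, bound);

This prints 166751 codewords, length 18, P(fail) = 0.00167, bound = 0.0406.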

Problem 6 (Sets defined by probabilities) Let X_1, X_2, ... be an i.i.d. sequence of discrete random variables with entropy H(X). Let

    C_n(t) = { x^n ∈ X^n : p(x^n) ≥ 2^{-nt} }

denote the subset of n-sequences with probabilities at least 2^{-nt}.

(a) Show that |C_n(t)| ≤ 2^{nt}.

(b) For what values of t does P(X^n ∈ C_n(t)) → 1?

Proof.

(a) Since the total probability of all sequences is at most 1,

    1 ≥ ∑_{x^n ∈ C_n(t)} p(x^n) ≥ |C_n(t)| min_{x^n ∈ C_n(t)} p(x^n) ≥ |C_n(t)| 2^{-nt},

so |C_n(t)| ≤ 2^{nt}.

(b) Since -(1/n) log p(X^n) → H(X), if t < H(X) the probability that p(X^n) ≥ 2^{-nt} goes to 0, and if t > H(X) it goes to 1.

Problem 7 (An AEP-like limit) Let X_1, X_2, ... be i.i.d. drawn according to the probability mass function p(x). Find

    lim_{n→∞} [ p(X_1, X_2, ..., X_n) ]^{1/n}.

Solution. The random variables log p(X_i) are also i.i.d., so

    lim [ p(X_1, X_2, ..., X_n) ]^{1/n}
        = lim 2^{ (1/n) log p(X_1, X_2, ..., X_n) }
        = 2^{ lim (1/n) ∑ log p(X_i) }   a.e.
        = 2^{ E(log p(X)) }   a.e.
        = 2^{ -H(X) }   a.e.

by the strong law of large numbers (assuming, of course, that H(X) exists).
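To make the limit in Problem 7 concrete, here is a minimal Monte Carlo sketch; the three-symbol pmf is an arbitrary example of ours, and randsample is from the Statistics Toolbox:

% Problem 7: [p(X_1,...,X_n)]^(1/n) -> 2^(-H(X)) almost everywhere.
pmf = [0.5 0.3 0.2];                    % example distribution (assumed)
H = -sum(pmf .* log2(pmf));             % entropy, about 1.485 bits
n = 1e5;
X = randsample(1:3, n, true, pmf);      % i.i.d. draws from the pmf
gmean = 2^(mean(log2(pmf(X))));         % geometric mean, computed in logs
fprintf('[p(X^n)]^(1/n) ~ %.4f vs 2^(-H) = %.4f\n', gmean, 2^(-H));

Working in logs avoids underflow: the product of 10^5 probabilities is far below the smallest representable double, but its n-th root is a perfectly ordinary number near 2^(-H).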