Homework 2: Solution

10-704: Information Processing and Learning, Spring 2012
Lecturer: Aarti Singh
Homework 2: Solution

Acknowledgement: The TA graciously thanks Rafael Stern for providing most of these solutions.

2.1 Problem 1

The objective is

    D(q) = \int q(x) \log q(x)\, dx,

hence \partial D(q)/\partial q(x) = 1 + \log q(x). Similarly, h_i(q) = E_q[r_i(X)] = \int r_i(x)\, q(x)\, dx, so \partial h_i(q)/\partial q(x) = r_i(x). Finally, h_0(q) = \int q(x)\, dx, and hence \partial h_0(q)/\partial q(x) = 1.

Since D(q) is convex and the equality restrictions are linear, we wish to solve a convex optimization problem. The Lagrangian of this problem is

    L(q, \lambda) = D(q) + \sum_{i=0}^{m} \lambda_i h_i(q)

(the constant right-hand sides of the constraints do not affect the derivative and are omitted). Setting \partial L(q, \lambda)/\partial q(x) = 0, we obtain

    1 + \log q(x) + \lambda_0 + \sum_{i=1}^{m} \lambda_i r_i(x) = 0.

Calling \lambda_0' = \lambda_0 + 1, we obtain

    q(x) = e^{-\lambda_0' - \sum_{i=1}^{m} \lambda_i r_i(x)}.

Taking \lambda_0' such that \int q(x)\, dx = 1, we obtain

    q(x) = \frac{e^{-\sum_{i=1}^{m} \lambda_i r_i(x)}}{\int e^{-\sum_{i=1}^{m} \lambda_i r_i(x')}\, dx'}.

Assume there exist unique values of the \lambda_i such that the equality constraints are satisfied. In this case, (q, \lambda) clearly satisfies stationarity and primal feasibility. Since no inequality constraints were imposed (the solution above is automatically positive), dual feasibility and complementary slackness hold trivially. Hence, the KKT conditions are satisfied and q minimizes D(q).
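A small numerical sketch of this result, using a discrete analogue (probability masses on an arbitrary grid) with a single constraint r_1(x) = x and target moment alpha = 2.0, both chosen only for illustration. It recovers the multiplier for the exponential form above and compares it with a generic constrained optimizer.

import numpy as np
from scipy.optimize import brentq, minimize

x = np.linspace(0.0, 10.0, 41)   # assumed support grid (demo choice)
alpha = 2.0                      # assumed target moment for r_1(x) = x

def q_lambda(lam):
    """Exponential-family form from the KKT solution: q proportional to exp(-lam * x)."""
    w = np.exp(-lam * x)
    return w / w.sum()

# Choose the multiplier so that the moment constraint E[X] = alpha holds.
lam_star = brentq(lambda lam: q_lambda(lam) @ x - alpha, 1e-6, 50.0)
q_exp = q_lambda(lam_star)

# Generic constrained minimization of sum q log q, for comparison.
def neg_entropy(q):
    q = np.clip(q, 1e-300, None)
    return float(np.sum(q * np.log(q)))

cons = [{"type": "eq", "fun": lambda q: q.sum() - 1.0},
        {"type": "eq", "fun": lambda q: q @ x - alpha}]
res = minimize(neg_entropy, np.full_like(x, 1.0 / x.size), bounds=[(0.0, 1.0)] * x.size,
               constraints=cons, method="SLSQP")

print("max |q_SLSQP - q_exponential| =", np.abs(res.x - q_exp).max())

Both routes should agree up to the optimizer's tolerance, which is exactly what the KKT argument predicts.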

2.2 Problem 2

By results from class, we need only find constants \lambda_0, \lambda_1, \lambda_2 such that the distribution p(x) = \exp(\lambda_0 + \lambda_1 x + \lambda_2 x^2) satisfies the moment constraints. We inspect the Gaussian pdf with first moment \mu_1 and second moment \mu_2, writing \sigma^2 = \mu_2 - \mu_1^2:

    \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right)
            = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2} + \frac{\mu_1 x}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2}\right),

and we conclude immediately that \lambda_2 = -\frac{1}{2\sigma^2}, \lambda_1 = \frac{\mu_1}{\sigma^2}, and \lambda_0 is whatever constant is required to normalize the distribution.

2.3 Problem 3

Recall that, by HW 1b,

    H(P_1, \ldots, P_n) = \sum_{i=1}^{n} H(P_i \mid P_{i-1}, \ldots, P_1) \le \sum_{i=1}^{n} H(P_i).

The right-hand side is completely determined by the marginals and is attained exactly by the joint distribution under which the variables are independent. Hence, the result is proven.
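A quick numerical illustration of the two-variable case of this bound, on an arbitrary random 4 x 5 joint pmf (the table size and seed are incidental demo choices): the joint entropy never exceeds the sum of the marginal entropies, and the product of the marginals attains that sum.

import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Shannon entropy (in bits) of a pmf given as a flat array."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

P = rng.random((4, 5))   # arbitrary nonnegative table ...
P /= P.sum()             # ... normalized into a joint pmf
px, py = P.sum(axis=1), P.sum(axis=0)

print("H(X, Y)        =", H(P.ravel()))
print("H(X) + H(Y)    =", H(px) + H(py))
print("H(independent) =", H(np.outer(px, py).ravel()))  # product of marginals attains the bound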

2.4 Problem 4

2.4.1 Part 4.1

Let r(X) be the entropy rate of the stochastic process X. Recall that

    r(X) = \lim_{n \to \infty} \frac{H(X_1, \ldots, X_n)}{n}

and, by HW 1b,

    H(X_1, \ldots, X_n) = H(X_1) + \sum_{i=2}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).

By the Markov property, X_i is conditionally independent of X_{i-2}, \ldots, X_1 given X_{i-1}. Hence,

    \sum_{i=2}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) = \sum_{i=2}^{n} H(X_i \mid X_{i-1}).

Since the Markov chain is homogeneous and stationary, H(X_i \mid X_{i-1}) = H(X_2 \mid X_1) for all i. Thus,

    r(X) = \lim_{n \to \infty} \frac{H(X_1) + (n-1)\, H(X_2 \mid X_1)}{n} = H(X_2 \mid X_1).

Finally,

    H(X_2 \mid X_1) = -\sum_i P(X_1 = i) \sum_j P(X_2 = j \mid X_1 = i) \log P(X_2 = j \mid X_1 = i).

Call P_i the i-th row of the transition matrix P. Observe that

    -\sum_j P(X_2 = j \mid X_1 = i) \log P(X_2 = j \mid X_1 = i) = H(P_i).

Hence, by stationarity,

    H(X_2 \mid X_1) = \sum_i P(X_1 = i)\, H(P_i) = \sum_i \mu(i)\, H(P_i).

Observe that r(X) = H(X_2 \mid X_1) \le H(X_2). If we take the variables to be i.i.d., then H(X_2 \mid X_1) = H(X_2). Finally, H(X_2) is maximized by the uniform distribution on the support of the Markov chain. Hence, r(X) is maximized by taking P with all rows equal to the uniform distribution on S, where S is the support of the Markov chain.

2.4.2 Part 4.2

The invariant measure is obtained by solving \mu(1) = p\,\mu(0) and \mu(0) + \mu(1) = 1, which leads to \mu(0) = \frac{1}{1+p} and \mu(1) = \frac{p}{1+p}. From the previous item, the entropy rate of the Markov chain is \sum_i H(P_i)\,\mu(i). Observe that P_2 is degenerate and, therefore, H(P_2) = 0. Hence,

    r(X) = \frac{-p \log p - (1-p)\log(1-p)}{1+p}.

Setting dr/dp = 0, we obtain

    \frac{dr}{dp} = \frac{(1+p)\bigl(\log(1-p) - \log p\bigr) + p \log p + (1-p)\log(1-p)}{(1+p)^2}
                  = \frac{2\log(1-p) - \log p}{(1+p)^2} = 0,

so (1-p)^2 = p, that is,

    p^2 - 3p + 1 = 0.

We obtain p = \frac{3 \pm \sqrt{5}}{2}. Since 0 \le p \le 1 and r(X) = 0 for p = 0 and p = 1, by Weierstrass's theorem p = \frac{3 - \sqrt{5}}{2} maximizes the entropy rate of this Markov chain.

On one hand, reducing p increases the weight \mu(0) with which H(X_2 \mid X_1 = 0) contributes to the entropy rate, which helps increase the entropy rate. On the other hand, reducing p decreases the value of H(X_2 \mid X_1 = 0). The optimal value is the sweet spot between these two tendencies.
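A small numerical check of this part, assuming the same two-state chain with P(0 -> 1) = p and P(1 -> 0) = 1. It implements the general formula \sum_i \mu(i) H(P_i) from 4.1, maximizes it over p, and compares the maximizer and the maximum with the closed forms (the maximum coincides with the logarithm of the golden ratio).

import numpy as np
from scipy.optimize import minimize_scalar

def entropy_rate(P):
    """Entropy rate sum_i mu(i) H(P_i) (in nats) of a stationary chain with transition matrix P."""
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])   # left eigenvector for eigenvalue 1
    mu = mu / mu.sum()
    logP = np.log(np.where(P > 0, P, 1.0))           # treats 0 log 0 as 0
    return float(-np.sum(mu * np.sum(P * logP, axis=1)))

def two_state(p):
    # Transition matrix with P(0 -> 1) = p and P(1 -> 0) = 1, as in the problem.
    return np.array([[1.0 - p, p], [1.0, 0.0]])

res = minimize_scalar(lambda p: -entropy_rate(two_state(p)), bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical argmax p :", res.x)
print("(3 - sqrt(5)) / 2  :", (3 - np.sqrt(5)) / 2)
print("maximal rate       :", entropy_rate(two_state(res.x)))
print("log golden ratio   :", np.log((1 + np.sqrt(5)) / 2))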

2.5 Problem 5

Write I(X; Y) = h(X) - h(X \mid Y). In class we proved that h(X) = \frac{1}{2}\log(2\pi e). Hence, it suffices to find h(X \mid Y). Recall that X \mid Y is a normal random variable with variance 1 - \rho^2, which does not depend on Y. Hence h(X \mid Y) = \frac{1}{2}\log\bigl(2\pi e (1 - \rho^2)\bigr) if |\rho| < 1. Thus,

    I(X; Y) = h(X) - h(X \mid Y) = -\frac{1}{2}\log(1 - \rho^2).

This value is minimized when \rho = 0. In this case the variables are independent and, therefore, there is no mutual information. As \rho \to 1 or \rho \to -1, X becomes completely determined by Y: the conditional variance 1 - \rho^2 tends to 0, h(X \mid Y) \to -\infty, and I(X; Y) \to \infty, the maximum value obtainable.
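A Monte Carlo sanity check of this formula (the value rho = 0.8, the sample size, and the seed are arbitrary demo choices): estimate I(X;Y) = E[log p(X,Y) - log p(X) - log p(Y)] from samples of the standard bivariate normal and compare it with -0.5 log(1 - rho^2).

import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.8
rng = np.random.default_rng(1)
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)

joint = multivariate_normal(mean=[0.0, 0.0], cov=cov)
# I(X;Y) = E[ log p(x,y) - log p(x) - log p(y) ], estimated by a sample average.
log_ratio = joint.logpdf(xy) - norm.logpdf(xy[:, 0]) - norm.logpdf(xy[:, 1])

print("Monte Carlo estimate  :", log_ratio.mean())
print("-0.5 * log(1 - rho^2) :", -0.5 * np.log(1 - rho**2))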

2.6 Problem 6

We have

    H(Y \mid X) = -\sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x),

hence \partial H(Y \mid X)/\partial p(y \mid x) = -p(x)\bigl(\log p(y \mid x) + 1\bigr). Similarly, h_i(p) = E[r_i(X, Y)] = \sum_x p(x) \sum_y r_i(x, y)\, p(y \mid x), so \partial h_i(p)/\partial p(y \mid x) = r_i(x, y)\, p(x). Finally, h_{0,x'}(p) = \sum_y p(y \mid x'), and hence \partial h_{0,x'}(p)/\partial p(y \mid x) = 1\{x' = x\}. Since -H(Y \mid X) is convex in the conditional distribution and the equality restrictions are linear, we wish to solve a convex optimization problem. The Lagrangian of this problem is

    L(p, \lambda) = -H(Y \mid X) + \sum_i \lambda_i h_i(p) + \sum_{x'} \lambda_{0,x'} h_{0,x'}(p).

Calling f(x) = \lambda_{0,x}, the stationarity condition \partial L(p, \lambda)/\partial p(y \mid x) = 0 reads

    p(x)\bigl(\log p(y \mid x) + 1\bigr) + \sum_i \lambda_i r_i(x, y)\, p(x) + f(x) = 0.

Solving,

    p(y \mid x) = \exp\left(-\sum_i \lambda_i r_i(x, y) - \frac{f(x)}{p(x)} - 1\right).

Calling g(x) = -\frac{f(x)}{p(x)} - 1,

    p(y \mid x) = \exp\left(-\sum_i \lambda_i r_i(x, y) + g(x)\right).

Since \sum_y p(y \mid x) = 1,

    p(y \mid x) = \frac{\exp\left(-\sum_i \lambda_i r_i(x, y)\right)}{\sum_{y'} \exp\left(-\sum_i \lambda_i r_i(x, y')\right)}.

Note that the g(x) cancels between the numerator and the denominator. Observe that this p clearly satisfies stationarity. Hence, if there exist \lambda_i's such that p satisfies the moment constraints, it also satisfies primal feasibility. Finally, since the solution satisfies the inequality constraints p(y \mid x) \ge 0 strictly without using them (their multipliers may be taken to be zero), dual feasibility and complementary slackness are also satisfied. Hence, since the KKT conditions are satisfied, p maximizes H(Y \mid X).
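A numerical sketch of this result on a small finite alphabet (3 values of X with an assumed marginal, 4 values of Y, and one randomly drawn constraint function r; all of these are illustration-only choices): solve for the single multiplier in the per-x Gibbs form above, then compare with a direct constrained maximization of H(Y | X).

import numpy as np
from scipy.optimize import brentq, minimize

rng = np.random.default_rng(2)
px = np.array([0.5, 0.3, 0.2])   # assumed marginal of X (demo values)
r = rng.random((3, 4))           # one randomly drawn constraint function r(x, y)
# Pick a feasible target for E[r(X, Y)] strictly between the achievable extremes.
alpha = 0.5 * (np.sum(px * r.min(axis=1)) + np.sum(px * r.max(axis=1)))

def cond_gibbs(lam):
    """Per-x Gibbs form from the solution: p(y|x) proportional to exp(-lam * r(x, y))."""
    w = np.exp(-lam * r)
    return w / w.sum(axis=1, keepdims=True)

def moment(lam):
    return float(np.sum(px[:, None] * r * cond_gibbs(lam)))

lam_star = brentq(lambda l: moment(l) - alpha, -200.0, 200.0)
p_gibbs = cond_gibbs(lam_star)

# Direct constrained maximization of H(Y|X) over all conditionals, for comparison.
def neg_HYgX(v):
    p = v.reshape(3, 4)
    return float(np.sum(px[:, None] * p * np.log(np.clip(p, 1e-300, None))))

cons = [{"type": "eq", "fun": lambda v: v.reshape(3, 4).sum(axis=1) - 1.0},
        {"type": "eq", "fun": lambda v: np.sum(px[:, None] * r * v.reshape(3, 4)) - alpha}]
res = minimize(neg_HYgX, np.full(12, 0.25), bounds=[(0.0, 1.0)] * 12,
               constraints=cons, method="SLSQP")

print("max |p_SLSQP - p_Gibbs| =", np.abs(res.x.reshape(3, 4) - p_gibbs).max())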