10-704: Information Processing and Learning                         Spring 2012
Lecturer: Aarti Singh                                        Homework: Solution

Acknowledgement: The TA graciously thanks Rafael Stern for providing most of these solutions.

Problem 1

We have $D(q) = \int q \log q \, dx$. The derivative of the objective with respect to $q$ is

$$\frac{\partial D}{\partial q} = 1 + \log q.$$

Similarly, $h_i(q) = E_q[r_i(X)] = \int r_i(x)\, q(x)\, dx$. Thus,

$$\frac{\partial h_i}{\partial q} = r_i(x).$$

Finally, $h_0(q) = \int q \, dx$ and hence $\frac{\partial h_0}{\partial q} = 1$.

Since $D(q)$ is convex and the equality restrictions are linear, we wish to solve a convex optimization problem. The Lagrangian of this problem is

$$L(q, \lambda) = D(q) + \lambda_0 h_0(q) + \sum_{i=1}^m \lambda_i h_i(q).$$

Solving $\frac{\partial L(q, \lambda)}{\partial q} = 0$, we obtain

$$1 + \log q + \lambda_0 + \sum_{i=1}^m \lambda_i r_i(x) = 0.$$

Calling $\tilde\lambda_0 = -(1 + \lambda_0)$ (and absorbing signs into the $\lambda_i$), we obtain

$$q(x) = e^{\tilde\lambda_0 + \sum_{i=1}^m \lambda_i r_i(x)}.$$

Taking $\tilde\lambda_0$ such that $\int q \, dx = 1$, we obtain

$$q(x) = \frac{e^{\sum_{i=1}^m \lambda_i r_i(x)}}{\int e^{\sum_{i=1}^m \lambda_i r_i(x)} \, dx}.$$

Assume there exist unique values for each $\lambda_i$ such that the equality constraints are satisfied. In this case, $(q, \lambda)$ clearly satisfies stationarity and primal feasibility. Since there are no inequality conditions, dual feasibility and complementary slackness are also satisfied. Hence, the KKT conditions are satisfied and $q$ minimizes $D(q)$.
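To make the derivation concrete, here is a minimal numerical sketch (my addition, not part of the original solution): on a discretized support with the single illustrative constraint $E[X] = 1$, we solve for the $\lambda$ in the exponential-family form above and compare against a direct constrained minimization of $D(q)$. The grid, the constraint value, and helper names such as `q_of_lam` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import brentq, minimize

# Discretized analogue of the problem: support [0, 10], constraint E[X] = 1.
x = np.linspace(0.0, 10.0, 80)

def q_of_lam(lam):
    """Exponential-family form from the KKT derivation: q ∝ exp(lam * r(x))."""
    w = np.exp(lam * x)
    return w / w.sum()

# Choose lambda so the moment constraint E_q[X] = 1 holds.
lam = brentq(lambda l: q_of_lam(l) @ x - 1.0, -20.0, 0.0)
q_kkt = q_of_lam(lam)

# Direct minimization of D(q) = sum q log q under the same linear constraints.
def neg_ent(q):
    q = np.clip(q, 1e-12, None)
    return np.sum(q * np.log(q))

cons = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0},
        {'type': 'eq', 'fun': lambda q: q @ x - 1.0}]
res = minimize(neg_ent, np.full_like(x, 1.0 / x.size),
               bounds=[(0, 1)] * x.size, constraints=cons, method='SLSQP')

print(neg_ent(q_kkt), res.fun)  # the two objective values should agree closely
```

Both routes should return essentially the same objective value, which is exactly the content of the KKT argument above.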
Problem 2

By results from class, we need only find constants $\lambda_0, \lambda_1, \lambda_2$ such that the distribution $p(x) = \exp(\lambda_0 + \lambda_1 x + \lambda_2 x^2)$ satisfies the moment constraints. We inspect the Gaussian pdf with first moment $\mu_1$ and second moment $\mu_2$, writing $\sigma^2 = \mu_2 - \mu_1^2$:

$$\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu_1)^2}{2\sigma^2}\right) = \frac{e^{-\mu_1^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{x^2}{2\sigma^2} + \frac{\mu_1 x}{\sigma^2}\right),$$

and we conclude immediately that $\lambda_2 = -\frac{1}{2\sigma^2}$ and $\lambda_1 = \frac{\mu_1}{\sigma^2}$, and $\lambda_0$ is whatever constant is required to normalize the distribution.
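As a quick sanity check on these coefficients (my addition, not part of the original solution), the sketch below plugs illustrative moments into $\lambda_1 = \mu_1/\sigma^2$ and $\lambda_2 = -1/(2\sigma^2)$ and confirms by quadrature that $e^{\lambda_0 + \lambda_1 x + \lambda_2 x^2}$ reproduces those moments; the specific values 1.5 and 4.0 are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

# Target first and second moments (illustrative values, not from the problem).
mu1, mu2 = 1.5, 4.0
s2 = mu2 - mu1**2            # variance implied by the two moments

lam2 = -1.0 / (2.0 * s2)     # coefficient on x^2 read off from the Gaussian pdf
lam1 = mu1 / s2              # coefficient on x

p = lambda x: np.exp(lam1 * x + lam2 * x**2)
Z, _ = quad(p, -np.inf, np.inf)   # exp(-lam0): the normalizing constant

m1, _ = quad(lambda x: x * p(x) / Z, -np.inf, np.inf)
m2, _ = quad(lambda x: x**2 * p(x) / Z, -np.inf, np.inf)
print(m1, m2)   # should reproduce mu1, mu2
```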
Problem 3

Recall that, by an earlier HW (part b),

$$H(P_1, \dots, P_n) = \sum_{i=1}^n H(P_i \mid P_{i-1}, \dots, P_1) \le \sum_{i=1}^n H(P_i).$$

The right side is completely determined by the marginals and is attained exactly by the joint distribution of independent variables (the product of the marginals). Hence, the result is proven.

Problem 4

4.1 Let $r(X)$ be the entropy rate of a stochastic process $X$. Recall that

$$r(X) = \lim_{n \to \infty} \frac{H(X_1, \dots, X_n)}{n}.$$

By an earlier HW (part b),

$$H(X_1, \dots, X_n) = H(X_1) + \sum_{i=2}^n H(X_i \mid X_{i-1}, \dots, X_1).$$

By the Markov property, $X_i$ is conditionally independent of $X_{i-2}, \dots, X_1$ given $X_{i-1}$. Hence:

$$\sum_{i=2}^n H(X_i \mid X_{i-1}, \dots, X_1) = \sum_{i=2}^n H(X_i \mid X_{i-1}).$$

Since the Markov chain is homogeneous and stationary, $H(X_i \mid X_{i-1}) = H(X_2 \mid X_1)$ for all $i$. Thus:

$$r(X) = \lim_{n \to \infty} \frac{H(X_1) + (n-1) H(X_2 \mid X_1)}{n} = H(X_2 \mid X_1).$$

Finally,

$$H(X_2 \mid X_1) = -\sum_i P(X_1 = i) \sum_j P(X_2 = j \mid X_1 = i) \log P(X_2 = j \mid X_1 = i).$$

Call $P_i$ the $i$-th row of the transition matrix $P$. Observe that

$$-\sum_j P(X_2 = j \mid X_1 = i) \log P(X_2 = j \mid X_1 = i) = H(P_i).$$

Hence, by stationarity,

$$H(X_2 \mid X_1) = \sum_i P(X_1 = i)\, H(P_i) = \sum_i \mu(i)\, H(P_i),$$

where $\mu$ is the stationary distribution. Observe that $r(X) = H(X_2 \mid X_1) \le H(X_2)$. If we take the variables to be i.i.d., $H(X_2 \mid X_1) = H(X_2)$. Finally, $H(X_2)$ is maximized by the uniform distribution on the support of the chain. Hence, $r(X)$ is maximized by taking $P$ with all rows equal to the uniform distribution on $S$, where $S$ is the support of the Markov chain.
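The identity $r(X) = \sum_i \mu(i) H(P_i)$ is easy to check numerically. Below is a small sketch of mine (the helper `entropy_rate` is not from the notes): it recovers $\mu$ as the left eigenvector of $P$ for eigenvalue 1 and evaluates the weighted row entropies, in nats to match the generic $\log$ of the notes. The two-state matrix previews Problem 4.2, with an arbitrary $p = 0.3$.

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate H(X2|X1) = sum_i mu(i) H(P_i) of a stationary Markov chain."""
    # Stationary distribution: left eigenvector of P with eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    mu = mu / mu.sum()
    # Entropy of each row, with the convention 0 log 0 = 0.
    row_H = -np.sum(P * np.log(np.where(P > 0, P, 1.0)), axis=1)
    return mu @ row_H

# Two-state chain from Problem 4.2 with p = 0.3 (illustrative value).
p = 0.3
P = np.array([[1 - p, p],
              [1.0, 0.0]])
print(entropy_rate(P))   # equals H(p)/(1+p), in nats
```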
4.2 The invariant measure is obtained by solving $\mu_1 = p\,\mu_0$ and $\mu_0 + \mu_1 = 1$, which leads to $\mu_0 = \frac{1}{1+p}$ and $\mu_1 = \frac{p}{1+p}$. From the last item, the entropy rate of the Markov chain is $\sum_i \mu(i) H(P_i)$. Observe that $P_2$ is degenerate and, therefore, $H(P_2) = 0$. Hence,

$$r(X) = -\frac{p \log p + (1-p) \log(1-p)}{1+p} = \frac{H(p)}{1+p}.$$

Setting $\frac{dr}{dp} = 0$:

$$\frac{dr}{dp} = \frac{(1+p) \log\frac{1-p}{p} + p \log p + (1-p) \log(1-p)}{(1+p)^2} = 0$$

$$\iff 2 \log(1-p) = \log p \iff (1-p)^2 = p \iff p^2 - 3p + 1 = 0.$$

Obtain $p = \frac{3 \pm \sqrt{5}}{2}$. Since $0 \le p \le 1$, and $r(X) = 0$ for $p = 0$ and $p = 1$, by Weierstrass's theorem $p = \frac{3 - \sqrt{5}}{2}$ maximizes the entropy rate of this Markov chain. On one hand, reducing $p$ increases the weight $\mu_0 = \frac{1}{1+p}$ with which $H(X_2 \mid X_1 = 0) = H(p)$ contributes to the entropy rate, which helps increase the entropy. On the other hand, reducing $p$ (below $1/2$) decreases the value of $H(X_2 \mid X_1 = 0)$ itself. The optimal value is the sweet spot between these two tendencies.

Problem 5

$I(X;Y) = H(X) - H(X \mid Y)$. In class we proved that $H(X) = \frac{1}{2} \log(2\pi e)$. Hence, it suffices to find $H(X \mid Y)$. Recall that $X \mid Y$ is a normal random variable with variance $1 - \rho^2$, which does not depend on $Y$. Hence $H(X \mid Y) = \frac{1}{2} \log\big(2\pi e (1 - \rho^2)\big)$ if $|\rho| < 1$. Thus,

$$I(X;Y) = H(X) - H(X \mid Y) = -\tfrac{1}{2} \log(1 - \rho^2).$$

This value is minimized when $\rho = 0$. In this case, the variables are independent and, therefore, there is no mutual information. When $\rho = 1$ or $\rho = -1$, $X$ is completely determined by $Y$; the conditional density degenerates, $H(X \mid Y) \to -\infty$, and $I(X;Y) \to \infty$, so the mutual information grows without bound.
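To see the closed form $-\frac{1}{2}\log(1-\rho^2)$ in action, here is a Monte Carlo sketch I added (the sample size and $\rho = 0.8$ are arbitrary choices): it estimates $E\big[\log \frac{f(X,Y)}{f_X(X) f_Y(Y)}\big]$ from samples of the standard bivariate normal and compares it to the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n = 200_000

# Sample from the standard bivariate normal with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Monte Carlo estimate of E[log f(X,Y) - log f_X(X) - log f_Y(Y)].
log_joint = (-0.5 * (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
             - np.log(2 * np.pi * np.sqrt(1 - rho**2)))
log_marg = -0.5 * (x**2 + y**2) - np.log(2 * np.pi)

print(np.mean(log_joint - log_marg))   # Monte Carlo estimate of I(X;Y), in nats
print(-0.5 * np.log(1 - rho**2))       # closed form from the solution
```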
Problem 6

$$H(Y \mid X) = -\sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x).$$

Hence,

$$\frac{\partial H(Y \mid X)}{\partial p(y \mid x)} = -p(x)\big(\log p(y \mid x) + 1\big).$$

Similarly, $h_i(p) = E[r_i(X)\, Y] = \sum_x r_i(x)\, p(x) \sum_y y\, p(y \mid x)$. Thus,

$$\frac{\partial h_i}{\partial p(y \mid x)} = r_i(x)\, p(x)\, y.$$

Finally, $h_{0,x}(p) = \sum_y p(y \mid x)$ and hence $\frac{\partial h_{0,x}}{\partial p(y \mid x')} = I(x' = x)$.

Since $-H(Y \mid X)$ is convex and the equality restrictions are linear, we wish to solve a convex optimization problem. The Lagrangian of this problem is

$$L(p, \lambda) = \sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x) + \sum_i \lambda_i \sum_x r_i(x)\, p(x) \sum_y y\, p(y \mid x) + \sum_x \lambda_{0,x} \sum_y p(y \mid x).$$

Setting $\nabla L(p, \lambda) = 0$ coordinate-wise:

$$p(x)\big(\log p(y \mid x) + 1\big) + \sum_i \lambda_i r_i(x)\, p(x)\, y + \lambda_{0,x} = 0,$$

so that

$$p(y \mid x) = \exp\Big(-\sum_i y\, \lambda_i r_i(x) - \frac{\lambda_{0,x}}{p(x)} - 1\Big).$$

Calling $g(x) = -\frac{\lambda_{0,x}}{p(x)} - 1$ and replacing $\lambda_i$ by $-\lambda_i$:

$$p(y \mid x) = \exp\Big(\sum_i y\, \lambda_i r_i(x) + g(x)\Big).$$

Since $Y$ is binary and $p(0 \mid x) + p(1 \mid x) = 1$:

$$p(y \mid x) = \frac{\exp\big(y \sum_i \lambda_i r_i(x)\big)}{1 + \exp\big(\sum_i \lambda_i r_i(x)\big)}.$$

Note that the factor $e^{g(x)}$ cancels between the numerator and the denominator. Observe that $p$ clearly satisfies stationarity. Hence, if there exist $\lambda_i$'s such that $p$ satisfies the constraints, it also satisfies primal feasibility. Finally, since the solution satisfies the inequalities ($p(y \mid x) \ge 0$) even though they were not used as constraints, dual feasibility and complementary slackness are also satisfied. Hence, since the KKT conditions are satisfied, $p$ maximizes $H(Y \mid X)$.
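The logistic form of the solution can be checked numerically. In the sketch below (my addition; the distribution of $X$, the feature $r(x) = x$, and the constraint value 0.5 are illustrative assumptions, as are names like `neg_cond_ent`), we pick $\lambda$ so that the logistic conditional meets the constraint on $E[r(X)\,Y]$, and compare it against a direct numerical maximization of $H(Y \mid X)$.

```python
import numpy as np
from scipy.optimize import brentq, minimize

# Toy setup: X in {0, 1, 2} with marginal px, feature r(x) = x,
# and one constraint E[r(X) Y] = 0.5 with Y binary.
px = np.array([0.2, 0.5, 0.3])
r = np.array([0.0, 1.0, 2.0])
target = 0.5

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# KKT form: p(Y=1|x) = sigmoid(lam * r(x)); pick lam to meet the constraint.
lam = brentq(lambda l: px @ (r * sigmoid(l * r)) - target, -30.0, 30.0)
p_kkt = sigmoid(lam * r)

# Direct maximization of H(Y|X) over the three numbers p(Y=1|x).
def neg_cond_ent(p1):
    p1 = np.clip(p1, 1e-12, 1 - 1e-12)
    return np.sum(px * (p1 * np.log(p1) + (1 - p1) * np.log(1 - p1)))

res = minimize(neg_cond_ent, np.full(3, 0.5), bounds=[(0, 1)] * 3,
               constraints=[{'type': 'eq',
                             'fun': lambda p1: px @ (r * p1) - target}],
               method='SLSQP')

print(p_kkt)    # logistic form from the KKT conditions
print(res.x)    # direct numerical optimum; should agree
```

Note that the unconstrained coordinate ($x = 0$, where $r(x) = 0$) comes out as $p(Y{=}1 \mid x) = 1/2$ in both routes, as the maximum-entropy reasoning predicts.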