Minimax Learning for Remote Prediction
Cheuk Ting Li*, Xiugang Wu, Ayfer Ozgur, Abbas El Gamal
*Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. ctli@berkeley.edu
Department of Electrical Engineering, Stanford University. {x3wu, aozgur}@stanford.edu; abbas@ee.stanford.edu

arXiv v1 [cs.IT] 31 May 2018

Abstract: The classical problem of supervised learning is to infer an accurate predictor of a target variable Y from a measured variable X by using a finite number of labeled training samples. Motivated by the increasingly distributed nature of data and decision making, in this paper we consider a variation of this classical problem in which the prediction is performed remotely based on a rate-constrained description M of X. Upon receiving M, the remote node computes an estimate Ŷ of Y. We follow the recent minimax approach to study this learning problem and show that it corresponds to a one-shot minimax noisy source coding problem. We then establish information-theoretic bounds on the risk-rate Lagrangian cost and a general method to design a near-optimal descriptor-estimator pair, which can be viewed as a rate-constrained analog of the maximum conditional entropy principle used in the classical minimax learning problem. Our results show that a naive estimate-compress scheme for rate-constrained prediction is not in general optimal.

I. INTRODUCTION

The classical problem of supervised learning is to infer an accurate predictor of a target variable Y from a measured variable X on the basis of n labeled training samples {X_i, Y_i}_{i=1}^n independently drawn from an unknown joint distribution P. The standard approach for solving this problem in statistical learning theory is empirical risk minimization (ERM). For a given set of allowable predictors and a loss function that quantifies the risk of each predictor, ERM chooses the predictor with minimal risk under the empirical distribution of the samples.
To avoid overfitting, the set of allowable predictors is restricted to a class with limited complexity. Recently, an alternative viewpoint has emerged which seeks distributionally robust predictors. Given the labeled training samples, this approach learns a predictor by minimizing its worst-case risk over an ambiguity distribution set centered at the empirical distribution of the samples. In other words, instead of restricting the set of allowable predictors, it aims to avoid overfitting by requiring that the learned predictor perform well under any distribution in a chosen neighborhood of the empirical distribution. This minimax approach has been investigated under different assumptions on how the ambiguity set is constructed, e.g., by restricting the moments [1], or by forming f-divergence balls [2] or Wasserstein balls [3] (see also the references therein). In these previous works, the learning algorithm finds a predictor that acts directly on a fresh unlabeled sample X to predict the corresponding target variable Y. Often, however, the fresh sample X may be only remotely available, and when designing the predictor it is desirable to also take into account the cost of communicating X. This is motivated by the fact that bandwidth and energy limitations on communication in networks and within multiprocessor systems often impose significant bottlenecks on the performance of algorithms. There is also an increasing number of applications in which data is generated in a distributed manner and it, or features of it, are communicated over bandwidth-limited links to a central processor to perform inference. For instance, applications such as Google Goggles and Siri process the locally collected data on clouds. It is thus important to study prediction in distributed and rate-constrained settings.
In this paper, we study an extension of the classical learning problem in which, given a finite set of training samples, the learning algorithm needs to infer a descriptor-estimator pair with a desired communication rate between them. This is especially relevant when both X and Y come from a large alphabet or are continuous random variables, as in regression problems, so that neither the sample X nor its predicted value of Y can simply be communicated in a lossless fashion. We adopt the minimax framework for learning the descriptor-estimator pair. Given a set of labeled training samples, our goal is to find a descriptor-estimator pair by minimizing their resultant worst-case risk over an ambiguity distribution set, where the risk now incorporates both the statistical risk and the communication cost. One of the important conclusions that emerges from the minimax approach to supervised learning in [1] is that the problem of finding the predictor with minimal worst-case risk over an ambiguity set can be broken into two smaller steps: (1) find the worst-case distribution in the ambiguity set that maximizes the generalized conditional entropy of Y given X, and (2) find the optimal predictor under this worst-case distribution. In this paper, we show that an analogous principle approximately holds for rate-constrained prediction.

This paper will be presented in part at the IEEE International Symposium on Information Theory, Vail, Colorado, USA, June 2018.
The descriptor-estimator pair with minimal worst-case risk can be found in two steps: (1) find the worst-case distribution in the ambiguity set that maximizes the risk-information Lagrangian cost, and (2) find the optimal descriptor-estimator pair under this worst-case distribution. We then apply our results to characterize the optimal descriptor-estimator pairs for two applications: rate-constrained linear regression and rate-constrained classification. While a simple scheme, whereby we first find the optimal predictor ignoring the rate constraint and then compress and communicate the predictor output, is optimal for the linear regression application, we show via the classification application that such an estimate-compress approach is not optimal in general. We show that when prediction is rate-constrained, the optimal descriptor aims to send sufficiently, but not necessarily maximally, informative features of the observed variable, which are at the same time easy to communicate. When applied to the case in which the ambiguity distribution set contains only a single distribution (for example, the true or empirical distribution of (X, Y)) and the loss function for the prediction is logarithmic loss, our results provide a new one-shot operational interpretation of the information bottleneck problem. A key technical ingredient in our results is the strong functional representation lemma (SFRL) developed in [4], which we use to design the optimal descriptor-estimator pair for the worst-case distribution.

Notation: We assume that log is base 2 and the entropy H is in bits. The length of a variable-length description M ∈ {0, 1}* is denoted by |M|. For random variables U, V, we denote the joint distribution by P_{U,V} and the conditional distribution of U given V by P_{U|V}. For brevity we denote the distribution of (X, Y) by P. We write I_P(X; Ŷ) for I(X; Ŷ) when (X, Y) ~ P and P is clear from the context.

II. PROBLEM FORMULATION

We begin by reviewing the minimax approach to the classical learning problem [1].

A.
Minimax Approach to Supervised Learning

Let X ∈ X and Y ∈ Y be jointly distributed random variables. The problem of statistical learning is to design an accurate predictor of a target variable Y from a measured variable X on the basis of n independent training samples {X_i, Y_i}_{i=1}^n drawn from an unknown joint distribution. The standard approach for solving this problem is to use empirical risk minimization (ERM), in which one defines an admissible class of predictors F, consisting of functions f: X → Ŷ (where the reconstruction alphabet Ŷ can in general be different from Y), and a loss function l: Ŷ × Y → R. The risk associated with a predictor f when the underlying joint distribution of X and Y is P is

L(f, P) = E_P[l(f(X), Y)].

ERM simply chooses the predictor f_n ∈ F with minimal risk under the empirical distribution P_n of the training samples. Recently, an alternative approach has emerged which seeks distributionally robust predictors. This approach learns a predictor by minimizing its worst-case risk over an ambiguity distribution set Γ(P_n), i.e.,

f_n = argmin_f max_{P ∈ Γ(P_n)} L(f, P),     (1)

where f can be any function and Γ(P_n) can be constructed in various ways, e.g., by restricting the moments, or by forming f-divergence balls or Wasserstein balls. While in ERM it is important to restrict the set F of admissible predictors to a low-complexity class to prevent overfitting, in the minimax approach overfitting is prevented by explicitly requiring that the chosen predictor be distributionally robust. The learned function f_n can then be used for predicting Y when presented with fresh samples of X. The learning and inference phases are illustrated in Figure 1.

Fig. 1. Minimax approach to supervised learning: in the learning phase, the learner maps the samples {X_i, Y_i}_{i=1}^n (with empirical distribution P_n) to f_n; in the inference phase, the inferrer outputs Ŷ = f_n(X).
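When the ambiguity set and the alphabets are finite, the minimax predictor in (1) can be found by brute-force enumeration over all predictors. A minimal sketch (the alphabets, the ambiguity set, and all probabilities below are hypothetical toy values, not from the paper):

```python
import itertools

# 0-1 loss risk L(f, P) = E_P[ 1{ f(X) != Y } ] for a predictor given as a dict.
def risk(f, P):
    return sum(p for (x, y), p in P.items() if f[x] != y)

X_vals, Y_vals = [0, 1], [0, 1]

# Small ambiguity set Gamma: a few joint pmfs near a hypothetical empirical one.
Gamma = [
    {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40},  # "empirical"
    {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40},
    {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35},
]

# argmin over all predictors f of the worst-case risk over Gamma, as in (1).
best_f, best_worst = None, float("inf")
for f_tuple in itertools.product(Y_vals, repeat=len(X_vals)):
    f = dict(zip(X_vals, f_tuple))
    worst = max(risk(f, P) for P in Gamma)
    if worst < best_worst:
        best_f, best_worst = f, worst

print(best_f, best_worst)  # {0: 0, 1: 1} 0.25
```

Here the identity predictor minimizes the worst-case risk; real ambiguity sets (moment-constrained, f-divergence or Wasserstein balls) are infinite, and the minimax problem is solved by duality rather than enumeration.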
B. Minimax Learning for Remote Prediction

In this paper, we extend the minimax learning approach to the setting in which the prediction needs to be performed based on a rate-constrained description of X. In particular, given a finite set of training samples {X_i, Y_i}_{i=1}^n independently drawn from an unknown joint distribution P, our goal is to learn a pair of functions (e, f), where e is a descriptor used to compress X into M = e(X) ∈ {0, 1}* (a prefix-free code), and f is an estimator that takes the description M and generates an estimate Ŷ of Y (see Figure 2). Let R(e, P) = E_P[|e(X)|] be the rate of the descriptor e and L(e, f, P) = E_P[l(f(e(X)), Y)] be the risk associated with the descriptor-estimator pair (e, f) when the underlying distribution of (X, Y) is P, and define the risk-rate Lagrangian cost, parametrized by λ > 0, as

L_λ(e, f, P) = L(e, f, P) + λ R(e, P).

Note that this cost function takes into account both the resultant statistical prediction risk of (e, f) and the communication rate they require. The task of a minimax learner is to find a pair (e_n, f_n) that minimizes the worst-case L_λ(e, f, P) over the ambiguity distribution set Γ(P_n), i.e.,

(e_n, f_n) = argmin_{e,f} max_{P ∈ Γ(P_n)} L_λ(e, f, P),     (3)

for an appropriately chosen Γ(P_n) centered at the empirical distribution of samples P_n. Note that we allow all possible (e, f) pairs here. We also assume that the descriptor and the estimator can use unlimited common randomness W, independent of the data; i.e., e and f can be expressed as functions of (X, W) and (M, W), respectively, and the prefix-free codebook for M can depend on W. The availability of such common randomness can be justified by the fact that in practice, although the inference scheme is one-shot, it is used many times by the same user and by different users; hence the descriptor and the estimator can share a common randomness seed before communication commences without impacting the communication rate.
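For a concrete feel of the risk-rate Lagrangian cost L_λ(e, f, P) = L(e, f, P) + λ R(e, P), here is a minimal sketch for a hypothetical toy source with a 1-bit descriptor (all numbers are illustrative, not from the paper):

```python
# Joint pmf P(x, y): X in {0,1,2,3}, Y in {0,1}; Y matches 1{X >= 2} w.p. 0.9.
P = {(x, y): 0.25 * (0.9 if y == (x >= 2) else 0.1)
     for x in range(4) for y in (0, 1)}
lam = 0.1

e = lambda x: "1" if x >= 2 else "0"      # prefix-free 1-bit description M
f = lambda m: 1 if m == "1" else 0        # estimator Ŷ = f(M)
loss = lambda yh, y: 1 if yh != y else 0  # 0-1 loss

risk = sum(p * loss(f(e(x)), y) for (x, y), p in P.items())  # L(e, f, P)
rate = sum(p * len(e(x)) for (x, y), p in P.items())         # R(e, P)
cost = risk + lam * rate                                     # L_λ(e, f, P)
print(risk, rate, cost)  # 0.1, 1.0, 0.2
```

The minimax learner searches over such pairs (e, f), trading the two terms against each other through λ.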
Learning: {X i,y i } n i=1 P n P n Learner e n,f n Inference: X M = e n X Ŷ = f n M e n f n Fig.. Minimax learning for remote prediction. III. MAIN RESULTS We first consider the case where Γ consists of a single distribution P, which ma be the empirical distribution P n as in ERM. Define the minimax risk-rate cost as L λγ = inf e,f L λ e, f, P. 4 While it is difficult to minimize the risk-rate cost directl, the minimax risk-rate cost can be bounded in terms of the mutual information between X and Ŷ. Theorem 1. Let Γ = {P }. Then L λ inf E L λ inf E [lŷ, Y ] + λix; Ŷ, ] [lŷ, Y + λ IX; Ŷ + logix; Ŷ As in other one-shot compression results e.g., zero-error compression, there is a gap between the upper and lower bound. While the logarithmic gap in Theorem 1 is not as small as the 1-bit gap in the zero-error compression, it is dominated b the linear term IX; Ŷ when it is large. To prove Theorem 1, we use the strong functional representation lemma given in [4] also see [5], [6]: for an random variables X, Ŷ, there exists random variable W independent of X, such that Ŷ is a function of X, W, and HŶ W IX; Ŷ + logix; Ŷ
Here, W can be intuitively viewed as the part of Ŷ that is not contained in X. Note that for any W such that Ŷ is a function of (X, W) and W is independent of X, we have H(Ŷ | W) ≥ I(X; Ŷ). The statement (5) ensures the existence of a W, independent of X, which comes close to this lower bound, and in this sense is most informative about Ŷ. This is critical for the proof of Theorem 1, as we will see next. Identifying the part of Ŷ that is not contained in X allows us to generate and share this part between the descriptor and the estimator ahead of time, eliminating the need to communicate it during the course of inference. To find W, we use the Poisson functional representation construction detailed in [4].

Proof of Theorem 1: Recall that Ŷ = f(e(X, W), W). The lower bound follows from the fact that I_P(X; Ŷ) ≤ H_P(M | W) ≤ E[|M|]. To establish the upper bound, fix any P_{Ŷ|X}. Let W be obtained from (5). Note that W is independent of X and can be generated from a random seed shared between the descriptor and the estimator ahead of time. For a given w, take m = e(x, w) to be the Huffman codeword of ŷ(x, w) according to the distribution P_{Ŷ|W=w} (recall that Ŷ is a function of (X, W)), and take f(m, w) to be the decoding function of the Huffman code. The expected codeword length satisfies

E[|M|] ≤ H(Ŷ | W) + 1 ≤ I(X; Ŷ) + log(I(X; Ŷ) + 1) + 5.

Taking the infimum over all P_{Ŷ|X} completes the proof.

Remark 1. If we consider the logarithmic loss l(ŷ, y) = log(1/ŷ(y)), where ŷ is a distribution over Y, then the lower bound in Theorem 1 reduces to

inf_{P_{U|X}} ( H(Y | U) + λ I(X; U) ) = H(Y) + inf_{P_{U|X}} ( λ I(X; U) − I(Y; U) ),

which is the information bottleneck function [7]. Therefore the setting of remote prediction provides an approximate one-shot operational interpretation of the information bottleneck, up to a logarithmic gap. In [8], [9] it was shown that the asymptotic noisy source coding problem also provides an operational interpretation of the information bottleneck.
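The Poisson functional representation construction mentioned in the proof can be sketched as follows for finite alphabets: the shared randomness W is an i.i.d. sequence of proposals Ŷ_i ~ P_Ŷ together with the arrival times T_i of a rate-1 Poisson process, and the descriptor selects the index K minimizing T_i / (dP_{Ŷ|X=x}/dP_Ŷ)(Ŷ_i), so that Ŷ_K is distributed exactly as P_{Ŷ|X=x}. The sketch below uses a hypothetical pair of distributions and truncates the infinite sequence, so the check is approximate:

```python
import random

random.seed(0)

P_marg = [0.5, 0.3, 0.2]   # P_Ŷ: shared "codebook" distribution over {0,1,2}
P_cond = [0.2, 0.5, 0.3]   # target P_{Ŷ|X=x} for one fixed x (hypothetical)

def poisson_func_rep(n_candidates=200):
    """One draw: K = argmin_i T_i / (dP_cond/dP_marg)(Ŷ_i); returns (K, Ŷ_K)."""
    t, best = 0.0, (float("inf"), None, None)
    for i in range(n_candidates):
        t += random.expovariate(1.0)                  # Poisson arrival T_i
        y = random.choices(range(3), weights=P_marg)[0]
        val = t / (P_cond[y] / P_marg[y])             # time scaled by likelihood ratio
        if val < best[0]:
            best = (val, i, y)
    return best[1], best[2]

samples = [poisson_func_rep()[1] for _ in range(3000)]
freqs = [samples.count(y) / len(samples) for y in range(3)]
print(freqs)  # empirically close to P_cond
```

The selected index K is what the descriptor communicates; when the conditional is close to the marginal, K tends to be small, which is what makes the description length comparable to the relative entropy.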
Compared with these asymptotic interpretations, ours is more satisfying since the feature extraction problem originally considered in [7] is by nature one-shot.

We now extend Theorem 1 to the minimax setting.

Theorem 2. Suppose Γ is convex. Then

inf_{P_{Ŷ|X}} sup_{P∈Γ} ( E_P[l(Ŷ, Y)] + λ I_P(X; Ŷ) )  ≤  L*_λ(Γ)  ≤  inf_{P_{Ŷ|X}} sup_{P∈Γ} ( E_P[l(Ŷ, Y)] + λ( I_P(X; Ŷ) + log(I_P(X; Ŷ) + 1) + 5 ) ).

This result is related to minimax noisy source coding [10]. The main difference is that we consider the one-shot expected length instead of the asymptotic rate. To prove this theorem, we first invoke a minimax result for relative entropy in [11], which generalizes the redundancy-capacity theorem [12]. Then we apply the following refined version of the strong functional representation lemma, which is proved within the proof of Theorem 1 in [4] (also see [5]).

Lemma 1. For any P_{Ŷ|X} and Q_Ŷ, there exist a random variable W and functions k(x, w) ∈ {1, 2, ...} and ŷ(k, w) such that ŷ(k(x, W), W) given {X = x} is distributed as P_{Ŷ|X=x}, and

E[log k(x, W)] ≤ D(P_{Ŷ|X=x} || Q_Ŷ) + log(D(P_{Ŷ|X=x} || Q_Ŷ) + 1) + 4.

We are now ready to prove Theorem 2.

Proof: The lower bound follows from E_P[|M|] ≥ H_P(M | W) ≥ I_P(X; Ŷ). To prove the upper bound, we fix any P_{Ŷ|X} and show that the following risk-rate cost is achievable:

L = sup_{P∈Γ} ( E_P[l(Ŷ, Y)] + λ( I_P(X; Ŷ) + log(I_P(X; Ŷ) + 1) + 5 ) ).

Let

g(P, Q_Ŷ) = E_P[l(Ŷ, Y)] + λ( ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) + log( ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) + 1 ) + 5 ).
Note that g is concave in P for fixed Q_Ŷ, since E_P[l(Ŷ, Y)] and ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) are linear in P. Also, g is quasiconvex in Q_Ŷ for fixed P, since ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) is convex in Q_Ŷ; and g is lower semicontinuous in Q_Ŷ, since D(P_{Ŷ|X=x} || Q_Ŷ) is lower semicontinuous with respect to the topology of weak convergence [13], and hence ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) is lower semicontinuous by Fatou's lemma. Write P_Ŷ for the distribution of Ŷ when (X, Y) ~ P and Ŷ | {X = x} ~ P_{Ŷ|X=x}. Let Γ_Ŷ = {P_Ŷ : P ∈ Γ} and let cl(Γ_Ŷ) be the closure of Γ_Ŷ in the topology of weak convergence. It can be shown, using the same arguments as in [11] on g instead of relative entropy, and using Sion's minimax theorem [14] instead of Lemma 2 in [11], that if Γ_Ŷ is uniformly tight, then there exists Q*_Ŷ ∈ cl(Γ_Ŷ) such that

sup_{P∈Γ} g(P, Q*_Ŷ) = inf_{Q_Ŷ} sup_{P∈Γ} g(P, Q_Ŷ) = L.

If Γ_Ŷ is not uniformly tight, then by Lemma 4 in [11], sup_{P∈Γ} ∫ D(P_{Ŷ|X=x} || Q_Ŷ) dP(x) = ∞ for every Q_Ŷ, and hence L = inf_{Q_Ŷ} sup_{P∈Γ} g(P, Q_Ŷ) = ∞. Applying Lemma 1 to P_{Ŷ|X} and Q*_Ŷ, we obtain W independent of X and a random variable K = k(X, W) ∈ {1, 2, ...} such that Ŷ = ŷ(K, W) follows the conditional distribution P_{Ŷ|X=x} given X = x, and

E[log K | X = x] ≤ D(P_{Ŷ|X=x} || Q*_Ŷ) + log(D(P_{Ŷ|X=x} || Q*_Ŷ) + 1) + 4

for any x. Then we use the Elias delta code [15] on K to produce M. The length of the Elias delta code for K is upper bounded by log K + 2 log(log K + 1) + 1. Hence, by Jensen's inequality,

E_P[|M|] ≤ ∫ D(P_{Ŷ|X=x} || Q*_Ŷ) dP(x) + log( ∫ D(P_{Ŷ|X=x} || Q*_Ŷ) dP(x) + 1 ) + 5,

and therefore

L_λ(e, f, P) = E_P[l(Ŷ, Y)] + λ E_P[|M|] ≤ g(P, Q*_Ŷ) ≤ L.

Theorem 2 suggests that we can simplify the analysis of the risk-rate cost L_λ(e, f, P) = E_P[l(Ŷ, Y)] + λ E_P[|M|] by replacing the rate E_P[|M|] with the mutual information I_P(X; Ŷ). Define the risk-information cost as

L̃_λ(P_{Ŷ|X}, P) = E_P[l(Ŷ, Y)] + λ I_P(X; Ŷ).     (7)

Theorem 2 implies that the minimax risk-rate cost L*_λ(Γ) can be approximated by the minimax risk-information cost

L̃*_λ(Γ) = inf_{P_{Ŷ|X}} sup_{P∈Γ} L̃_λ(P_{Ŷ|X}, P)     (8)

within a logarithmic gap. Theorem 2 can also be stated in the following slightly weaker form:

L̃*_λ(Γ) ≤ L*_λ(Γ) ≤ L̃*_λ(Γ) + λ log( λ^{-1} L̃*_λ(Γ) + 1 ) + 5λ.

The risk-information cost has more desirable properties than the risk-rate cost. For example, it is convex in P_{Ŷ|X} for fixed P, and concave in P for fixed P_{Ŷ|X}.
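The Elias delta code used on the integer K in the proof of Theorem 2 encodes any k ≥ 1 in roughly log k + 2 log log k bits; a minimal sketch (encoder and decoder only, with no claim about the exact constants in the length bound):

```python
def elias_delta_encode(k):
    """Elias delta code of a positive integer, as a bit string."""
    assert k >= 1
    n = k.bit_length() - 1                 # floor(log2 k)
    gamma_len = (n + 1).bit_length()
    bits = "0" * (gamma_len - 1) + format(n + 1, "b")  # Elias gamma of n + 1
    return bits + format(k, "b")[1:]       # then k without its leading 1

def elias_delta_decode(bits):
    l = 0
    while bits[l] == "0":                  # count leading zeros
        l += 1
    n = int(bits[l:2 * l + 1], 2) - 1      # next l + 1 bits encode n + 1
    rest = bits[2 * l + 1:2 * l + 1 + n]
    return int("1" + rest, 2) if n > 0 else 1

print(elias_delta_encode(10))  # '00100010'
```

Because the codeword length grows only logarithmically in K, bounding E[log K] as in Lemma 1 directly bounds the expected description length.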
Since the risk-information cost is convex in P_{Ŷ|X} and concave in P, we can exchange the infimum and supremum in Theorem 2 by Sion's minimax theorem [14], which gives the following proposition.

Proposition 1. Suppose X, Y and Ŷ are finite, Γ is convex and closed, and λ ≥ 0. Then

L̃*_λ(Γ) = inf_{P_{Ŷ|X}} sup_{P∈Γ} L̃_λ(P_{Ŷ|X}, P) = sup_{P∈Γ} inf_{P_{Ŷ|X}} L̃_λ(P_{Ŷ|X}, P).

Moreover, there exists P*_{Ŷ|X} attaining the infimum on the left-hand side, which also attains the infimum on the right-hand side when P is fixed to P*, the distribution that attains the supremum on the right-hand side.

Proposition 1 means that in order to design a robust descriptor-estimator pair that works for any P ∈ Γ, we only need to design it according to the worst-case distribution P*, as follows.

Principle of maximum risk-information cost: Given a convex and closed Γ, we design the descriptor-estimator pair based on the worst-case distribution

P* = argmax_{P∈Γ} inf_{P_{Ŷ|X}} L̃_λ(P_{Ŷ|X}, P).

We then find the P*_{Ŷ|X} that minimizes L̃_λ(P_{Ŷ|X}, P*) and design the descriptor-estimator pair accordingly, e.g., using Lemma 1 on P*_{Ŷ|X} and the distribution P*_Ŷ induced by P*_{Ŷ|X} and P*.
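The inner minimization in the principle above, inf over P_{Ŷ|X} of L̃_λ(P_{Ŷ|X}, P) for a fixed P, has the same structure as a rate-distortion Lagrangian with distortion d(x, ŷ) = E[l(ŷ, Y) | X = x], so it can be computed by Blahut-Arimoto-style alternating minimization. A sketch on a hypothetical binary toy instance (the pmf and λ are illustrative, not from the paper):

```python
import math

P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # fixed P over (X, Y)
Xs, Ys, Yh = [0, 1], [0, 1], [0, 1]
lam = 0.2

Px = {x: sum(P[(x, y)] for y in Ys) for x in Xs}
# effective distortion d(x, ŷ) = E[ 1{ŷ != Y} | X = x ]
d = {(x, yh): sum(P[(x, y)] / Px[x] for y in Ys if y != yh)
     for x in Xs for yh in Yh}

q = {yh: 1 / len(Yh) for yh in Yh}   # output marginal, initialized uniform
for _ in range(300):
    # optimal channel for this marginal: q(ŷ|x) ∝ q(ŷ) 2^{-d(x,ŷ)/λ}
    cond = {}
    for x in Xs:
        w = {yh: q[yh] * 2 ** (-d[(x, yh)] / lam) for yh in Yh}
        z = sum(w.values())
        for yh in Yh:
            cond[(x, yh)] = w[yh] / z
    # optimal marginal for this channel: the induced marginal of Ŷ
    q = {yh: sum(Px[x] * cond[(x, yh)] for x in Xs) for yh in Yh}

risk = sum(Px[x] * cond[(x, yh)] * d[(x, yh)] for x in Xs for yh in Yh)
mi = sum(Px[x] * cond[(x, yh)] * math.log2(cond[(x, yh)] / q[yh])
         for x in Xs for yh in Yh if cond[(x, yh)] > 0)
cost = risk + lam * mi               # ≈ inf over channels of the cost in (7)
print(risk, mi, cost)
```

For this instance the cost must land between the full-rate risk 0.2 and the cost 0.4 of the identity channel, and the converged channel is the soft (randomized) quantizer that Gibbs-weights each output by its expected loss.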
IV. APPLICATIONS

A. Rate-constrained Minimax Linear Regression

Suppose X ∈ R^d, Y ∈ R, l(ŷ, y) = (ŷ − y)² is the mean-squared loss, and we observe the data {X_i, Y_i}_{i=1}^n. Take Γ to be the set of distributions with the same first and second moments as given by the empirical distribution, i.e.,

Γ = { P_XY : E[X] = μ_X, E[Y] = μ_Y, Var[X] = Σ_X, Var[Y] = σ_Y², Cov[X, Y] = C_XY },     (9)

where μ_X, μ_Y, Σ_X, σ_Y², C_XY are the corresponding statistics of the empirical distribution. The following proposition shows that P* is Gaussian.

Proposition 2 (Linear regression with rate constraint). Consider the mean-squared loss and define Γ as in (9). Then the minimax risk-information cost (8) is

L̃*_λ = σ_Y² − C_XY^T Σ_X^{-1} C_XY + (λ/2) log( 2 C_XY^T Σ_X^{-1} C_XY / (λ log e) ) + (λ log e)/2     (10)

if (λ/2) log e < C_XY^T Σ_X^{-1} C_XY, and L̃*_λ = σ_Y² otherwise, where the optimal P*_XY is Gaussian with its mean and covariance matrix specified in (9), and the optimal estimate is

Ŷ = a C_XY^T Σ_X^{-1} X + b + Z   if (λ/2) log e < C_XY^T Σ_X^{-1} C_XY,
Ŷ = μ_Y                           if (λ/2) log e ≥ C_XY^T Σ_X^{-1} C_XY,

where a = 1 − (λ log e)/(2 C_XY^T Σ_X^{-1} C_XY), b = μ_Y − a C_XY^T Σ_X^{-1} μ_X, and Z ~ N(0, σ_Z²) is independent of X with σ_Z² = λ a (log e)/2.

Note that this setting does not satisfy the conditions in Proposition 1. We directly analyze (8) to obtain the optimal P*_XY. Given the optimal P*_XY, Theorem 2 and Lemma 1 can be used to construct the scheme. Operationally, e_n(x, w) is a random quantizer of a C_XY^T Σ_X^{-1} X + b such that the quantization noise follows N(0, σ_Z²). With this natural choice of the ambiguity set, our formulation recovers a compressed version of the familiar MMSE estimator. Figure 3 plots the tradeoff between the rate and the risk when d = 1, μ_X = μ_Y = 0, σ_X² = σ_Y² = 1, C_XY = 0.95 for the scheme constructed using the Poisson functional representation in [4], with the lower bound given by the minimax risk-information cost L̃*_λ, and the upper bound given in Theorem 2.

Fig. 3. Tradeoff between the rate and the risk in rate-constrained minimax linear regression.

Proof of Proposition 2: Without loss of generality, assume μ_X = 0 and μ_Y = 0. We first prove "≤" in (10).
For this, fix P_{Ŷ|X} as given in the proposition and consider any P ∈ Γ. When (λ/2) log e < C_XY^T Σ_X^{-1} C_XY, we have

E_P[l(Ŷ, Y)] = E_P[(Ŷ − Y)²] = σ_Y² − C_XY^T Σ_X^{-1} C_XY + (λ log e)/2,

and

I_P(X; Ŷ) = h(Ŷ) − h(Ŷ | X) ≤ (1/2) log( 2 C_XY^T Σ_X^{-1} C_XY / (λ log e) ).

Therefore,

inf_{P_{Ŷ|X}} sup_{P∈Γ} ( E_P[l(Ŷ, Y)] + λ I_P(X; Ŷ) ) ≤ R.H.S. of (10).

It can also be checked that the above relation holds when (λ/2) log e ≥ C_XY^T Σ_X^{-1} C_XY, and thus we have proved "≤" in (10). To prove "≥" in (10), fix a Gaussian P*_XY with its mean and covariance matrix specified in (9) and consider an arbitrary P_{Ŷ|X}. We have

E_{P*}[l(Ŷ, Y)] = E_{P*}[(Y − Ŷ)²] = σ_Y² − C_XY^T Σ_X^{-1} C_XY + E_{P*}[(Ŷ − C_XY^T Σ_X^{-1} X)²],

and

I_{P*}(X; Ŷ) ≥ I_{P*}(C_XY^T Σ_X^{-1} X; Ŷ)
= h(C_XY^T Σ_X^{-1} X) − h(C_XY^T Σ_X^{-1} X | Ŷ)
≥ (1/2) log( C_XY^T Σ_X^{-1} C_XY ) − (1/2) log( E_{P*}[(Ŷ − C_XY^T Σ_X^{-1} X)²] ).

Letting γ = E_{P*}[(Ŷ − C_XY^T Σ_X^{-1} X)²], we have

E_{P*}[l(Ŷ, Y)] + λ I_{P*}(X; Ŷ) ≥ σ_Y² − C_XY^T Σ_X^{-1} C_XY + (λ/2) log( C_XY^T Σ_X^{-1} C_XY ) + γ − (λ/2) log γ ≥ R.H.S. of (10),

where the second inequality follows by evaluating the minimum of γ − (λ/2) log γ, attained at γ = (λ/2) log e. Combining this with the above completes the proof of Proposition 2.

The optimal scheme in the above example corresponds to compressing and communicating the minimax optimal rate-unconstrained predictor Ȳ = C_XY^T Σ_X^{-1} (X − μ_X) + μ_Y, since the optimal Ŷ can be obtained from Ȳ by shifting, scaling and adding noise. This estimate-compress approach can be thought of as a separation scheme, since we first optimally estimate Ȳ and then optimally communicate it while satisfying the rate constraint. In the next application, we show that such separation is not optimal in general.

B. Rate-constrained Minimax Classification

We assume Y = Ŷ = {1, ..., k} and X are finite, l(ŷ, y) = 1{ŷ ≠ y}, and Γ is closed and convex. The following proposition gives the minimax risk-information cost and the optimal estimator.

Proposition 3. Consider the setting described above. The minimax risk-information cost is given by

L̃*_λ = 1 + λ sup_{P∈Γ} inf_{P_Ŷ} E_P[ −log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} P_Ŷ(ŷ) ) ],

the worst-case distribution P* is the one attaining the supremum, and the optimal estimator is given by

P*_{Ŷ|X}(ŷ|x) ∝ 2^{λ^{-1} P*_{Y|X}(ŷ|x)} P*_Ŷ(ŷ),

where P*_Ŷ attains the infimum, and P*_{Y|X} is obtained from P*.
In particular, if Γ is symmetric for different values of Y (i.e., for any y1, y2 ∈ Y, there exist a permutation π of Y and a permutation τ of X such that π(y1) = y2 and P_{X,Y} ∈ Γ ⇔ P_{τ(X),π(Y)} ∈ Γ), then

L̃*_λ = 1 + λ log k − λ E_{P*}[ log( Σ_ŷ 2^{λ^{-1} P*_{Y|X}(ŷ|X)} ) ].

We can see that as λ → 0, P*_{Ŷ|X} tends to the maximum a posteriori (MAP) estimator under P*, the worst-case distribution when λ = 0.
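The estimator in Proposition 3 is a "softened" MAP rule: each candidate ŷ is weighted by its prior P*_Ŷ(ŷ) times 2^{λ^{-1} P*_{Y|X}(ŷ|x)}. A minimal sketch with hypothetical posteriors, illustrating the λ → 0 limit concentrating on the MAP symbol:

```python
def soft_map(post, lam, q):
    """q(ŷ|x) ∝ q_Ŷ(ŷ) · 2^{P_{Y|X}(ŷ|x)/λ} (the form in Proposition 3)."""
    w = [qy * 2 ** (p / lam) for p, qy in zip(post, q)]
    z = sum(w)
    return [wi / z for wi in w]

post = [0.5, 0.3, 0.2]        # P_{Y|X}(·|x) for one x (hypothetical numbers)
q = [1 / 3, 1 / 3, 1 / 3]     # P_Ŷ (hypothetical)
for lam in (1.0, 0.1, 0.01):
    print(lam, soft_map(post, lam, q))
# as λ decreases, the mass concentrates on the MAP symbol argmax post = 0
```

At larger λ the rule deliberately randomizes toward likely outputs under P_Ŷ, which is what keeps the description rate I(X; Ŷ) low.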
Proof: Assume Γ is closed and convex. By Proposition 1, the minimax risk-information cost is L̃*_λ = sup_{P∈Γ} inf_{P_{Ŷ|X}} L̃_λ(P_{Ŷ|X}, P), where for each fixed P,

inf_{P_{Ŷ|X}} L̃_λ(P_{Ŷ|X}, P)
= inf_{P_{Ŷ|X}} ( P{Ŷ ≠ Y} + λ I_P(X; Ŷ) )
= inf_{P_{Ŷ|X}} ( P{Ŷ ≠ Y} + λ inf_{P_Ŷ} ∫ D(P_{Ŷ|X=x} || P_Ŷ) dP(x) )
= 1 + λ inf_{P_{Ŷ|X}, P_Ŷ} E_P[ log( P_{Ŷ|X}(Ŷ|X) / ( 2^{λ^{-1} P_{Y|X}(Ŷ|X)} P_Ŷ(Ŷ) ) ) ]
= 1 + λ inf_{P_Ŷ} E_P[ −log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} P_Ŷ(ŷ) ) ],

where the last step is because relative entropy is nonnegative: for each x, the inner expectation is minimized by P_{Ŷ|X}(ŷ|x) ∝ 2^{λ^{-1} P_{Y|X}(ŷ|x)} P_Ŷ(ŷ), and the minimum equals the negative log of the normalizing constant.

Next we consider the case in which Γ is symmetric. Consider L̃*_λ = inf_{P_{Ŷ|X}} sup_{P∈Γ} L̃_λ(P_{Ŷ|X}, P). For any i, j ∈ Y = {1, ..., k}, let π_ij be a permutation over Y such that π_ij(i) = j, and let τ_ij be the corresponding permutation over X in the symmetry assumption. Since the function P_{Ŷ|X} ↦ sup_{P∈Γ} L̃_λ(P_{Ŷ|X}, P) is convex and symmetric under π_ij and τ_ij, to find its infimum we only need to consider channels satisfying P_{Ŷ|X} = P_{π_ij(Ŷ)|τ_ij(X)} for all i, j (if not, we can instead consider the average of P_{π_ij^a(Ŷ)|τ_ij^a(X)} for a from 1 up to the product of the periods of π_ij and τ_ij, which gives a value of the function not larger than that of P_{Ŷ|X}). For brevity we say P_{Ŷ|X} is symmetric if it satisfies this condition. Fix any symmetric P_{Ŷ|X}. Since the function P ↦ L̃_λ(P_{Ŷ|X}, P) is concave and symmetric under π_ij and τ_ij (i.e., L̃_λ(P_{Ŷ|X}, P_{X,Y}) = L̃_λ(P_{Ŷ|X}, P_{τ_ij(X),π_ij(Y)})), to find its supremum we only need to consider symmetric P's. For symmetric P_{Ŷ|X} and symmetric P, the marginal of Ŷ is uniform, so I_P(X; Ŷ) = log k − H_P(Ŷ | X). Hence,

L̃*_λ = inf_{symm. P_{Ŷ|X}} sup_{symm. P∈Γ} ( P{Ŷ ≠ Y} + λ( log k − H_P(Ŷ | X) ) )
≥ sup_{symm. P∈Γ} ( 1 + λ log k − λ E_P[ log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} ) ] ),
where the inequality is because relative entropy is nonnegative, and equality is attained when P_{Ŷ|X}(ŷ|x) ∝ 2^{λ^{-1} P_{Y|X}(ŷ|x)}. Note that

1 + λ log k − λ E_P[ log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} ) ] = inf_{P_{Ŷ|X}} ( P{Ŷ ≠ Y} + λ( log k − H_P(Ŷ | X) ) )

is an infimum of affine functions of P, hence it is concave in P. Also it is symmetric under π and τ, hence

sup_{symm. P∈Γ} ( 1 + λ log k − λ E_P[ log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} ) ] ) = sup_{P∈Γ} ( 1 + λ log k − λ E_P[ log( Σ_ŷ 2^{λ^{-1} P_{Y|X}(ŷ|X)} ) ] ).

The other direction follows by setting P_Ŷ to the uniform distribution in the first part of the proposition.

To show that the estimate-compress approach is not always optimal, let l(ŷ, y) = 1{ŷ ≠ y} and Y = Y1 ∪ Y2, where Y1 ∩ Y2 = ∅ and |Yi| = ki is finite. Let Γ = {P}, where P is such that X = (X1, X2) with X1 ~ Unif(Y1) and X2 ~ Unif(Y2) independent, and Y = Xi with probability qi for i = 1, 2. By Proposition 3, the optimal risk-information cost is

1 − λ log max{ (1/k1)(2^{λ^{-1} q1} − 1) + 1, (1/k2)(2^{λ^{-1} q2} − 1) + 1 },     (11)

and the optimal scheme is to generate Ŷ by compressing X1 (passing it through a symmetric channel) if (1/k1)(2^{λ^{-1} q1} − 1) > (1/k2)(2^{λ^{-1} q2} − 1), and by compressing X2 otherwise. Assume q1 > q2; then the optimal MAP estimate is Ȳ = X1. An estimate-compress approach would try to communicate a compressed version of Ȳ = X1. However, the optimal rate-constrained descriptor communicates a lossy version of X2 rather than X1 if k1 ≫ k2. The risk-information cost achieved by the estimate-compress approach is

1 − λ log max{ (1/k1)(2^{λ^{-1} q1} − 1) + 1, (1/k1)(2^{λ^{-1} q2} − 1) + 1 },     (12)

which is larger than (11) when k1 ≫ k2. Moreover, the gap between the rates needed by the two approaches for a fixed risk can be unbounded. Take q1 = 1 − q2 = 2/3, k2 = 2, and k1 large. The minimum rate needed to achieve a risk of 2/3 is 1 bit, by Ŷ = X2. For the estimate-compress approach, since Ŷ ~ Unif(Y2) gives a risk of 5/6, we have to compress X1 almost losslessly to achieve a risk of 2/3, which requires an unbounded rate of at least log k1 − 1. Figure 4 compares the optimal scheme, the lower bound obtained from the optimal risk-information tradeoff (11), the upper bound on the optimal rate by Theorem 1, and the risk-information tradeoff for the estimate-compress approach (12), for q1 = 1 − q2 = 2/3, k1 = 3, k2 = 2.
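The gap between the two approaches at a fixed 1-bit budget can be checked by direct counting; a sketch with hypothetical sizes k1 = 8, k2 = 2 (so these numbers differ from the Fig. 4 setting with k1 = 3):

```python
from fractions import Fraction

q1, q2 = Fraction(2, 3), Fraction(1, 3)   # Y = X1 w.p. q1, Y = X2 w.p. q2
k1, k2 = 8, 2                             # |Y1| = 8, |Y2| = 2 (hypothetical)

# Scheme A: describe X2 losslessly (log2 k2 = 1 bit), output Ŷ = X2.
risk_A = 1 - q2                           # correct only when Y = X2

# Scheme B (estimate-compress at the same 1-bit rate): the MAP estimate is
# Ȳ = X1 since q1 > q2, but a 1-bit description preserves at most 2 of the
# k1 values of X1, so Ŷ = X1 with probability only 2/k1.
risk_B = 1 - q1 * Fraction(2, k1)

print(risk_A, risk_B)  # 2/3 vs 5/6: estimate-compress is strictly worse
```

As k1 grows, risk_B approaches 1 while risk_A stays at 2/3, mirroring the unbounded rate gap argued above.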
Note that the optimal scheme performs time sharing, using common randomness, between encoding X1 using 2 bits (risk 1/3), encoding X2 using 1 bit (risk 2/3), and fixing the output at one value of Y2 using 0 bits (risk 5/6). The mutual information needed by the estimate-compress approach (which is a lower bound on the actual rate needed by this approach) is strictly greater than the optimal rate, except when the risk is at its minimum 1/3 or maximum 5/6.

V. ACKNOWLEDGEMENTS

This work was partially supported by a gift from Huawei Technologies and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF.

REFERENCES

[1] F. Farnia and D. Tse, "A minimax approach to supervised learning," in Advances in Neural Information Processing Systems, 2016.
[2] H. Namkoong and J. C. Duchi, "Variance-based regularization with convex objectives," in Advances in Neural Information Processing Systems, 2017.
[3] J. Lee and M. Raginsky, "Minimax statistical learning and domain adaptation with Wasserstein distances," arXiv preprint, 2017.
[4] C. T. Li and A. El Gamal, "Strong functional representation lemma and applications to coding theorems," in Proc. IEEE Int. Symp. Inf. Theory, June 2017.
[5] P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, "The communication complexity of correlation," IEEE Trans. Inf. Theory, vol. 56, no. 1, Jan. 2010.
[6] M. Braverman and A. Garg, "Public vs private coin in bounded-round information," in International Colloquium on Automata, Languages, and Programming. Springer, 2014.
[7] N. Tishby, F. C. Pereira, and W. Bialek, "The information bottleneck method," arXiv preprint physics/, 2000.
[8] P. Harremoës and N. Tishby, "The information bottleneck revisited or how to choose a good distortion measure," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2007.
Fig. 4. Tradeoff between the rate and risk in rate-constrained minimax classification for the optimal scheme, lower bound (11), upper bound by Theorem 1, and estimate-compress approach (12).

[9] T. A. Courtade and T. Weissman, "Multiterminal source coding under logarithmic loss," IEEE Trans. Inf. Theory, vol. 60, no. 1, 2014.
[10] A. Dembo and T. Weissman, "The minimax distortion redundancy in noisy source coding," IEEE Trans. Inf. Theory, vol. 49, no. 11, 2003.
[11] D. Haussler, "A general minimax result for relative entropy," IEEE Trans. Inf. Theory, vol. 43, no. 4, Jul. 1997.
[12] R. G. Gallager, "Source coding with side information and universal coding," Technical Report LIDS-P-937, MIT Laboratory for Information and Decision Systems.
[13] E. Posner, "Random coding strategies for minimum entropy," IEEE Trans. Inf. Theory, vol. 21, no. 4, Jul. 1975.
[14] M. Sion, "On general minimax theorems," Pacific Journal of Mathematics, vol. 8, no. 1, 1958.
[15] P. Elias, "Universal codeword sets and representations of the integers," IEEE Trans. Inf. Theory, vol. 21, no. 2, Mar. 1975.
More informationStability Analysis for Linear Systems under State Constraints
Stabilit Analsis for Linear Sstems under State Constraints Haijun Fang Abstract This paper revisits the problem of stabilit analsis for linear sstems under state constraints New and less conservative sufficient
More informationLecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157
Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x
More informationSuperposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels
Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Nan Liu and Andrea Goldsmith Department of Electrical Engineering Stanford University, Stanford CA 94305 Email:
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationA. Derivation of regularized ERM duality
A. Derivation of regularized ERM dualit For completeness, in this section we derive the dual 5 to the problem of computing proximal operator for the ERM objective 3. We can rewrite the primal problem as
More informationEntropy, Relative Entropy and Mutual Information Exercises
Entrop, Relative Entrop and Mutual Information Exercises Exercise 2.: Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. (a) Find the entrop H(X)
More informationInformation Theoretic Limits of Randomness Generation
Information Theoretic Limits of Randomness Generation Abbas El Gamal Stanford University Shannon Centennial, University of Michigan, September 2016 Information theory The fundamental problem of communication
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationApproximately achieving the feedback interference channel capacity with point-to-point codes
Approximately achieving the feedback interference channel capacity with point-to-point codes Joyson Sebastian*, Can Karakus*, Suhas Diggavi* Abstract Superposition codes with rate-splitting have been used
More informationEQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES
1 EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES CHANDRA NAIR 1 1 Dept. of Information Engineering, The Chinese Universit of Hong Kong Abstract We derive alternate characterizations
More informationLecture 5 Channel Coding over Continuous Channels
Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLinear programming: Theory
Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analsis and Economic Theor Winter 2018 Topic 28: Linear programming: Theor 28.1 The saddlepoint theorem for linear programming The
More informationPerturbation Theory for Variational Inference
Perturbation heor for Variational Inference Manfred Opper U Berlin Marco Fraccaro echnical Universit of Denmark Ulrich Paquet Apple Ale Susemihl U Berlin Ole Winther echnical Universit of Denmark Abstract
More informationThe Poisson Channel with Side Information
The Poisson Channel with Side Information Shraga Bross School of Enginerring Bar-Ilan University, Israel brosss@macs.biu.ac.il Amos Lapidoth Ligong Wang Signal and Information Processing Laboratory ETH
More informationCapacity Region of Reversely Degraded Gaussian MIMO Broadcast Channel
Capacity Region of Reversely Degraded Gaussian MIMO Broadcast Channel Jun Chen Dept. of Electrical and Computer Engr. McMaster University Hamilton, Ontario, Canada Chao Tian AT&T Labs-Research 80 Park
More informationSequential prediction with coded side information under logarithmic loss
under logarithmic loss Yanina Shkel Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Maxim Raginsky Department of Electrical and Computer Engineering Coordinated Science
More informationDesign of Optimal Quantizers for Distributed Source Coding
Design of Optimal Quantizers for Distributed Source Coding David Rebollo-Monedero, Rui Zhang and Bernd Girod Information Systems Laboratory, Electrical Eng. Dept. Stanford University, Stanford, CA 94305
More informationUpper Bounds on the Capacity of Binary Intermittent Communication
Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,
More informationOPTIMAL CONTROL FOR A PARABOLIC SYSTEM MODELLING CHEMOTAXIS
Trends in Mathematics Information Center for Mathematical Sciences Volume 6, Number 1, June, 23, Pages 45 49 OPTIMAL CONTROL FOR A PARABOLIC SYSTEM MODELLING CHEMOTAXIS SANG UK RYU Abstract. We stud the
More informationConvexity/Concavity of Renyi Entropy and α-mutual Information
Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au
More informationComputation of Total Capacity for Discrete Memoryless Multiple-Access Channels
IEEE TRANSACTIONS ON INFORATION THEORY, VOL. 50, NO. 11, NOVEBER 2004 2779 Computation of Total Capacit for Discrete emorless ultiple-access Channels ohammad Rezaeian, ember, IEEE, and Ale Grant, Senior
More informationSimultaneous Nonunique Decoding Is Rate-Optimal
Fiftieth Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 1-5, 2012 Simultaneous Nonunique Decoding Is Rate-Optimal Bernd Bandemer University of California, San Diego La Jolla, CA
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationAn Achievable Error Exponent for the Mismatched Multiple-Access Channel
An Achievable Error Exponent for the Mismatched Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@camacuk Albert Guillén i Fàbregas ICREA & Universitat Pompeu Fabra University of
More informationA Minimax Approach to Supervised Learning
A Minimax Approach to Supervised Learning Farzan Farnia farnia@stanford.edu David Tse dntse@stanford.edu Abstract Given a task of predicting Y from X, a loss function L, and a set of probability distributions
More informationFORMULATION OF THE LEARNING PROBLEM
FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we
More informationThe Capacity of the Semi-Deterministic Cognitive Interference Channel and its Application to Constant Gap Results for the Gaussian Channel
The Capacity of the Semi-Deterministic Cognitive Interference Channel and its Application to Constant Gap Results for the Gaussian Channel Stefano Rini, Daniela Tuninetti, and Natasha Devroye Department
More informationInformation Theory. Lecture 10. Network Information Theory (CT15); a focus on channel capacity results
Information Theory Lecture 10 Network Information Theory (CT15); a focus on channel capacity results The (two-user) multiple access channel (15.3) The (two-user) broadcast channel (15.6) The relay channel
More informationOn the Capacity Region of the Gaussian Z-channel
On the Capacity Region of the Gaussian Z-channel Nan Liu Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College Park, MD 74 nkancy@eng.umd.edu ulukus@eng.umd.edu
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationArimoto Channel Coding Converse and Rényi Divergence
Arimoto Channel Coding Converse and Rényi Divergence Yury Polyanskiy and Sergio Verdú Abstract Arimoto proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code
More informationInformation Theory Meets Game Theory on The Interference Channel
Information Theory Meets Game Theory on The Interference Channel Randall A. Berry Dept. of EECS Northwestern University e-mail: rberry@eecs.northwestern.edu David N. C. Tse Wireless Foundations University
More informationSHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe
SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/41 Outline Two-terminal model: Mutual information Operational meaning in: Channel coding: channel
More informationSolutions to Homework Set #1 Sanov s Theorem, Rate distortion
st Semester 00/ Solutions to Homework Set # Sanov s Theorem, Rate distortion. Sanov s theorem: Prove the simple version of Sanov s theorem for the binary random variables, i.e., let X,X,...,X n be a sequence
More informationOrdinal One-Switch Utility Functions
Ordinal One-Switch Utilit Functions Ali E. Abbas Universit of Southern California, Los Angeles, California 90089, aliabbas@usc.edu David E. Bell Harvard Business School, Boston, Massachusetts 0163, dbell@hbs.edu
More informationLatent Tree Approximation in Linear Model
Latent Tree Approximation in Linear Model Navid Tafaghodi Khajavi Dept. of Electrical Engineering, University of Hawaii, Honolulu, HI 96822 Email: navidt@hawaii.edu ariv:1710.01838v1 [cs.it] 5 Oct 2017
More informationTransmitting k samples over the Gaussian channel: energy-distortion tradeoff
Transmitting samples over the Gaussian channel: energy-distortion tradeoff Victoria Kostina Yury Polyansiy Sergio Verdú California Institute of Technology Massachusetts Institute of Technology Princeton
More informationThe Gallager Converse
The Gallager Converse Abbas El Gamal Director, Information Systems Laboratory Department of Electrical Engineering Stanford University Gallager s 75th Birthday 1 Information Theoretic Limits Establishing
More informationOptimal Power Control in Decentralized Gaussian Multiple Access Channels
1 Optimal Power Control in Decentralized Gaussian Multiple Access Channels Kamal Singh Department of Electrical Engineering Indian Institute of Technology Bombay. arxiv:1711.08272v1 [eess.sp] 21 Nov 2017
More information3238 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 6, JUNE 2014
3238 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 6, JUNE 2014 The Lossy Common Information of Correlated Sources Kumar B. Viswanatha, Student Member, IEEE, Emrah Akyol, Member, IEEE, and Kenneth
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationDistirbutional robustness, regularizing variance, and adversaries
Distirbutional robustness, regularizing variance, and adversaries John Duchi Based on joint work with Hongseok Namkoong and Aman Sinha Stanford University November 2017 Motivation We do not want machine-learned
More informationMismatched Multi-letter Successive Decoding for the Multiple-Access Channel
Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@cam.ac.uk Alfonso Martinez Universitat Pompeu Fabra alfonso.martinez@ieee.org
More informationSHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe
SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/40 Acknowledgement Praneeth Boda Himanshu Tyagi Shun Watanabe 3/40 Outline Two-terminal model: Mutual
More informationDistributed Functional Compression through Graph Coloring
Distributed Functional Compression through Graph Coloring Vishal Doshi, Devavrat Shah, Muriel Médard, and Sidharth Jaggi Laboratory for Information and Decision Systems Massachusetts Institute of Technology
More informationOblivious Equilibrium for Large-Scale Stochastic Games with Unbounded Costs
Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 008 Oblivious Equilibrium for Large-Scale Stochastic Games with Unbounded Costs Sachin Adlakha, Ramesh Johari,
More informationEstimation of information-theoretic quantities
Estimation of information-theoretic quantities Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk November 16, 2004 Some
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More informationOn MMSE Estimation from Quantized Observations in the Nonasymptotic Regime
On MMSE Estimation from Quantized Observations in the Nonasymptotic Regime Jaeho Lee, Maxim Raginsky, and Pierre Moulin Abstract This paper studies MMSE estimation on the basis of quantized noisy observations
More informationOn the Optimality of Treating Interference as Noise in Competitive Scenarios
On the Optimality of Treating Interference as Noise in Competitive Scenarios A. DYTSO, D. TUNINETTI, N. DEVROYE WORK PARTIALLY FUNDED BY NSF UNDER AWARD 1017436 OUTLINE CHANNEL MODEL AND PAST WORK ADVANTAGES
More informationMultiterminal Source Coding with an Entropy-Based Distortion Measure
Multiterminal Source Coding with an Entropy-Based Distortion Measure Thomas Courtade and Rick Wesel Department of Electrical Engineering University of California, Los Angeles 4 August, 2011 IEEE International
More informationEE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.
EE376A - Information Theory Final, Monday March 14th 216 Solutions Instructions: You have three hours, 3.3PM - 6.3PM The exam has 4 questions, totaling 12 points. Please start answering each question on
More informationOn Optimal Coding of Hidden Markov Sources
2014 Data Compression Conference On Optimal Coding of Hidden Markov Sources Mehdi Salehifar, Emrah Akyol, Kumar Viswanatha, and Kenneth Rose Department of Electrical and Computer Engineering University
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationWE start with a general discussion. Suppose we have
646 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 2, MARCH 1997 Minimax Redundancy for the Class of Memoryless Sources Qun Xie and Andrew R. Barron, Member, IEEE Abstract Let X n = (X 1 ; 111;Xn)be
More information(Classical) Information Theory III: Noisy channel coding
(Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way
More informationLossy Compression Coding Theorems for Arbitrary Sources
Lossy Compression Coding Theorems for Arbitrary Sources Ioannis Kontoyiannis U of Cambridge joint work with M. Madiman, M. Harrison, J. Zhang Beyond IID Workshop, Cambridge, UK July 23, 2018 Outline Motivation
More informationARTIFICIAL SATELLITES, Vol. 41, No DOI: /v
ARTIFICIAL SATELLITES, Vol. 41, No. 1-006 DOI: 10.478/v10018-007-000-8 On InSAR AMBIGUITY RESOLUTION FOR DEFORMATION MONITORING P.J.G. Teunissen Department of Earth Observation and Space Sstems (DEOS)
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationThe Communication Complexity of Correlation. Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan
The Communication Complexity of Correlation Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan Transmitting Correlated Variables (X, Y) pair of correlated random variables Transmitting
More informationEquivalence for Networks with Adversarial State
Equivalence for Networks with Adversarial State Oliver Kosut Department of Electrical, Computer and Energy Engineering Arizona State University Tempe, AZ 85287 Email: okosut@asu.edu Jörg Kliewer Department
More informationA Comparison of Superposition Coding Schemes
A Comparison of Superposition Coding Schemes Lele Wang, Eren Şaşoğlu, Bernd Bandemer, and Young-Han Kim Department of Electrical and Computer Engineering University of California, San Diego La Jolla, CA
More informationBlock 2: Introduction to Information Theory
Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation
More informationLayered Synthesis of Latent Gaussian Trees
Layered Synthesis of Latent Gaussian Trees Ali Moharrer, Shuangqing Wei, George T. Amariucai, and Jing Deng arxiv:1608.04484v2 [cs.it] 7 May 2017 Abstract A new synthesis scheme is proposed to generate
More informationEfficient Use of Joint Source-Destination Cooperation in the Gaussian Multiple Access Channel
Efficient Use of Joint Source-Destination Cooperation in the Gaussian Multiple Access Channel Ahmad Abu Al Haija ECE Department, McGill University, Montreal, QC, Canada Email: ahmad.abualhaija@mail.mcgill.ca
More information3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions
Engineering Tripos Part IIA THIRD YEAR 3F: Signals and Systems INFORMATION THEORY Examples Paper Solutions. Let the joint probability mass function of two binary random variables X and Y be given in the
More informationCS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings
CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,
More informationLecture 1: The Multiple Access Channel. Copyright G. Caire 12
Lecture 1: The Multiple Access Channel Copyright G. Caire 12 Outline Two-user MAC. The Gaussian case. The K-user case. Polymatroid structure and resource allocation problems. Copyright G. Caire 13 Two-user
More informationOptimal Natural Encoding Scheme for Discrete Multiplicative Degraded Broadcast Channels
Optimal Natural Encoding Scheme for Discrete Multiplicative Degraded Broadcast Channels Bike ie, Student Member, IEEE and Richard D. Wesel, Senior Member, IEEE Abstract Certain degraded broadcast channels
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationMultiuser Successive Refinement and Multiple Description Coding
Multiuser Successive Refinement and Multiple Description Coding Chao Tian Laboratory for Information and Communication Systems (LICOS) School of Computer and Communication Sciences EPFL Lausanne Switzerland
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1545 Bounds on Capacity Minimum EnergyPerBit for AWGN Relay Channels Abbas El Gamal, Fellow, IEEE, Mehdi Mohseni, Student Member, IEEE,
More informationChannels with cost constraints: strong converse and dispersion
Channels with cost constraints: strong converse and dispersion Victoria Kostina, Sergio Verdú Dept. of Electrical Engineering, Princeton University, NJ 08544, USA Abstract This paper shows the strong converse
More informationRandomized Quantization and Optimal Design with a Marginal Constraint
Randomized Quantization and Optimal Design with a Marginal Constraint Naci Saldi, Tamás Linder, Serdar Yüksel Department of Mathematics and Statistics, Queen s University, Kingston, ON, Canada Email: {nsaldi,linder,yuksel}@mast.queensu.ca
More informationProbability and Statistical Decision Theory
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Probability and Statistical Decision Theory Many slides attributable to: Erik Sudderth (UCI) Prof. Mike Hughes
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationLossy Distributed Source Coding
Lossy Distributed Source Coding John MacLaren Walsh, Ph.D. Multiterminal Information Theory, Spring Quarter, 202 Lossy Distributed Source Coding Problem X X 2 S {,...,2 R } S 2 {,...,2 R2 } Ẑ Ẑ 2 E d(z,n,
More informationOptimization in Information Theory
Optimization in Information Theory Dawei Shen November 11, 2005 Abstract This tutorial introduces the application of optimization techniques in information theory. We revisit channel capacity problem from
More informationOn the Capacity of the Interference Channel with a Relay
On the Capacity of the Interference Channel with a Relay Ivana Marić, Ron Dabora and Andrea Goldsmith Stanford University, Stanford, CA {ivanam,ron,andrea}@wsl.stanford.edu Abstract Capacity gains due
More informationAn Information Theory For Preferences
An Information Theor For Preferences Ali E. Abbas Department of Management Science and Engineering, Stanford Universit, Stanford, Ca, 94305 Abstract. Recent literature in the last Maimum Entrop workshop
More information