2016 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2016, SALERNO, ITALY

CONVEX NONNEGATIVE MATRIX FACTORIZATION WITH MISSING DATA

Ronan Hamon, Valentin Emiya (Aix Marseille Univ, CNRS, LIF, Marseille, France)
Cédric Févotte (CNRS & IRIT, Toulouse, France)

ABSTRACT

Convex nonnegative matrix factorization (CNMF) is a variant of nonnegative matrix factorization (NMF) in which the components are a convex combination of atoms of a known dictionary. In this contribution, we propose to extend CNMF to the case where the data matrix and the dictionary have missing entries. After a formulation of the problem in this context of missing data, we propose a majorization-minimization algorithm to solve the resulting optimization problem. Experimental results with synthetic data and audio spectrograms highlight an improvement of the reconstruction performance with respect to standard NMF. The performance gap is particularly significant when the reconstruction task becomes arduous, e.g., when the ratio of missing data is high, the noise is strong, or the data are complex.

Index Terms— matrix factorization, nonnegativity, low-rankness, matrix completion, spectrogram inpainting

1. INTRODUCTION

Convex NMF (CNMF) [1] is a special case of nonnegative matrix factorization (NMF) [2], in which the matrix of components is constrained to be a linear combination of atoms of a known dictionary. The term "convex" refers to the constraint on the linear combination, where the combination coefficients forming each component are nonnegative and sum to 1. Compared to the fully unsupervised NMF setting, the use of known atoms is a source of supervision that may guide learning: in particular, an interesting case of CNMF consists in auto-encoding the data themselves, by defining the atoms as the data matrix. CNMF has been of interest in a number of contexts, such as clustering, data analysis, face recognition, or music transcription [1, 3]. It is also related to the self-expressive dictionary-based representation proposed in [4].

An issue that has not yet been addressed is when the data matrix has missing coefficients. Such an extension of CNMF is worth considering, as it opens the way to data-reconstruction settings with nonnegative low-rank constraints, which cover several relevant applications. One example concerns the field of image or audio inpainting [5, 6, 7, 8], where CNMF may improve the current reconstruction techniques. In the inpainting of audio spectrograms, for example, setting up the dictionary to be a comprehensive collection of notes from a specific instrument may guide the factorization toward a realistic and meaningful decomposition, increasing the quality of the reconstruction of the missing data. In this contribution, we also consider the case where the dictionary itself may have missing coefficients.

The paper is organized as follows. Section 2 formulates CNMF in the presence of missing entries in the data matrix and in the dictionary. Section 3 describes the proposed majorization-minimization (MM) algorithm. Sections 4 and 5 report experimental results with synthetic data and audio spectrograms.

This work was supported by ANR JCJC program MAD (ANR-14-CE…).
2. CONVEX NONNEGATIVE MATRIX FACTORIZATION WITH MISSING DATA

2.1. Notations and definitions

For any integer N, the integer set {1, 2, ..., N} is denoted by [N]. The coefficients of a matrix A ∈ ℝ^{M×N} are denoted by either a_mn or [A]_mn. The element-wise matrix product, division and power are denoted by A.B, A/B and A^{.p}, respectively, where A and B are matrices with the same dimensions and p is a scalar. 0 and 1 denote vectors or matrices composed of zeros and ones, respectively, with dimensions that can be deduced from the context. The element-wise negation of a binary matrix M is denoted by M̄ := 1 − M.

2.2. NMF and Convex NMF

NMF consists in approximating a data matrix V ∈ ℝ_+^{F×N} as the product WH of two nonnegative matrices W ∈ ℝ_+^{F×K} and H ∈ ℝ_+^{K×N}. Often, K ≪ min(F, N), such that WH is a low-rank approximation of V. Every sample v_n, the n-th column of V, is thus decomposed as a linear combination of K elementary components or patterns w_1, ..., w_K ∈ ℝ_+^F, the columns of W. The coefficients of the linear combination are given by the n-th column h_n of H. In [9] and [10], algorithms have been proposed for the unsupervised estimation of W and H from V, by minimization of the cost function D_β(V|WH) = Σ_{f,n} d_β(v_fn | [WH]_fn), where d_β(x|y) is the β-divergence defined as:

    d_β(x|y) := ⎧ [ x^β + (β−1)·y^β − β·x·y^{β−1} ] / [ β(β−1) ]   for β ∈ ℝ∖{0,1}
                ⎨ x·log(x/y) − x + y                              for β = 1        (1)
                ⎩ x/y − log(x/y) − 1                              for β = 0

When ill-defined, we set by convention d_β(0|0) = 0.
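For concreteness, the β-divergence (1) translates directly into array code. The following is a minimal NumPy sketch; the function name and the small eps guard against zeros are our own choices, not part of the paper:

    import numpy as np

    def beta_divergence(X, Y, beta, eps=1e-12):
        """Sum of element-wise beta-divergences d_beta(x|y) of eq. (1).

        beta=1 gives Kullback-Leibler, beta=0 gives Itakura-Saito.
        Entries are clamped to eps, which also realizes the convention
        d_beta(0|0) = 0.
        """
        X = np.maximum(np.asarray(X, dtype=float), eps)
        Y = np.maximum(np.asarray(Y, dtype=float), eps)
        if beta == 1:
            D = X * np.log(X / Y) - X + Y
        elif beta == 0:
            D = X / Y - np.log(X / Y) - 1.0
        else:
            D = (X**beta + (beta - 1.0) * Y**beta
                 - beta * X * Y**(beta - 1.0)) / (beta * (beta - 1.0))
        return D.sum()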

CNMF is a variant of NMF in which W = SL: S = [s_1, ..., s_P] ∈ ℝ_+^{F×P} is a nonnegative matrix of atoms and L = [l_1, ..., l_K] ∈ ℝ_+^{P×K} is the so-called labeling matrix. Each dictionary element w_k is thus equal to S·l_k, with usually P ≥ K, and the data is in the end decomposed as V ≈ SLH. The scale indeterminacy between L and H may be lifted by imposing ‖l_k‖₁ = 1, in which case w_k is precisely a convex combination of the atoms of S. CNMF is related to so-called archetypal analysis [11], which, however, does not impose any nonnegativity constraint. The use of known examples in S can then be seen as a source of supervision that guides learning. A special case of CNMF is obtained by setting S = V, thus auto-encoding the data as V ≈ VLH. This particular case is studied in depth in [1]. In this paper, we consider the general case for S, with or without missing data.

2.3. Convex NMF with missing data

We assume that some coefficients in V and S may be missing. Let 𝒱 ⊆ [F]×[N] be the set of index pairs that locates the observed coefficients in V: (f,n) ∈ 𝒱 iff v_fn is known. Similarly, let 𝒮 ⊆ [F]×[P] be the set of index pairs that locates the observed coefficients in S. The sets 𝒱 and 𝒮 may equivalently be encoded by masking matrices M_V ∈ {0,1}^{F×N} and M_S ∈ {0,1}^{F×P} defined as

    [M_V]_fn := 1 if (f,n) ∈ 𝒱, 0 otherwise,  for (f,n) ∈ [F]×[N]    (2)
    [M_S]_fp := 1 if (f,p) ∈ 𝒮, 0 otherwise,  for (f,p) ∈ [F]×[P]    (3)

A major goal in this paper is to estimate L, H and the missing entries in S, given the partially observed data matrix V. Denoting by S° the matrix of observed/known dictionary coefficients, our aim is to minimize the objective function

    C(S, L, H) := D_β( M_V.V | M_V.(SLH) )    (4)

subject to S ∈ ℝ_+^{F×P}, L ∈ ℝ_+^{P×K}, H ∈ ℝ_+^{K×N}, and M_S.S = M_S.S°. The particular case where the dictionary equals the data matrix itself is obtained by setting (M_S, S°) := (M_V, V).

Algorithm 1: CNMF with missing data

    Require: V, S°, M_V, M_S, β
    Initialize S, L, H with random nonnegative values
    loop
        Update S:
            S ← M_S.S° + M̄_S.S.( [ (M_V.(SLH)^{.(β−2)}.V) (LH)^T ] / [ (M_V.(SLH)^{.(β−1)}) (LH)^T ] )^{.γ(β)}    (5)
        Update L:
            L ← L.( [ S^T (M_V.(SLH)^{.(β−2)}.V) H^T ] / [ S^T (M_V.(SLH)^{.(β−1)}) H^T ] )^{.γ(β)}    (6)
        Update H:
            H ← H.( [ (SL)^T (M_V.(SLH)^{.(β−2)}.V) ] / [ (SL)^T (M_V.(SLH)^{.(β−1)}) ] )^{.γ(β)}    (7)
        Rescale L and H: ∀k ∈ [K],
            h_k ← ‖l_k‖₁ · h_k    (8)
            l_k ← l_k / ‖l_k‖₁    (9)
    end loop
    return S, L, H

where the matrix division is element-wise and γ(β) is the exponent from [9]: γ(β) := 1/(2−β) for β < 1, 1 for 1 ≤ β ≤ 2, and 1/(β−1) for β > 2.

3. PROPOSED ALGORITHM

3.1. General description of the algorithm

Algorithm 1 extends the algorithm proposed in [9] for complete CNMF with the β-divergence to the case of missing entries in V or S. The algorithm is a block-coordinate descent procedure in which each block is one of the three matrix factors. The update of each block/factor is obtained via majorization-minimization (MM), a classic procedure that consists in iteratively minimizing a tight upper bound (called an auxiliary function) of the objective function. In the present setting, the MM procedure leads to multiplicative updates, characteristic of many NMF algorithms, that automatically preserve nonnegativity given positive initialization.
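The multiplicative updates (5)-(7) and the rescaling (8)-(9) map directly onto masked array operations. Below is a minimal NumPy sketch of Algorithm 1 under the conventions above; the function name, the fixed iteration count, the internal random seed and the eps guard are our own choices, not prescribed by the paper:

    import numpy as np

    def gamma_beta(beta):
        # Exponent gamma(beta) from [9], used in the updates (5)-(7).
        if beta < 1:
            return 1.0 / (2.0 - beta)
        if beta <= 2:
            return 1.0
        return 1.0 / (beta - 1.0)

    def cnmf_missing(V, S_o, M_V, M_S, K, beta=2.0, n_iter=200, eps=1e-12):
        """Sketch of Algorithm 1: V ~ S L H with masks M_V (data) and M_S
        (dictionary); the observed entries M_S.S_o are held fixed."""
        F, N = V.shape
        P = S_o.shape[1]
        rng = np.random.default_rng(0)       # deterministic init for the sketch
        S = rng.random((F, P)) + eps
        L = rng.random((P, K)) + eps
        H = rng.random((K, N)) + eps
        g = gamma_beta(beta)
        V = np.nan_to_num(V) * M_V           # unobserved entries of V are ignored
        S = M_S * S_o + (1 - M_S) * S        # enforce observed dictionary entries
        for _ in range(n_iter):
            # Update S (eq. 5): only the missing entries of S actually move.
            V_hat = S @ L @ H + eps
            num = (M_V * V_hat**(beta - 2) * V) @ (L @ H).T
            den = (M_V * V_hat**(beta - 1)) @ (L @ H).T + eps
            S = M_S * S_o + (1 - M_S) * S * (num / den)**g
            # Update L (eq. 6).
            V_hat = S @ L @ H + eps
            L *= ((S.T @ (M_V * V_hat**(beta - 2) * V) @ H.T)
                  / (S.T @ (M_V * V_hat**(beta - 1)) @ H.T + eps))**g
            # Update H (eq. 7).
            V_hat = S @ L @ H + eps
            H *= (((S @ L).T @ (M_V * V_hat**(beta - 2) * V))
                  / ((S @ L).T @ (M_V * V_hat**(beta - 1)) + eps))**g
            # Rescale (eqs. 8-9): columns of L sum to one, H absorbs the scale.
            norms = L.sum(axis=0) + eps
            H *= norms[:, None]
            L /= norms[None, :]
        return S, L, H

For β = 2, γ(β) = 1 and these reduce to masked Euclidean multiplicative updates; note that the rescaling leaves the product LH, and hence the cost, unchanged.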

3.2. Detailed updates

We consider the optimization of C(S, L, H) with respect to each of its three arguments individually, using MM. Current iterates are denoted with a tilde, i.e., S̃, L̃ and H̃. We start by recalling the definition of an auxiliary function:

Definition 1 (Auxiliary function). The mapping G(·|·): ℝ_+^{I×J} × ℝ_+^{I×J} → ℝ_+ is an auxiliary function to C(A) iff

    ∀A ∈ ℝ_+^{I×J}, C(A) = G(A|A)  and  ∀A, Ã ∈ ℝ_+^{I×J}, C(A) ≤ G(A|Ã)    (10)

The iterative minimization of G(A|Ã) with respect to A, with replacement of Ã at every iteration, monotonically decreases the objective C(A) until convergence. As explained in detail in [9], the β-divergence may be decomposed into the sum of a convex term d̆(·|·), a concave term d̂(·|·) and a constant term. The first two terms can be majorized using routine Jensen and tangent inequalities, respectively, leading to tractable updates. The auxiliary functions used to derive Algorithm 1 are given by the three following propositions¹ and the monotonicity of the algorithm follows by construction.

Proposition 1 (Auxiliary function for S). Let S̃ ∈ ℝ_+^{F×P} be such that ∀(f,n) ∈ [F]×[N], ṽ_fn ≠ 0 and ∀(f,p) ∈ [F]×[P], s̃_fp ≠ 0, where Ṽ := S̃LH. Then the function

    G_S(S|S̃) := Σ_{f,n} [M_V]_fn [ Ğ_fn(S|S̃) + Ĝ_fn(S|S̃) ] + cst

where

    Ğ_fn(S|S̃) := Σ_p ( [LH]_pn s̃_fp / ṽ_fn ) d̆( v_fn | ṽ_fn s_fp/s̃_fp )
    Ĝ_fn(S|S̃) := d̂( v_fn | ṽ_fn ) + d̂′( v_fn | ṽ_fn ) Σ_p [LH]_pn (s_fp − s̃_fp)

is an auxiliary function to C(S, L, H) with respect to S, and its minimum subject to M_S.S = M_S.S° (with M_S ∈ {0,1}^{F×P} and S° ∈ ℝ_+^{F×P}) is given by equation (5). The auxiliary function decouples with respect to the individual coefficients of S; as such, the constraint M_S.S = M_S.S° is directly imposed by only updating the coefficients of S with indices outside 𝒮.

Proposition 2 (Auxiliary function for L). Let L̃ ∈ ℝ_+^{P×K} be such that ∀(f,n) ∈ [F]×[N], ṽ_fn ≠ 0 and ∀(p,k) ∈ [P]×[K], l̃_pk ≠ 0, where Ṽ := SL̃H. Then the function

    G_L(L|L̃) := Σ_{f,n} [M_V]_fn [ Ğ_fn(L|L̃) + Ĝ_fn(L|L̃) ] + cst

where

    Ğ_fn(L|L̃) := Σ_{p,k} ( s_fp l̃_pk h_kn / ṽ_fn ) d̆( v_fn | ṽ_fn l_pk/l̃_pk )
    Ĝ_fn(L|L̃) := d̂( v_fn | ṽ_fn ) + d̂′( v_fn | ṽ_fn ) Σ_{p,k} s_fp h_kn (l_pk − l̃_pk)

is an auxiliary function to C(S, L, H) with respect to L and its minimum is given by equation (6).

Proposition 3 (Auxiliary function for H). Let us define W := SL and let H̃ ∈ ℝ_+^{K×N} be such that ∀(f,n) ∈ [F]×[N], ṽ_fn ≠ 0 and ∀(k,n) ∈ [K]×[N], h̃_kn ≠ 0, where Ṽ := WH̃. Then the function

    G_H(H|H̃) := Σ_{f,n} [M_V]_fn [ Ğ_fn(H|H̃) + Ĝ_fn(H|H̃) ] + cst

where

    Ğ_fn(H|H̃) := Σ_k ( w_fk h̃_kn / ṽ_fn ) d̆( v_fn | ṽ_fn h_kn/h̃_kn )
    Ĝ_fn(H|H̃) := d̂( v_fn | ṽ_fn ) + d̂′( v_fn | ṽ_fn ) Σ_k w_fk (h_kn − h̃_kn)

is an auxiliary function to C(S, L, H) with respect to H and its minimum is given by equation (7).

¹ The proofs of these propositions are available in the extended version at https://hal-amu.archives-ouvertes.fr/hal-

4. EXPERIMENT ON SYNTHETIC DATA

4.1. Experimental setting

The objective of this experiment is to analyze the performance of CNMF for reconstructing missing data, by comparing it with regular NMF. We consider a data matrix V* of rank K* synthesized under the CNMF model V* = S*L*H*, where the matrix of atoms S* and the ground-truth factors L* and H* are generated as the absolute values of Gaussian noise. It is worth noting that V* is also consistent with an NMF model by defining W* = S*L*. A perturbed data matrix V is obtained by applying multiplicative noise drawn from a Gamma distribution with mean 1 and variance 1/α; the parameter α thus controls the magnitude of the perturbation. The mask M_V of known elements in V is derived by considering missing coefficients randomly and uniformly distributed over the matrix, such that the ratio of missing values equals ρ_V. The generation of the data is repeated 3 times, as well as the generation of the masks, and results are averaged over these repetitions. From a matrix V with missing entries, NMF and CNMF with missing values are applied using K components. Only the case β = 2 has been considered in this experiment.
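A sketch of this data generation, using the dimensions fixed in the next subsection; the Gamma parametrization with shape α and scale 1/α indeed has mean 1 and variance 1/α (the seed is an arbitrary choice of ours):

    import numpy as np

    rng = np.random.default_rng(42)
    F, N, P, K_star = 100, 250, 50, 10
    alpha, rho_V = 3000.0, 0.6               # noise level and missing-data ratio

    # Ground-truth CNMF model V* = S* L* H*, factors drawn as |Gaussian noise|.
    S_star = np.abs(rng.standard_normal((F, P)))
    L_star = np.abs(rng.standard_normal((P, K_star)))
    H_star = np.abs(rng.standard_normal((K_star, N)))
    V_star = S_star @ L_star @ H_star

    # Multiplicative Gamma noise with mean 1 and variance 1/alpha.
    V = V_star * rng.gamma(shape=alpha, scale=1.0 / alpha, size=(F, N))

    # Mask of observed entries: a ratio rho_V of the coefficients is missing,
    # uniformly at random.
    M_V = (rng.random((F, N)) >= rho_V).astype(float)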
In both algorithms, convergence is declared when the relative difference of the cost function between two iterations falls below a fixed threshold. Several repetitions are performed using different random initializations, and the best instance (i.e., the instance which minimizes the cost function) is retained. The reconstructed data matrix is obtained as V̂ = SLH. The reconstruction error is obtained by computing the β-divergence between the noiseless data matrix V* and the reconstructed matrix V̂; the error is computed on, and averaged over, the missing coefficients only, as

    e_test = ( 1 / Σ_{i,j} [M̄_V]_ij ) · D_β( M̄_V.V* | M̄_V.V̂ )    (11)

where M̄_V is the mask of unknown elements in V.
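Equation (11) evaluates the fit on the held-out entries only. A sketch reusing the beta_divergence helper defined earlier:

    def test_error(V_star, V_hat, M_V, beta=2.0):
        """Test error of eq. (11): beta-divergence between the noiseless data
        V* and the reconstruction, averaged over the missing coefficients only."""
        M_bar = 1.0 - M_V                  # mask of the unknown elements of V
        return beta_divergence(M_bar * V_star, M_bar * V_hat, beta) / M_bar.sum()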

In the case of CNMF, we consider two choices for S: the data matrix V with missing values, and the ground-truth matrix of atoms S*, considered here without missing values. The following parameters are fixed: F = 100, N = 250, P = 50, K* = 10, β = 2. We particularly investigate the influence of four factors: the number of estimated components K ∈ [2, 14]; the ratio of missing data ρ_V ∈ [0.1, 0.9] in V, i.e., of zeros in M_V; the choice of the matrix S ∈ {S*, V} for the CNMF; and the noise level α ∈ [10, 5000] in V (α is inversely proportional to the variance of the noise).

4.2. Results

We first focus on the influence of the number of estimated components K for the case where the true dictionary is fully known. Figure 1 displays the test error with respect to the number of estimated components K, for two levels of noise. The reconstruction performance obtained by NMF and CNMF with S = S* is plotted for different values of the ratio of missing values ρ_V ∈ {0.2, 0.6, 0.8} in V.

Fig. 1. e_test vs. number of components, for NMF and CNMF with S = S* at ρ_V ∈ {0.2, 0.6, 0.8}. Two levels of noise are displayed: high noise level (α = 100, left) and low noise level (α = 3000, right).

These results show that the noise level has a strong influence on the best number of estimated components. As expected, a high noise requires a strong regularization, obtained here by selecting a low value of K. On the contrary, when the noise is low, the best choice of K is closer to the true value K* = 10. Similarly, when the number of missing data is low (ρ_V = 0.2), one should set K to K* = 10, either for CNMF or for NMF. When it gets higher, the optimal K gets lower in order to limit the effect of overfitting. In this case, the best number of components drops down to K = 2, either for CNMF or NMF, the former still performing better than the latter. These first results outline the difference between NMF and CNMF, emphasized in the next figures.

This comparison is augmented by considering the case where S = V, with respect to the ratio ρ_V of missing values in V. Figure 2 displays the performance of NMF and CNMF as a function of ρ_V, for two noise levels. CNMF with the true atoms (S = S*) gives the best results over the full range of missing data ratios. When there are very few missing data and a low noise, NMF performs almost as well as CNMF. However, the NMF error increases much faster than the CNMF error as the number of missing data grows, or as the noise in the data becomes important. This higher sensitivity of NMF to missing data may be explained by overfitting, since the number of free parameters in NMF is higher than in CNMF. In the case of CNMF with S = V, the model cannot fit the data as well as CNMF with S = S* or as NMF. Consequently, a modeling error is observed when there are few missing data, and when comparing S = V and S = S* over all values. However, it performs better than NMF at high values of ρ_V, since the constraint S = V can be seen as a regularization.

Fig. 2. e_test vs. ratio of missing data in V, with K = K*, for NMF, CNMF with S = S*, and CNMF with S = V. Two levels of noise are displayed: high noise level (α = 100, left) and low noise level (α = 3000, right).

We finally investigate the robustness of the methods by looking at the influence of the multiplicative noise, controlled by the parameter α, on the performance. Figure 3 shows the test error for NMF and for CNMF with S = S* with respect to α and for several values of ρ_V. As expected, the test error decreases with the variance of the noise, which is inversely proportional to α. While a low value of α abruptly disrupts the reconstruction performance, the test error decreases only slightly for larger values. When the variance of the noise is close to zero (α = 5000), the performances of NMF and CNMF are almost the same. The reconstruction performances differ as the variance of the noise increases, as well as with the ratio of missing values in V.

Fig. 3. e_test vs. noise level, for NMF and CNMF with S = S*. Each curve describes the performance of reconstruction of missing data in V, with K = K*, according to the method and the ratio of missing data ρ_V ∈ {0.2, 0.6, 0.8}.

5. APPLICATION TO SPECTROGRAM INPAINTING

In order to illustrate the performance of the proposed algorithm on real data, we consider spectrograms of piano music signals, which are known to be well modeled by NMF methods [12]. Indeed, NMF may provide a note-level decomposition of a full music piece, each NMF component being the estimated spectrum and activation of a single note. This approximation has proved successful, but it is also limited in terms of modeling error and of the lack of supervision available to guide the NMF algorithm. Under such conditions, we have designed an experiment with missing data to compare regular NMF against two CNMF variants: in the first one, we set S = V; in the second one, S contains examples of all possible isolated note spectra from another piano.

5.1. Experimental setting

We consider 17 piano pieces from the MAPS dataset [13]. For each recording, the magnitude spectrogram is computed using a 46-ms sine window with 50% overlap and 4096 frequency bins, the sampling frequency being 44.1 kHz. Matrix V is created from the resulting spectrogram by selecting the F = 500 lower-frequency bins and the first five seconds, i.e., the N = 214 first time frames. Missing data in V are artificially created by removing coefficients uniformly at random. Three systems are compared, based on the test error defined as the β-divergence computed on the estimation of the missing data. In all of them, the number of components K is set to the true number of notes available from the dataset annotation, and we set β = 1. The first system is the regular NMF, randomly initialized. The second system is the proposed CNMF with (M_S, S°) := (M_V, V). The third system is the proposed CNMF with S = D set as a specific matrix D of P = 61 atoms; each atom is a single-note spectrum, from C3 to C8, extracted from the recording of another piano instrument from the MAPS dataset.²
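A sketch of this spectrogram computation with SciPy (a 46-ms sine window at 44.1 kHz is about 2029 samples; we read "4096 frequency bins" as a 4096-point FFT, and the file path is a placeholder):

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import stft
    from scipy.signal.windows import cosine

    fs, x = wavfile.read("maps_piece.wav")   # placeholder path; fs assumed 44100 Hz
    x = x.astype(float)
    if x.ndim > 1:
        x = x.mean(axis=1)                   # mix down to mono

    win_len = round(0.046 * fs)              # 46-ms sine window
    _, _, X = stft(x, fs=fs, window=cosine(win_len), nperseg=win_len,
                   noverlap=win_len // 2, nfft=4096)

    # F = 500 lower-frequency bins, N = 214 first time frames (about 5 s).
    V = np.abs(X)[:500, :214]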
5.2. Results

Figure 4 displays the test error with respect to the ratio of missing data in V, averaged over the 17 piano pieces. It clearly shows that the CNMF with the specific dictionary S = D is much more robust to missing data than the other two systems. When less than 40% of the data are missing, NMF performs slightly better; however, the NMF test error dramatically increases when more data are missing, by a factor of about 5 when more than 80% of the data are missing. This must be due to overfitting, since NMF has a large number of free parameters to be estimated from very few observations when data are missing. The performance of the CNMF system with S = V probably suffers from modeling error when very few data are missing, since the columns of V may not be able to combine into K components in a convex way. In the range 50-70%, its performance is similar to that of NMF. Beyond this range, it seems to be less prone to overfitting than NMF, probably due to fewer free parameters or to a regularization effect provided by S = V.

Fig. 4. e_test vs. missing data ratio in audio spectrograms, for NMF, CNMF with S = D, and CNMF with S = V.

We now investigate the influence of the complexity of the audio signal on the test error when the ratio of missing data is set to the high value of 80%. Figure 5 displays, for each music recording, the test error with respect to the number of different pitches among all notes in the piano piece, which also equals the number of components K used by each system. CNMF with S = D performs better than NMF whatever the number of notes, and its error increases by only a small factor along the represented range. NMF performs about five times worse for easy pieces, i.e., pieces composed of notes with about 6 different pitches, and about 10⁴ times worse when the number of pitches is larger than 25.

² The code of the experiments is available on the webpage of the MAD project: http://mad.lif.univ-mrs.fr/.

CNMF with S = V performs slightly better than NMF. Since the number of components K equals the number of note pitches, these results confirm that NMF may severely suffer from overtraining while CNMF may not, remaining robust to missing data even for large values of K.

Fig. 5. e_test vs. number of different note pitches for 80% missing data (each dot represents one piece of music), for NMF, CNMF with S = D, and CNMF with S = V.

6. CONCLUSION

In this paper, we have proposed an extension of convex nonnegative matrix factorization to the case of missing data, as previously presented for regular NMF. The proposed method can deal with missing values both in the data matrix V and in the dictionary S, which is particularly useful in the case S = V where the data are auto-encoded. In this framework, an algorithm has been provided and analyzed using a majorization-minimization (MM) scheme that guarantees convergence to a local minimum. A large set of experiments on synthetic data showed promising results for this variant of NMF on the task of reconstructing missing data, and validated the value of this approach. In many situations, CNMF outperforms NMF, especially when the ratio of missing values is high and when the data matrix V is noisy. This trend has been confirmed on real audio spectrograms of piano music. In particular, we have shown how the use of a generic set of isolated piano notes as atoms can dramatically enhance the robustness to missing data. This preliminary study indicates that the approach is worthy of further investigation, beyond the proposed settings where missing values are uniformly distributed over the matrix. Furthermore, the influence of missing values in the dictionary has not been completely assessed, as only the case S = V has been taken into account. On the application side, this approach could give new insight into many problems dealing with the estimation of missing data.

7. REFERENCES

[1] C. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 45-55, Jan. 2010.
[2] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.
[3] E. Vincent, N. Bertin, and R. Badeau, "Adaptive harmonic spectral decomposition for multiple pitch estimation," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 3, pp. 528-537, Mar. 2010.
[4] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765-2781, 2013.
[5] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in Proc. of SIGGRAPH, ACM, 2000, pp. 417-424.
[6] P. Smaragdis, B. Raj, and M. Shashanka, "Missing data imputation for spectral audio signals," in Proc. of MLSP, Grenoble, France, Sept. 2009.
[7] J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigné, and S. Sagayama, "Computational auditory induction as a missing-data model-fitting problem with Bregman divergence," Speech Communication, vol. 53, no. 5, pp. 658-676, 2011.
[8] D. L. Sun and R. Mazumder, "Non-negative matrix completion for bandwidth extension: A convex optimization approach," in Proc. of MLSP, Sept. 2013.
[9] C. Févotte and J. Idier, "Algorithms for nonnegative matrix factorization with the β-divergence," Neural Comput., vol. 23, no. 9, pp. 2421-2456, 2011.
[10] M. Nakano, H. Kameoka, J. Le Roux, Y. Kitano, N. Ono, and S. Sagayama, "Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence," in Proc. of MLSP, 2010.
[11] A. Cutler and L. Breiman, "Archetypal analysis," Technometrics, vol. 36, no. 4, pp. 338-347, 1994.
[12] C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793-830, 2009.
[13] V. Emiya, R. Badeau, and B. David, "Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 6, pp. 1643-1654, 2010.

A. PROOFS

We detail the proofs of Propositions 1 and 2. The proof of Proposition 3 is straightforward using the same methodology. We first recall preliminary elements from [9]. The β-divergence d_β(x|y) can be decomposed as the sum of a convex term, a concave term and a constant term with respect to its second variable y:

    d_β(x|y) = d̆(x|y) + d̂(x|y) + d̄(x)    (12)

This decomposition is not unique. We will use the decomposition given in [9, Table 1], for which we have the following derivatives w.r.t. the variable y:

    d̆′(x|y) := ⎧ −x·y^{β−2}       if β < 1
                ⎨ y^{β−2}·(y − x)  if 1 ≤ β ≤ 2    (13)
                ⎩ y^{β−1}          if β > 2

    d̂′(x|y) := ⎧ y^{β−1}          if β < 1
                ⎨ 0                if 1 ≤ β ≤ 2    (14)
                ⎩ −x·y^{β−2}       if β > 2
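The case analysis in (13)-(14) transcribes directly; a short sketch (the function names are ours). Note that summing the two derivatives recovers d′_β(x|y) = y^{β−2}(y − x) for every β:

    import numpy as np

    def d_convex_prime(x, y, beta):
        """Derivative w.r.t. y of the convex part of d_beta, eq. (13)."""
        if beta < 1:
            return -x * y**(beta - 2)
        if beta <= 2:
            return y**(beta - 2) * (y - x)
        return y**(beta - 1)

    def d_concave_prime(x, y, beta):
        """Derivative w.r.t. y of the concave part of d_beta, eq. (14)."""
        if beta < 1:
            return y**(beta - 1)
        if beta <= 2:
            return np.zeros_like(np.asarray(y, dtype=float))
        return -x * y**(beta - 2)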

A.1. Update of S (Proof of Proposition 1)

We prove Proposition 1 by first constructing the auxiliary function (Proposition 4 below) and then focusing on its minimum (Proposition 5 below). Due to separability, the update of S ∈ ℝ_+^{F×P} relies on the update of each of its rows; hence, we only derive the update for a vector s ∈ ℝ_+^P.

Definition 2 (Objective function C_S(s)). For v ∈ ℝ_+^N, L ∈ ℝ_+^{P×K}, H ∈ ℝ_+^{K×N}, m ∈ {0,1}^N, m° ∈ {0,1}^P and s ∈ ℝ_+^P, let us define

    C_S(s) := Σ_n m_n [ C̆_n(s) + Ĉ_n(s) ] + C̄    (15)

where

    C̆_n(s) := d̆( v_n | s^T (LH)_n ),  Ĉ_n(s) := d̂( v_n | s^T (LH)_n )    (16)

and C̄ gathers the constant term d̄(m.v) and a penalty term that vanishes when m°.s = m°.s° and is infinite otherwise.

Proposition 4 (Auxiliary function G_S(s|s̃) for C_S(s)). Let s̃ ∈ ℝ_+^P be such that ∀n, ṽ_n ≠ 0 and ∀p, s̃_p ≠ 0, where ṽ_n := s̃^T (LH)_n. Then the function G_S(s|s̃) defined by

    G_S(s|s̃) := Σ_n m_n [ Ğ_n(s|s̃) + Ĝ_n(s|s̃) ] + C̄    (17)

where

    Ğ_n(s|s̃) := Σ_p ( [LH]_pn s̃_p / ṽ_n ) d̆( v_n | ṽ_n s_p/s̃_p )
    Ĝ_n(s|s̃) := d̂( v_n | ṽ_n ) + d̂′( v_n | ṽ_n ) Σ_p [LH]_pn (s_p − s̃_p)    (18)

is an auxiliary function for C_S(s).

Proof. We trivially have G_S(s|s) = C_S(s). We use the separability in n and p in order to upper bound each convex term C̆_n(s) and each concave term Ĉ_n(s).

Convex term C̆_n(s). Let us prove that Ğ_n(s|s̃) ≥ C̆_n(s). Let 𝒫_n be the set of indices p such that [LH]_pn ≠ 0 and define, for p ∈ 𝒫_n,

    λ_pn := [LH]_pn s̃_p / ṽ_n = [LH]_pn s̃_p / ( Σ_{p′∈𝒫_n} [LH]_{p′n} s̃_{p′} )    (19)

We have Σ_{p∈𝒫_n} λ_pn = 1 and, by Jensen's inequality applied to the convex mapping y ↦ d̆(v_n|y),

    Ğ_n(s|s̃) = Σ_{p∈𝒫_n} λ_pn d̆( v_n | [LH]_pn s_p / λ_pn ) ≥ d̆( v_n | Σ_{p∈𝒫_n} [LH]_pn s_p ) = C̆_n(s)    (20)

Concave term Ĉ_n(s). We have Ĝ_n(s|s̃) ≥ Ĉ_n(s) since Ĉ_n(s) is concave and s ↦ Ĝ_n(s|s̃) is a tangent plane to Ĉ_n(s) at s̃:

    Ĝ_n(s|s̃) = d̂( v_n | ṽ_n ) + Σ_p d̂′( v_n | Σ_{p′} [LH]_{p′n} s̃_{p′} ) [LH]_pn (s_p − s̃_p)
              = Ĉ_n(s̃) + ⟨ ∇Ĉ_n(s̃), s − s̃ ⟩    (21)

Proposition 5 (Minimum of G_S(s|s̃)). The minimum of s ↦ G_S(s|s̃) subject to the constraint m°.s = m°.s° is reached at s^MM with

    s_p^MM := ⎧ s̃_p · ( Σ_n m_n ṽ_n^{β−2} v_n [LH]_pn / Σ_n m_n ṽ_n^{β−1} [LH]_pn )^{γ(β)}   if m°_p = 0
              ⎩ s°_p                                                                          if m°_p = 1    (22)

Proof. Since the variable s_p is fixed for p such that m°_p = 1, we only consider the variables s_p for p such that m°_p = 0; the related penalty term in G_S(s|s̃) then vanishes. Using (13) and (14), the minimum is obtained by cancelling the gradient

    ∇_{s_p} G_S(s|s̃) = Σ_n m_n [LH]_pn [ d̆′( v_n | ṽ_n s_p/s̃_p ) + d̂′( v_n | ṽ_n ) ]    (23)

and by considering that the Hessian matrix is diagonal with nonnegative entries, since d̆(x|y) is convex in y:

    ∇²_{s_p} G_S(s|s̃) = Σ_n m_n [LH]_pn ( ṽ_n / s̃_p ) d̆″( v_n | ṽ_n s_p/s̃_p ) ≥ 0    (24)

A.2. Update of L (Proof of Proposition 2)

We prove Proposition 2 by first constructing the auxiliary function (Proposition 6 below) and then focusing on its minimum (Proposition 7 below). As opposed to the update of S, no separability is considered here.

Definition 3 (Objective function C_L(L)). For V ∈ ℝ_+^{F×N}, S ∈ ℝ_+^{F×P}, H ∈ ℝ_+^{K×N}, M ∈ {0,1}^{F×N} and L ∈ ℝ_+^{P×K}, let us define

    C_L(L) := Σ_{f,n} m_fn [ C̆_fn(L) + Ĉ_fn(L) ] + C̄    (25)

where

    C̆_fn(L) := d̆( v_fn | [SLH]_fn ),  Ĉ_fn(L) := d̂( v_fn | [SLH]_fn )  and  C̄ := d̄(M.V)    (26)

Proposition 6 (Auxiliary function G_L(L|L̃) for C_L(L)). Let L̃ ∈ ℝ_+^{P×K} be such that ∀(f,n), ṽ_fn ≠ 0 and ∀(p,k), l̃_pk ≠ 0, where Ṽ := SL̃H. Then the function G_L(L|L̃) defined by

    G_L(L|L̃) := Σ_{f,n} m_fn [ Ğ_fn(L|L̃) + Ĝ_fn(L|L̃) ] + C̄    (27)

where

    Ğ_fn(L|L̃) := Σ_{p,k} ( s_fp l̃_pk h_kn / ṽ_fn ) d̆( v_fn | ṽ_fn l_pk/l̃_pk )    (28)
    Ĝ_fn(L|L̃) := d̂( v_fn | ṽ_fn ) + d̂′( v_fn | ṽ_fn ) Σ_{p,k} s_fp h_kn (l_pk − l̃_pk)    (29)

is an auxiliary function for C_L(L).

Proof. We trivially have G_L(L|L) = C_L(L). In order to prove that G_L(L|L̃) ≥ C_L(L), we use the separability in f and n and we upper bound the convex terms C̆_fn(L) and the concave terms Ĉ_fn(L).

Convex term C̆_fn(L). Let us prove that Ğ_fn(L|L̃) ≥ C̆_fn(L). Let 𝒫_f be the set of indices p such that s_fp ≠ 0, 𝒦_n be the set of indices k such that h_kn ≠ 0, and define, for (p,k) ∈ 𝒫_f×𝒦_n,

    λ_fpkn := s_fp l̃_pk h_kn / ṽ_fn = s_fp l̃_pk h_kn / ( Σ_{(p′,k′)∈𝒫_f×𝒦_n} s_{fp′} l̃_{p′k′} h_{k′n} )    (30)

We have Σ_{(p,k)∈𝒫_f×𝒦_n} λ_fpkn = 1 and, by Jensen's inequality applied to the convex mapping y ↦ d̆(v_fn|y),

    Ğ_fn(L|L̃) = Σ_{(p,k)∈𝒫_f×𝒦_n} λ_fpkn d̆( v_fn | s_fp l_pk h_kn / λ_fpkn )    (31)
              ≥ d̆( v_fn | Σ_{(p,k)∈𝒫_f×𝒦_n} s_fp l_pk h_kn ) = C̆_fn(L)    (32)

Concave term Ĉ_fn(L). We have Ĝ_fn(L|L̃) ≥ Ĉ_fn(L) since Ĉ_fn(L) is concave and L ↦ Ĝ_fn(L|L̃) is a tangent plane to Ĉ_fn(L) at L̃:

    Ĝ_fn(L|L̃) = d̂( v_fn | ṽ_fn ) + Σ_{p,k} d̂′( v_fn | Σ_{p′,k′} s_{fp′} l̃_{p′k′} h_{k′n} ) s_fp h_kn (l_pk − l̃_pk)    (33)
              = Ĉ_fn(L̃) + ⟨ ∇Ĉ_fn(L̃), L − L̃ ⟩    (34)

Proposition 7 (Minimum of G_L(L|L̃)). The minimum of L ↦ G_L(L|L̃) is reached at L^MM with

    l_pk^MM := l̃_pk · ( Σ_{f,n} m_fn s_fp ṽ_fn^{β−2} v_fn h_kn / Σ_{f,n} m_fn s_fp ṽ_fn^{β−1} h_kn )^{γ(β)}    (35)

whenever the above sums are nonzero, and l_pk^MM := l̃_pk otherwise.

Proof. Using (13) and (14), the minimum is obtained by cancelling the gradient

    ∇_{l_pk} G_L(L|L̃) = Σ_{f,n} m_fn s_fp h_kn [ d̆′( v_fn | ṽ_fn l_pk/l̃_pk ) + d̂′( v_fn | ṽ_fn ) ]    (36)

and by considering that the Hessian matrix is diagonal with nonnegative entries, since d̆(x|y) is convex in y:

    ∇²_{l_pk} G_L(L|L̃) = Σ_{f,n} m_fn s_fp h_kn ( ṽ_fn / l̃_pk ) d̆″( v_fn | ṽ_fn l_pk/l̃_pk ) ≥ 0    (37)
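As a closing sanity check, the MM construction implies that every sweep of the updates (5)-(7) can only decrease the masked objective (4). Assuming the beta_divergence, cnmf_missing and synthetic-data sketches given earlier (whose internal seeding makes successive calls follow the same trajectory), and taking the auto-encoding case (M_S, S°) := (M_V, V), this can be verified numerically:

    # Auto-encoding case: the dictionary is the partially observed data itself.
    S_o, M_S = V, M_V

    costs = []
    for n_iter in range(1, 30):
        S, L, H = cnmf_missing(V, S_o, M_V, M_S, K=10, beta=2.0, n_iter=n_iter)
        costs.append(beta_divergence(M_V * V, M_V * (S @ L @ H), beta=2.0))

    # Up to the small eps regularization, the cost sequence is non-increasing.
    assert all(c2 <= c1 + 1e-6 * max(1.0, abs(c1))
               for c1, c2 in zip(costs, costs[1:]))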


More information

2-D Analysis for Iterative Learning Controller for Discrete-Time Systems With Variable Initial Conditions Yong FANG 1, and Tommy W. S.

2-D Analysis for Iterative Learning Controller for Discrete-Time Systems With Variable Initial Conditions Yong FANG 1, and Tommy W. S. -D Analysis for Iterative Learning Controller for Discrete-ime Systems With Variable Initial Conditions Yong FANG, and ommy W. S. Chow Abstract In this aer, an iterative learning controller alying to linear

More information

Some Unitary Space Time Codes From Sphere Packing Theory With Optimal Diversity Product of Code Size

Some Unitary Space Time Codes From Sphere Packing Theory With Optimal Diversity Product of Code Size IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 5, NO., DECEMBER 4 336 Some Unitary Sace Time Codes From Shere Packing Theory With Otimal Diversity Product of Code Size Haiquan Wang, Genyuan Wang, and Xiang-Gen

More information

Solution sheet ξi ξ < ξ i+1 0 otherwise ξ ξ i N i,p 1 (ξ) + where 0 0

Solution sheet ξi ξ < ξ i+1 0 otherwise ξ ξ i N i,p 1 (ξ) + where 0 0 Advanced Finite Elements MA5337 - WS7/8 Solution sheet This exercise sheets deals with B-slines and NURBS, which are the basis of isogeometric analysis as they will later relace the olynomial ansatz-functions

More information

Coding Along Hermite Polynomials for Gaussian Noise Channels

Coding Along Hermite Polynomials for Gaussian Noise Channels Coding Along Hermite olynomials for Gaussian Noise Channels Emmanuel A. Abbe IG, EFL Lausanne, 1015 CH Email: emmanuel.abbe@efl.ch Lizhong Zheng LIDS, MIT Cambridge, MA 0139 Email: lizhong@mit.edu Abstract

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H:

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H: Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 25, 2017 Due: November 08, 2017 A. Growth function Growth function of stum functions.

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

Introduction to Probability and Statistics

Introduction to Probability and Statistics Introduction to Probability and Statistics Chater 8 Ammar M. Sarhan, asarhan@mathstat.dal.ca Deartment of Mathematics and Statistics, Dalhousie University Fall Semester 28 Chater 8 Tests of Hyotheses Based

More information

Provided by the author(s) and University College Dublin Library in accordance with ublisher olicies. Please cite the ublished version when available. Title Low Comlexity Stochastic Otimization-Based Model

More information

Fig. 4. Example of Predicted Workers. Fig. 3. A Framework for Tackling the MQA Problem.

Fig. 4. Example of Predicted Workers. Fig. 3. A Framework for Tackling the MQA Problem. 217 IEEE 33rd International Conference on Data Engineering Prediction-Based Task Assignment in Satial Crowdsourcing Peng Cheng #, Xiang Lian, Lei Chen #, Cyrus Shahabi # Hong Kong University of Science

More information

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM JOHN BINDER Abstract. In this aer, we rove Dirichlet s theorem that, given any air h, k with h, k) =, there are infinitely many rime numbers congruent to

More information

On the Chvatál-Complexity of Knapsack Problems

On the Chvatál-Complexity of Knapsack Problems R u t c o r Research R e o r t On the Chvatál-Comlexity of Knasack Problems Gergely Kovács a Béla Vizvári b RRR 5-08, October 008 RUTCOR Rutgers Center for Oerations Research Rutgers University 640 Bartholomew

More information

Proximal methods for the latent group lasso penalty

Proximal methods for the latent group lasso penalty Proximal methods for the latent grou lasso enalty The MIT Faculty has made this article oenly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Villa,

More information

MULTI-CHANNEL PARAMETRIC ESTIMATOR FAST BLOCK MATRIX INVERSES

MULTI-CHANNEL PARAMETRIC ESTIMATOR FAST BLOCK MATRIX INVERSES MULTI-CANNEL ARAMETRIC ESTIMATOR FAST BLOCK MATRIX INVERSES S Lawrence Marle Jr School of Electrical Engineering and Comuter Science Oregon State University Corvallis, OR 97331 Marle@eecsoregonstateedu

More information

Spectral Clustering based on the graph p-laplacian

Spectral Clustering based on the graph p-laplacian Sectral Clustering based on the grah -Lalacian Thomas Bühler tb@cs.uni-sb.de Matthias Hein hein@cs.uni-sb.de Saarland University Comuter Science Deartment Camus E 663 Saarbrücken Germany Abstract We resent

More information

Tensor-Based Sparsity Order Estimation for Big Data Applications

Tensor-Based Sparsity Order Estimation for Big Data Applications Tensor-Based Sarsity Order Estimation for Big Data Alications Kefei Liu, Florian Roemer, João Paulo C. L. da Costa, Jie Xiong, Yi-Sheng Yan, Wen-Qin Wang and Giovanni Del Galdo Indiana University School

More information