Natural Images, Gaussian Mixtures and Dead Leaves Supplementary Material

Size: px

Start display at page:

Download "Natural Images, Gaussian Mixtures and Dead Leaves Supplementary Material"

Jonah Baldwin
5 years ago
Views:

1 Natural Images, Gaussan Mxtures and Dead Leaves Supplementary Materal Danel Zoran Interdscplnary Center for Neural Computaton Hebrew Unversty of Jerusalem Israel danez Yar Wess School of Computer Scence and Engneerng Hebrew Unversty of Jerusalem Israel ywess 1 Detals of model comparson Here we wll gve full detals of model compared n Secton 2 of the man paper. In general, all tranng data were randomly sampled patches from the Berkeley Segmentaton tranng set mages, and all test data was sampled from the Berkeley test set mages (that s, testng was always done on unseen patches). We removed the DC component of all patches. The test set conssts of a 1000 patches from whch we removed patches wth a standard devaton lower than 0.002, totalng n 992 patches. set szes change wth each model, see below for detals - other than the GMM model (whch s extremely parameter rch), the sze of the tranng set has lttle effect on results (the mnmum sze we used was 50,000 patches). were averaged over the test set and normalzed by the number of pxels mnus 1 (because we remove the DC component). All log functons are base 2, so the result s n bts/pxel. All models and code wll be made avalable f the paper s accepted. 1.1 Whte Gaussan Nose (Ind. G) We ft the varance of each pxel σ 2 on a tranng set of 50,000 patches. log L(x) = log N (x ; 0, σ 2 ) Denosng Each pxel s denosed ndependently usng a Wener flter such that: ˆx = σ 2 + σ2 n where y s the nosy pxel and σn 2 the nose varance. σ 2 y 1

2 1.2 PCA/Gaussan (PCA G) We learn the covarance matrx Σ from the tranng set. z = Wx log L(x) = log N (z ; 0, 1) + log W where the sum s over the leadng N 1 coeffcents (dscardng the DC component). Denosng Denosng wth ths model s done usng a Wener flter n the whtened doman, coeffcent by coeffcent, and then transformng back to pxel space. 1.3 PCA/Laplace (PCA L) We learn the covarance matrx Σ from the tranng set. z = Wx log L(x) = log L(z ; 0, 1) + log W where L s the Laplace dstrbuton and the sum s over the leadng N 1 coeffcents (dscardng the DC component). Denosng Denosng wth ths model s done usng a Wener flter n the whtened doman, coeffcent by coeffcent, and then transformng back to pxel space. 1.4 ICA We learn the covarance matrx Σ from the tranng set and use t to whten the data. Then, to learn the rotaton matrx V we use gradent ascent on the log lkelhood usng a factoral dstrbuton wth Laplace margnals. z = VWx log L(x) = log L(z ; 0, 1) + log W where L s the Laplace dstrbuton and the sum s over the leadng N 1 coeffcents (dscardng the DC component). V s orthonormal so t does not change the log determnant part. 2

3 Denosng We tred denosng n two ways (and took the better result). Frst, we tred to estmate the clean coeffcent numercally by maxmzng the posteror coeffcent wth gradent ascent. Ths produced reasonable results. Second, we used SPAMS [4] - a sparse codng code package whch s very fast and effcent, and used t to estmate the clean coeffcents and usng them to produce the clean patches Overcomplete sparse codng (2 OCSC) We learn the model usng the code by Culpepper et al.[1] - ths s an effcent code package for learnng these types of models. We also tred to learn ths model by approxmatng the posteror wth the Laplace approxmaton [3], results were smlar. Snce the partton functon of ths model s ntractable we use Hamltonan Annealed Importance Samplng [5] (HAIS) to estmate t and normalze the model. We have verfed HAIS to gve accurate estmates for the complete case (ICA above) and made sure we use enough partcles to converge wth the overcomplete case (see below). We use 10,000 partcles for the estmaton. Denosng We tred denosng n two ways (and took the better result). Frst, we tred to estmate the clean coeffcent numercally by maxmzng the posteror coeffcent wth gradent ascent. Ths produced reasonable results. Second, we used SPAMS [4] - a sparse codng code package whch s very fast and effcent, and used t to estmate the clean coeffcents and usng them to produce the clean patches. 1.6 GSM We learn the covarance matrx Σ from the tranng set patches, and the scalng varables z k and mxng coeffcent π k usng plan EM. We learn a mxture of 16 components. The lkelhood of a gven patch x under ths model s: ( ) log L(x) = log π k N (x; 0, zkσ) 2 Denosng We used the approxmate MAP procedure we use for the GMM model, see below for detals. 1.7 MoGSM k We learn the MoGSM model usng EM - ths s smlar to learnng any mxture model, wth the mxture components beng GSMs as above. We learn a mxture of 5 GSMs, each havng 8 scale components (as n [6]). The lkelhood of a gven patch s: 3

4 log L(x) log L(x) log (Num Samples) (a) log (Num Samples) (b) Fgure 1: Estmated lkelhood as a functon of number of partcles for the Karkln and Lewck model. log L(x) = log ( k π kgsm(x; 0, z 2 k Σ)) where GSM s the lkelhood of a GSM model as above. Denosng We used the approxmate MAP procedure we use for the GMM model, see below for detals. 1.8 Karkln and Lewck (KL) We learn the model usng the code suppled by the authors of the model [2] on our tranng set. We kept the same proporton of latent varables and dctonary elements to patch szes as the orgnal paper. For 8 8 model we used 25 latent varables and 160 dctonary elements. For the model we used 96 latent varables and 640 dctonary elements. Snce the partton functon of ths model s ntractable we use Hamltonan Annealed Importance Samplng [5] (HAIS) to estmate t and normalze the model. Fgure 1 depcts the convergence of the estmaton on 5 samples as a functon of number of partcles - note that t converges qute rapdly. In both experments we used 10,000 partcles whch seem to gve an accurate answer. See the orgnal paper for detals on ths. Denosng Snce the model does not nclude a nose term calculatng the MAP n the nosy case s very hard. What we do s a smlar procedure to the GMM approxmate MAP below. For each nosy patch y we estmate the latent varables by usng the orgnal MAP code suppled by the authors of the model. Then, usng these latent varables we calculate the correspondng covarance matrx Σ 1 and use t to denose the patch by Wener flterng t: ˆx 1 = (Σ 1 + Σ n ) 1 Σ 1 y where Σ n s the nose covarance. Usng the estmated patch ˆx 1 we repeat the process to obtan a second covarance matrx Σ 2. We then us Σ 2 to perform another Wener flterng on the orgnal nosy patch to obtan the fnal clean estmate. 4

5 Log Lkelhood (bts/pxel) GMM 2 2 GMM 4 4 GMM 8 8 GMM log 2 (Num Components) (a) Log Lkelhood PSNR (db) GMM GMM 4 4 GMM 8 8 GMM log 2 (Num Components) (b) Log Lkelhood Fgure 2: Lkelhood and denosng performance as a functon of number of components, smlar to Fgure?? n the paper. 1.9 GMM We learn a 200 components GMM usng a mn-batch verson of EM. The GMM has full covarance matrces and zero means. At each teraton we sample a 20,000 or 50,000 patches (for the 8 8 and models respectvely) patches from the complete tranng set and calculate an E and an M step over t - then we update the current mxture model by takng a small step η towards the newly learned parameters. We regularze the covarance matrces by addng ɛi to the covarance of each component (ɛ = 10 5 ). We ran the learnng for 4000 teratons, decreasng η slghtly at each teraton. The code we use s based upon code avalable onlne 1. The lkelhood of a gven patch x under ths model s: ( ) log L(x) = log π k N (x; 0, Σ k ) Denosng k We used an approxmate MAP procedure. We frst calculate the assgnment probablty of each of the K mxture components. We then choose the most lkely one k max and use ts covarance matrx to perform Wener flter such that: ˆx = (Σ kmax + Σ n ) 1 Σ kmax y 2 Number of components experment Fgure 2 depcts the results for 2 2, 4 4 and 8 8 patches - note that the behavor s smlar across patch szes. 3 More components of the GMM Fgure 3 depcts some more components from the GMM, along wth samples from each components. Components are ordered by order of mxng weght, n decreasng order (more lkely ones come frst). Note the vared dfferent structures each component produces

6 Fgure 3: Egenvectors and samples from dfferent components of the GMM model. 6

7 References [1] B.J. Culpepper, J. Sohl-Dcksten, and B.A. Olshausen. Buldng a better probablstc model of mages by factorzaton. In Computer Vson (ICCV), 2011 IEEE Internatonal Conference on. IEEE, [2] Y. Karkln and M.S. Lewck. Emergence of complex cell propertes by learnng to generalze n natural scenes. Nature, November [3] M.S. Lewck and B.A. Olshausen. Probablstc framework for the adaptaton and comparson of mage codes. JOSA A, 16(7): , [4] J. Maral, F. Bach, J. Ponce, and G. Sapro. Onlne dctonary learnng for sparse codng. In Proceedngs of the 26th Annual Internatonal Conference on Machne Learnng, pages ACM, [5] J Sohl-Dcksten and BJ Culpepper. Hamltonan annealed mportance samplng for partton functon estmaton [6] L. Thes, S. Gerwnn, F. Snz, and M. Bethge. In all lkelhood, deep belef s not enough. The Journal of Machne Learnng Research, : ,

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example: