A Generative Perspective on MRFs in Low-Level Vision
Supplemental Material
Uwe Schmidt    Qi Gao    Stefan Roth
Department of Computer Science, TU Darmstadt
The first two authors contributed equally to this work.

1. Derivations

1.1. Sampling the Prior

We first rewrite the model density from Eqs. (1) and (2) of the main paper as

    p(x; \Theta) = \sum_z \frac{1}{Z(\Theta)} e^{-\epsilon \|x\|^2 / 2} \prod_{c \in C} \prod_{i=1}^{N} p(z_{ic}) \, \mathcal{N}\!\left(J_i^T x_{(c)}; 0, \sigma_i^2 / s_{z_{ic}}\right),    (1)

where we treat the scales z \in \{1, \ldots, J\}^{N \times C} for each expert and clique as random variables with p(z_{ic}) = \alpha_{i z_{ic}} (i.e., the GSM mixture weights). Instead of marginalizing out the scales, we can also retain them explicitly and define the joint distribution (cf. [13])

    p(x, z; \Theta) = \frac{1}{Z(\Theta)} e^{-\epsilon \|x\|^2 / 2} \prod_{c \in C} \prod_{i=1}^{N} p(z_{ic}) \, \mathcal{N}\!\left(J_i^T x_{(c)}; 0, \sigma_i^2 / s_{z_{ic}}\right).    (2)

The conditional distribution p(x | z; \Theta) can be derived as the multivariate Gaussian

    p(x | z; \Theta) \propto e^{-\epsilon \|x\|^2 / 2} \prod_{c \in C} \prod_{i=1}^{N} \exp\!\left(-\frac{s_{z_{ic}} \left(J_i^T x_{(c)}\right)^2}{2 \sigma_i^2}\right)
                    = \exp\!\left(-\tfrac{1}{2} x^T \left(\epsilon I + \sum_{i=1}^{N} \sum_{c \in C} \frac{s_{z_{ic}}}{\sigma_i^2} w_{ic} w_{ic}^T\right) x\right)
                    \propto \mathcal{N}\!\left(x; 0, \left(\epsilon I + \sum_{i=1}^{N} W_i Z_i W_i^T\right)^{-1}\right),    (3)

where the w_{ic} are defined such that w_{ic}^T x is the result of applying filter J_i to clique c of the image x. The Z_i = diag\{s_{z_{ic}} / \sigma_i^2\} are diagonal matrices with one entry per clique, and the W_i are filter matrices that correspond to a convolution of the image with filter J_i, i.e.

    W_i^T x = [w_{ic_1}^T x, \ldots, w_{ic_{|C|}}^T x]^T = [J_i^T x_{(c_1)}, \ldots, J_i^T x_{(c_{|C|})}]^T.

Following Levi and Weiss [4, 11], we further rewrite the covariance as the matrix product

    \Sigma = \left(\epsilon I + \sum_{i=1}^{N} W_i Z_i W_i^T\right)^{-1}
           = \left([W_1, \ldots, W_N, I] \begin{bmatrix} Z_1 & & & \\ & \ddots & & \\ & & Z_N & \\ & & & \epsilon I \end{bmatrix} \begin{bmatrix} W_1^T \\ \vdots \\ W_N^T \\ I \end{bmatrix}\right)^{-1}
           = \left(W Z W^T\right)^{-1},    (4)

where W = [W_1, \ldots, W_N, I] and Z = diag(Z_1, \ldots, Z_N, \epsilon I) denote the stacked filter and scale matrices, and sample y \sim \mathcal{N}(0, I) to obtain a sample x from p(x | z; \Theta) by solving the least-squares problem

    W Z W^T x = W Z^{1/2} y  \iff  x = \left(W Z W^T\right)^{-1} W Z^{1/2} y.    (5)
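The least-squares step of Eq. (5) is straightforward to sketch. The following NumPy snippet is our own minimal illustration, not the authors' code: it draws x ~ N(0, (W Z Wᵀ)⁻¹) for a small dense stacked filter matrix, where a practical implementation would instead use sparse convolution matrices and a conjugate-gradient solver. The function name and the dense `np.linalg.solve` are assumptions for illustration.

```python
import numpy as np

def sample_x_given_z(W, Z_diag, eps=1e-4, y=None, rng=None):
    """Draw x ~ N(0, (eps*I + W diag(Z_diag) W^T)^{-1}) as in Eq. (5).

    W      : (n, m) matrix whose columns are the clique filter vectors w_ic
    Z_diag : (m,) per-clique precisions s_{z_ic} / sigma_i^2
    """
    n, m = W.shape
    # Stack the identity block that carries the eps*I broadness term (Eq. 4)
    Wt = np.hstack([W, np.eye(n)])                  # W = [W_1, ..., W_N, I]
    Zt = np.concatenate([Z_diag, np.full(n, eps)])  # Z = diag(Z_1, ..., Z_N, eps*I)
    A = Wt @ (Zt[:, None] * Wt.T)                   # precision matrix W Z W^T
    if y is None:
        y = (rng or np.random.default_rng()).standard_normal(m + n)
    b = Wt @ (np.sqrt(Zt) * y)                      # right-hand side W Z^{1/2} y
    return np.linalg.solve(A, b)                    # x ~ N(0, (W Z W^T)^{-1})
```

By Eq. (6)-(7), the returned x has exactly the covariance (W Z Wᵀ)⁻¹, since the solve applies (W Z Wᵀ)⁻¹ W Z^{1/2} to the white-noise vector y.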
By using the well-known property

    y \sim \mathcal{N}(0, I) \implies A y \sim \mathcal{N}(0, A A^T),    (6)

it follows that

    x = \left(W Z W^T\right)^{-1} W Z^{1/2} y \sim \mathcal{N}\!\left(x; 0, \left(W Z W^T\right)^{-1} W Z^{1/2} \left(\left(W Z W^T\right)^{-1} W Z^{1/2}\right)^T\right) = \mathcal{N}\!\left(x; 0, \left(W Z W^T\right)^{-1}\right)    (7)

is indeed a valid sample from the conditional distribution as derived in Eq. (3). Since the scales are conditionally independent given the image by construction, the conditional distribution p(z | x; \Theta) is readily given as

    p(z_{ic} | x; \Theta) \propto p(z_{ic}) \, \mathcal{N}\!\left(J_i^T x_{(c)}; 0, \sigma_i^2 / s_{z_{ic}}\right).    (8)

1.2. Conditional Sampling

In order to avoid extreme values at the less constrained boundary pixels [5] during learning and model analysis, or to perform inpainting of missing pixels given the known ones, we rely on conditional sampling. In particular, we sample the pixels x_A given fixed x_B and scales z according to the conditional Gaussian distribution

    p(x_A | x_B, z; \Theta),    (9)

where A and B denote the index sets of the respective pixels. Without loss of generality, we assume that

    x = \begin{bmatrix} x_A \\ x_B \end{bmatrix}, \qquad \Sigma = \left(W Z W^T\right)^{-1} = \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}^{-1},    (10)

where the square sub-matrix A has as many rows and columns as the vector x_A has elements, etc. The conditional distribution of interest can now be derived as

    p(x_A | x_B, z; \Theta) \propto \exp\!\left(-\tfrac{1}{2} \begin{bmatrix} x_A \\ x_B \end{bmatrix}^T \begin{bmatrix} A & C \\ C^T & B \end{bmatrix} \begin{bmatrix} x_A \\ x_B \end{bmatrix}\right)
                            \propto \exp\!\left(-\tfrac{1}{2} \left(x_A + A^{-1} C x_B\right)^T A \left(x_A + A^{-1} C x_B\right)\right)
                            \propto \mathcal{N}\!\left(x_A; -A^{-1} C x_B, A^{-1}\right).    (11)

The matrices A and C are given by the appropriate sub-matrices of the W_i and Z_i, and allow for the same efficient sampling scheme. The mean \mu = -A^{-1} C x_B can also be computed by solving a least-squares problem. Sampling the conditional distribution of scales p(z | x_A, x_B; \Theta) = p(z | x; \Theta) remains as before.

1.3. Sampling the Posterior for Image Denoising

Assuming additive i.i.d. Gaussian noise with known standard deviation \sigma, the posterior given scales z can be written as

    p(x | y, z; \Theta) \propto p(y | x) \, p(x | z; \Theta)
                        \propto \exp\!\left(-\frac{\|y - x\|^2}{2 \sigma^2}\right) \exp\!\left(-\tfrac{1}{2} x^T \Sigma^{-1} x\right)
                        \propto \exp\!\left(-\tfrac{1}{2} \left(-\frac{2 x^T y}{\sigma^2} + x^T \left(\frac{I}{\sigma^2} + \Sigma^{-1}\right) x\right)\right)
                        \propto \mathcal{N}\!\left(x; \tilde{\Sigma} y / \sigma^2, \tilde{\Sigma}\right),    (12)

where \tilde{\Sigma} = \left(I / \sigma^2 + \Sigma^{-1}\right)^{-1} and \Sigma is as in Eq. (4). The conditional distribution of the scales p(z | x, y; \Theta) = p(z | x; \Theta) remains as before.
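Eq. (8) states that each scale index can be resampled independently given the current image, which is what makes the auxiliary-variable Gibbs sampler cheap. Below is our own NumPy sketch of this discrete resampling step for one expert; the function name and the vectorization over cliques are assumptions, and the log-domain normalization guards against underflow for large filter responses.

```python
import numpy as np

def sample_scales(filter_responses, alphas, sigma2, scales, rng):
    """Sample z_ic ~ p(z_ic | x) ∝ alpha_z * N(J_i^T x_(c); 0, sigma2 / s_z), Eq. (8).

    filter_responses : (C,) responses J_i^T x_(c) for each clique
    alphas           : (J,) GSM mixture weights p(z)
    scales           : (J,) scale values s_z
    Returns (C,) indices into `scales`.
    """
    # log N(r; 0, sigma2/s) = 0.5*log(s) - s*r^2/(2*sigma2) + const
    r2 = filter_responses[:, None] ** 2                         # (C, 1)
    logp = (np.log(alphas) + 0.5 * np.log(scales))[None, :] \
           - scales[None, :] * r2 / (2.0 * sigma2)              # (C, J)
    logp -= logp.max(axis=1, keepdims=True)                     # stabilize
    p = np.exp(logp)
    p /= p.sum(axis=1, keepdims=True)                           # normalize per clique
    # One categorical draw per clique via inverse-CDF sampling
    u = rng.random((p.shape[0], 1))
    return (p.cumsum(axis=1) < u).sum(axis=1)
```

A full Gibbs sweep then alternates this step with the Gaussian image update of Eq. (5) (or Eq. (12) for the posterior).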
2. Image Restoration

To further illustrate the image restoration performance of our approach, we provide the following additional results:

Table 1 repeats Tab. 1 of the main paper and additionally gives the numerical results of MAP estimation with graph cuts and α-expansion [1]. Note that in most cases, α-expansion performs slightly worse in terms of PSNR than conjugate gradients, even (and in fact particularly) for non-convex potentials. Also, using a Student-t potential [3] does not show favorable results.

Table 2 shows the results of the same experiment as in Tab. 1, but reports the performance in terms of the perceptually more relevant structural similarity index (SSIM) [10]. Note that all of the conclusions reported in the main paper also hold for this perceptual quality metric.

Table 3 repeats Tab. 2 of the main paper, and additionally reports standard deviations as well as SSIM performance. The SSIM supports the same conclusions about relative performance as the PSNR.

Figs. 1-6 show denoising results for 6 of the 68 images for which the average performance is reported in Tab. 2 of the main paper. Note that in contrast to the tested previous approaches, combining our learned models with MMSE estimation leads to good performance on relatively smooth as well as on strongly textured images.

Fig. 7 provides a different view of the summary results in Tab. 2 of the paper. Instead of the average performance, we show a per-image comparison between the denoising results of the discriminative approach of [8] (using MAP) and the results of our generatively trained 3×3 FoE (using MMSE). Note that the PSNR and particularly the SSIM show a substantial performance advantage for our approach.

Fig. 8 shows an uncropped version of the inpainting result in Fig. 7 of the paper. Additionally, one other inpainting result is provided as further visual illustration.

3. Sampling the Prior and Posterior

The following additional results illustrate properties of the auxiliary-variable Gibbs sampler.
Fig. 9 shows five subsequent samples (after reaching the equilibrium distribution) from all models listed in Table 1. Note how samples from common pairwise models appear too grainy, while those from previous FoE models are too smooth and without discontinuities.

Fig. 10 shows two larger samples from our learned models. Note that our pairwise model leads to locally uniform samples with occasional discontinuities that appear spatially isolated ("speckles"). Our learned high-order model, on the other hand, leads to smoothly varying samples with occasional spatially correlated discontinuities, which appear more realistic.

Fig. 11 illustrates the convergence of the sampling procedure for the prior and the posterior (in case of denoising). Fig. 12 illustrates the advantages of running multiple parallel samplers.

References

[1] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11):1222-1239, 2001.
[2] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. CVPR 2005.
[3] X. Lan, S. Roth, D. P. Huttenlocher, and M. J. Black. Efficient belief propagation with learned higher-order Markov random fields. ECCV 2006.
[4] E. Levi. Using natural image priors: Maximizing or sampling? Master's thesis, The Hebrew University of Jerusalem, 2009.
[5] M. Norouzi, M. Ranjbar, and G. Mori. Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning. CVPR 2009.
[6] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE TIP, 12(11):1338-1351, 2003.
[7] S. Roth and M. J. Black. Fields of experts. IJCV, 82(2):205-229, 2009.
[8] K. G. G. Samuel and M. F. Tappen. Learning optimized MAP estimates in continuously-valued MRF models. CVPR 2009.
Table 1. Average PSNR (dB) of denoising results for 10 test images [3]. Rows (models): pairwise (marginal fitting); pairwise (generalized Laplacian [9]); pairwise (Laplacian); pairwise (Student-t [3]); pairwise (ours); FoE from [7]; FoE from [12]; FoE (ours). Columns: MAP (λ=1) and MAP (opt. λ), each with conjugate-gradient and α-expansion inference, as well as MMSE; all reported for σ = 10 and σ = 20.

Table 2. Average SSIM [10] of denoising results for 10 test images [3], for the same models and inference settings as in Table 1.

Model                 Learning          Inference   PSNR (dB), avg. / std. dev.   SSIM [10], avg. / std. dev.
5×5 FoE from [7]      CD (generative)   MAP w/λ
5×5 FoE from [8]      discriminative    MAP
pairwise (ours)       CD (generative)   MMSE
3×3 FoE (ours)        CD (generative)   MMSE
Non-local means [2]                     (MMSE)
BLS-GSM [6]                             MMSE

Table 3. Denoising results for 68 test images [7, 8] (σ = 25).

[9] M. F. Tappen, B. C. Russell, and W. T. Freeman. Exploiting the sparse derivative prior for super-resolution and image demosaicing. Int. Workshop SCTV, 2003.
[10] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13(4):600-612, 2004.
[11] Y. Weiss. Personal communication, 2005.
[12] Y. Weiss and W. T. Freeman. What makes a good model of natural images? CVPR 2007.
[13] M. Welling, G. E. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. NIPS*2002.
Figure 1. Denoising results for test image "Castle": (a) original; (b) noisy (σ = 25), PSNR = 20.29 dB, SSIM = 0.31; (c) pairwise (Laplacian), PSNR = 27.88 dB, SSIM = 0.763; (d) pairwise (g. Lapl. [9]), PSNR = 27.65 dB, SSIM = 0.749; (e) pairwise (ours), PSNR = 28.34 dB, SSIM = 0.779; (f) 3×3 FoE (ours), PSNR = 28.7 dB, SSIM = 0.829; (g) 5×5 FoE from [7], PSNR = 28.52 dB, SSIM = 0.816; (h) 5×5 FoE from [8], PSNR = 28.51 dB, SSIM = 0.89. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 2. Denoising results for test image "Birds": (a) original; (b) noisy (σ = 25), PSNR = 20.22 dB, SSIM = 0.297; (c) pairwise (Laplacian), PSNR = 28.78 dB, SSIM = 0.757; (d) pairwise (g. Lapl. [9]), PSNR = 27.87 dB, SSIM = 0.711; (e) pairwise (ours), PSNR = 29.0 dB, SSIM = 0.768; (f) 3×3 FoE (ours), PSNR = 29.79 dB, SSIM = 0.82; (g) 5×5 FoE from [7], PSNR = 29.16 dB, SSIM = 0.794; (h) 5×5 FoE from [8], PSNR = 29.35 dB, SSIM = 0.82. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 3. Denoising results for test image "LA": (a) original; (b) noisy (σ = 25), PSNR = 20.71 dB, SSIM = 0.57; (c) pairwise (Laplacian), PSNR = 26.32 dB, SSIM = 0.789; (d) pairwise (g. Lapl. [9]), PSNR = 26.12 dB, SSIM = 0.781; (e) pairwise (ours), PSNR = 26.51 dB, SSIM = 0.794; (f) 3×3 FoE (ours), PSNR = 27.0 dB, SSIM = 0.813; (g) 5×5 FoE from [7], PSNR = 26.84 dB, SSIM = 0.792; (h) 5×5 FoE from [8], PSNR = 27.6 dB, SSIM = 0.817. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 4. Denoising results for test image "Goat": (a) original; (b) noisy (σ = 25), PSNR = 20.34 dB, SSIM = 0.475; (c) pairwise (Laplacian), PSNR = 25.93 dB, SSIM = 0.674; (d) pairwise (g. Lapl. [9]), PSNR = 25.36 dB, SSIM = 0.647; (e) pairwise (ours), PSNR = 26.12 dB, SSIM = 0.685; (f) 3×3 FoE (ours), PSNR = 26.27 dB, SSIM = 0.689; (g) 5×5 FoE from [7], PSNR = 25.36 dB, SSIM = 0.592; (h) 5×5 FoE from [8], PSNR = 26.19 dB, SSIM = 0.686. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 5. Denoising results for test image "Wolf": (a) original; (b) noisy (σ = 25), PSNR = 22.44 dB, SSIM = 0.278; (c) pairwise (Laplacian), PSNR = 28.77 dB, SSIM = 0.838; (d) pairwise (g. Lapl. [9]), PSNR = 28.65 dB, SSIM = 0.831; (e) pairwise (ours), PSNR = 28.81 dB, SSIM = 0.829; (f) 3×3 FoE (ours), PSNR = 28.72 dB, SSIM = 0.834; (g) 5×5 FoE from [7], PSNR = 28.52 dB, SSIM = 0.82; (h) 5×5 FoE from [8], PSNR = 30.99 dB, SSIM = 0.81. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 6. Denoising results for test image "Airplane": (a) original; (b) noisy (σ = 25), PSNR = 20.21 dB, SSIM = 0.136; (c) pairwise (Laplacian), PSNR = 32.78 dB, SSIM = 0.89; (d) pairwise (g. Lapl. [9]), PSNR = 32.36 dB, SSIM = 0.87; (e) pairwise (ours), PSNR = 33.51 dB, SSIM = 0.829; (f) 3×3 FoE (ours), PSNR = 35.28 dB, SSIM = 0.931; (g) 5×5 FoE from [7], PSNR = 35.0 dB, SSIM = 0.938; (h) 5×5 FoE from [8], PSNR = 33.63 dB, SSIM = 0.881. Inference: (e, f) MMSE; (c, d, g) MAP w/λ; (h) MAP.
Figure 7. Comparing the denoising performance (σ = 25) in terms of (a) PSNR (dB) and (b) SSIM for 68 test images between our 3×3 FoE (using MMSE) and the 5×5 FoE from [8] (using MAP). Each panel plots our result against that of Samuel and Tappen per image; a red circle above the black line means performance is better with our approach.

Figure 8. MMSE-based image inpainting with our learned models: (a) original photograph and (b) its restoration with our pairwise MRF; (c) original photograph and (d) its restoration with our 3×3 FoE.
Figure 9. Five subsequent samples (l. to r.) from various MRF models after reaching the equilibrium distribution: (a) pairwise, ours; (b) pairwise, marginal fitting; (c) pairwise, generalized Laplacian from [9]; (d) pairwise, Laplacian; (e) 3×3 FoE, ours; (f) 5×5 FoE from [7]; (g) FoE from [12] (convolution with circular boundary handling, no pixels removed). The boundary pixels are removed for better visualization.
Figure 10. Larger samples from our learned models after reaching the equilibrium distribution: (a) pairwise MRF; (b) 3×3 FoE. The boundary pixels are removed for better visualization.

Figure 11. Monitoring the convergence of sampling (energy vs. number of iterations). (a) Sampling a 50 × 50 image from the learned pairwise MRF prior conditioned on a 1-pixel boundary, using three chains with over-dispersed starting points (red, dashed: interior of the boundary image; blue, solid: median-filtered version; black, dash-dotted: noisy version). Approximate convergence is reached after 25 iterations (R̂ < 1.1). (b) Sampling the posterior (σ = 20, image size 160 × 240) with four chains and over-dispersed starting points (red, dashed: noisy image; blue, dash-dotted: Gauss-filtered version; green, solid: median-filtered version; black, dotted: Wiener-filtered version). Approximate convergence is reached after 24 iterations.

Figure 12. Efficiency of sampling-based MMSE denoising with different numbers of samplers (learned pairwise MRF, σ = 20; PSNR shown for 1, 2, 4, and 10 samplers). (a) In case of parallel computing (one sampler per computing core), faster convergence of the denoised image can be achieved. (b) Even when using sequential computing, multiple samplers can improve performance, as the samples are less correlated.
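The convergence criterion R̂ < 1.1 used in Fig. 11 is the Gelman-Rubin potential scale reduction factor computed over multiple chains. As a minimal sketch (our own, assuming each chain is summarized by a scalar trace such as its energy):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of length n.

    chains : (m, n) array of a scalar summary statistic (e.g. energy) per chain.
    Values near 1 indicate that the chains have mixed; a common threshold
    for approximate convergence is R-hat < 1.1.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled posterior variance estimate
    return np.sqrt(var_hat / W)
```

In practice the over-dispersed starting points of Fig. 11 matter: if all chains start from the same image, R̂ can look close to 1 long before the sampler has explored the distribution.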
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More information1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.
Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationDeep Learning of Invariant Spatiotemporal Features from Video. Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia
Deep Learning of Invariant Spatiotemporal Features from Video Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia Introduction Focus: Unsupervised feature extraction from
More informationA NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar
A NO-REFERENCE SARPNESS METRIC SENSITIVE TO BLUR AND NOISE Xiang Zhu and Peyman Milanfar Electrical Engineering Department University of California at Santa Cruz, CA, 9564 xzhu@soeucscedu ABSTRACT A no-reference
More informationMACHINE LEARNING 2 UGM,HMMS Lecture 7
LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 9. Markov Networks. Abdeslam Boularias. Monday, October 14, 2015
Course 16:198:520: Introduction To Artificial Intelligence Lecture 9 Markov Networks Abdeslam Boularias Monday, October 14, 2015 1 / 58 Overview Bayesian networks, presented in the previous lecture, are
More informationImage Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang
Image Noise: Detection, Measurement and Removal Techniques Zhifei Zhang Outline Noise measurement Filter-based Block-based Wavelet-based Noise removal Spatial domain Transform domain Non-local methods
More informationARestricted Boltzmann machine (RBM) [1] is a probabilistic
1 Matrix Product Operator Restricted Boltzmann Machines Cong Chen, Kim Batselier, Ching-Yun Ko, and Ngai Wong chencong@eee.hku.hk, k.batselier@tudelft.nl, cyko@eee.hku.hk, nwong@eee.hku.hk arxiv:1811.04608v1
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationIntroduction to Probabilistic Graphical Models: Exercises
Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics
More information1 Bayesian Linear Regression (BLR)
Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,
More informationExpectation propagation for signal detection in flat-fading channels
Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models Features (Ising,
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationUndirected graphical models
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical
More informationA Unified Energy-Based Framework for Unsupervised Learning
A Unified Energy-Based Framework for Unsupervised Learning Marc Aurelio Ranzato Y-Lan Boureau Sumit Chopra Yann LeCun Courant Insitute of Mathematical Sciences New York University, New York, NY 10003 Abstract
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC
More informationSingle-channel source separation using non-negative matrix factorization
Single-channel source separation using non-negative matrix factorization Mikkel N. Schmidt Technical University of Denmark mns@imm.dtu.dk www.mikkelschmidt.dk DTU Informatics Department of Informatics
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More information