Adaptive selection of parameters for image denoising (Sélection adaptative des paramètres pour le débruitage des images)


1 Journées SIERRA 2014, Saint-Etienne, France, 25 March 2014
Sélection adaptative des paramètres pour le débruitage des images
Adaptive selection of parameters for image denoising
Charles Deledalle (1)
Joint work with: Loïc Denis (2), Charles Dossal (1), Vincent Duval (3), Jalal Fadili (4), Gabriel Peyré (3), Joseph Salmon (5), Florence Tupin (5) and Samuel Vaiter (3)
(1) Institut de Mathématiques de Bordeaux, CNRS-Université Bordeaux 1, France
(2) Laboratoire Hubert-Curien, Univ. Jean Monnet, Univ. Lyon, France
(3) CEREMADE, CNRS-Paris Dauphine, France
(4) GREYC, CNRS-ENSICAEN, France
(5) Institut Mines-Télécom, Télécom-ParisTech, CNRS LTCI, France

2 Motivations: model/parameter selection

[Figure: an input image compared with a smooth model and a piecewise-constant model.]

Goal: pick the suitable model/parameters for a given image.

3 Motivations: risk based selection

Risk definition:
- evaluates a cost for each parameter of a given filter, for a given image,
- usually task specific,
- usually defined with respect to the unknown image that we attempt to recover.

Goal: find the least risky model / set of parameters θ.

4 Motivations: square error risk

Square error risk. Define the risk as the squared error
$$R(\theta) = \mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 \quad\text{where}\quad Y = x_0 + W,$$
where $f_\theta : \mathbb{R}^N \to \mathbb{R}^N$ is a denoiser with parameter $\theta$, $x_0 \in \mathbb{R}^N$ and $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id})$.

Bias-variance decomposition:
$$\mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\| \mathbb{E}_Y[f_\theta(Y)] - x_0 \|^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y \| f_\theta(Y) - \mathbb{E}_Y[f_\theta(Y)] \|^2}_{\text{Variance}}$$

Fidelity-complexity decomposition [Mallows, 1973, Efron, 1986]:
$$\mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\mathbb{E}_Y \| f_\theta(Y) - Y \|^2 - N\sigma^2}_{\text{Fidelity}} + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}}$$

5 Motivations: risk interpretation

$$R(\theta) = \mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\| \mathbb{E}_Y[f_\theta(Y)] - x_0 \|^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y \| f_\theta(Y) - \mathbb{E}_Y[f_\theta(Y)] \|^2}_{\text{Variance}} = \underbrace{\mathbb{E}_Y \| f_\theta(Y) - Y \|^2 - N\sigma^2}_{\text{Fidelity}} + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}}$$

Example (Complexity = 2σ² × degree of freedom):

name       f_θ(y)   R(θ)     Bias²    Variance   Fidelity   Complexity
identity   y        Nσ²      0        Nσ²        −Nσ²       2Nσ²
oracle     x₀       0        0        0          0          0
null       0        ‖x₀‖²    ‖x₀‖²    0          ‖x₀‖²      0

For an orthogonal projector, the degree of freedom is the dimension of the target space.

[Plot: risk, squared bias, variance, fidelity and complexity (cost) as a function of the parameter θ.]
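To make the two decompositions concrete, here is a minimal Monte-Carlo check in Python for a simple linear smoother (a circular 3-tap moving average). The signal, filter and sample sizes are illustrative assumptions, not taken from the slides.

```python
# Numerical check: risk = Bias^2 + Variance = Fidelity + 2 sigma^2 DoF
# for a linear denoiser f(y) = W y (3-tap circular moving average).
import numpy as np

rng = np.random.default_rng(0)
N, sigma, n_draws = 256, 0.5, 2000
x0 = np.sign(np.sin(2 * np.pi * (np.arange(N) + 0.5) / 64.0))  # piecewise-constant signal

def f(y):
    # circular 3-tap moving average: a linear denoiser with tr(W) = N / 3
    return (np.roll(y, -1) + y + np.roll(y, 1)) / 3.0

dof = N / 3.0                                  # DoF of a linear filter = tr(W)

risk = fid = 0.0
samples = np.empty((n_draws, N))
for k in range(n_draws):
    y = x0 + sigma * rng.standard_normal(N)
    fy = f(y)
    samples[k] = fy
    risk += np.sum((fy - x0) ** 2) / n_draws
    fid += (np.sum((fy - y) ** 2) - N * sigma ** 2) / n_draws

bias2 = np.sum((samples.mean(axis=0) - x0) ** 2)
var = np.sum(samples.var(axis=0))

print("risk E||f(Y) - x0||^2    :", risk)
print("bias^2 + variance        :", bias2 + var)
print("fidelity + 2 sigma^2 DoF :", fid + 2 * sigma ** 2 * dof)
```

All three printed quantities agree up to Monte-Carlo error.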

6 Motivations: limits of the risk based selection

[Figure: input image compared with a smooth model and a piecewise-constant model.]

What if none of the models/sets of parameters is suitable for the whole image? Selection should then be performed locally. A map of risk can be defined similarly for each pixel, as well as a map of bias, a map of variance, a map of fidelity and a map of complexity.

7 Motivations: local bias-variance trade-off

[Figure: input image and estimates, with the corresponding maps of variance and squared bias.]

8 Goal: risk estimation

Recall $R(\theta) = \mathbb{E}_Y \|f_\theta(Y) - x_0\|^2 = \text{Bias}^2 + \text{Variance} = \mathbb{E}_Y\|f_\theta(Y) - Y\|^2 - N\sigma^2 + 2\sigma^2\,\mathbb{E}_Y \big\langle \tfrac{Y - x_0}{\sigma}, \tfrac{f_\theta(Y) - x_0}{\sigma} \big\rangle$ (fidelity plus complexity).

Challenge:
- All quantities are defined as an expectation over Y, but we only know a single realization y of Y.
- Some quantities depend on the image x_0, but this is the image that we attempt to recover, hence it is unknown.

Goal:
1. choose several models/sets of parameters: $f_{\theta_1}, f_{\theta_2}, \ldots$
2. estimate the risk for each of them: $R(\theta_1), R(\theta_2), \ldots$ (without knowing x_0 and from the single observed realization y)
3. select, globally or locally, the least risky set of parameters.

9 Outline

1. Global selection based on Stein's Unbiased Risk Estimator
   - Stein's Unbiased Risk Estimator (SURE)
   - Simple cases: non-local means, soft-thresholding
   - Non-differentiable cases: hard-thresholding
   - Monte-Carlo SURE
   - Case of total-variation
2. Local selection with risk estimation and bias reduction
   - Risk estimation: SURE filtering
   - Case of anisotropic non-local means
   - Bias reduction-variance estimation approach
   - Case of anisotropic Gaussian convolutions
   - Case of non-local means
   - Case of anisotropic total-variation
3. Conclusion


11 Stein's Unbiased Risk Estimator (SURE)

Motivation. In
$$R(\theta) = \mathbb{E}_Y \|f_\theta(Y) - Y\|^2 - N\sigma^2 + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}},$$
can we express the degree of freedom with respect to Y only, independently of x_0?

Lemma (Stein's lemma [Stein, 1981]). Assume $y \mapsto f_\theta(y)$ is continuous and almost everywhere differentiable (plus some mild conditions). Then
$$\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle = \mathbb{E}_Y \Big[ \mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y} \Big] = \mathbb{E}_Y \big[ \mathrm{div}_y\, f_\theta(y)\big|_{y=Y} \big].$$

Proof (for N = 1).
$$\mathbb{E}_Y[(Y - x_0)(f_\theta(Y) - x_0)] = \int_{-\infty}^{+\infty} (f_\theta(y) - x_0)\,\underbrace{(y - x_0)\, G_\sigma(y - x_0)}_{-\sigma^2 \frac{\partial}{\partial y} G_\sigma(y - x_0)\ \text{(DOG)}}\, dy$$
$$\stackrel{\text{(IBP)}}{=} -\sigma^2 \underbrace{\big[(f_\theta(y) - x_0)\, G_\sigma(y - x_0)\big]_{-\infty}^{+\infty}}_{=0} + \sigma^2 \int_{-\infty}^{+\infty} \frac{\partial f_\theta(y)}{\partial y}\, G_\sigma(y - x_0)\, dy = \sigma^2\, \mathbb{E}_Y\Big[\frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y}\Big].$$

12 Stein's Unbiased Risk Estimator (SURE)

Theorem (Stein's unbiased risk estimator (SURE)). Assume $y \mapsto f_\theta(y)$ is as in Stein's lemma. Then
$$\mathrm{SURE}(y, \theta) = \underbrace{\|f_\theta(y) - y\|^2 - N\sigma^2}_{\text{Sample fidelity}} + 2\sigma^2 \underbrace{\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y}}_{\text{Sample DoF}}$$
is an unbiased risk estimator: $\mathbb{E}_Y[\mathrm{SURE}(Y, \theta)] = R(\theta)$.

Remark: by the law of large numbers, $\mathrm{SURE}(y, \theta) \approx R(\theta)$ from a single realization y when N is large.
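As a minimal illustration of the theorem, the sketch below evaluates SURE for a denoiser whose divergence is known in closed form (a circular 5-tap moving average, so tr ∂f/∂y = N/5) and compares it with the true squared error. The test signal and noise level are assumptions.

```python
# SURE is computed from y alone (no access to x0) and matches the true error.
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 512, 0.3
x0 = np.cos(2 * np.pi * np.arange(N) / 128.0)

def f(y):
    # circular 5-tap moving average, so tr(df/dy) = N / 5
    k = 5
    return sum(np.roll(y, s) for s in range(-(k // 2), k // 2 + 1)) / k

def sure(y, fy, div_f, sigma):
    # SURE(y, theta) = ||f(y) - y||^2 - N sigma^2 + 2 sigma^2 tr(df/dy)
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * div_f

y = x0 + sigma * rng.standard_normal(N)
fy = f(y)
print("SURE (from y alone):", sure(y, fy, N / 5.0, sigma))
print("true squared error :", np.sum((fy - x0) ** 2))
```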

13-16 SURE Simple case: non-local means

Example (Averaging filters [Van De Ville and Kocher, 2009, Duval et al., 2011]). Consider an averaging filter of the form
$$f_h(y)_i = \frac{\sum_j w_{i,j}(y)\, y_j}{\sum_j w_{i,j}(y)}$$
where $w_{i,j} : \mathbb{R}^N \to \mathbb{R}_+$ is a weight function, for instance:
- isotropic convolution: $w_{i,j}(y) = \varphi_h(\|i - j\|)$,
- sigma filter: $w_{i,j}(y) = \varphi_h(|y_i - y_j|)$ [Lee, 1983, Yaroslavsky, 1985],
- non-local means: $w_{i,j}(y) = \varphi_h(\|P_i y - P_j y\|)$ [Buades et al., 2005].

An estimator of its DoF is then given by
$$\mathrm{tr}\, \frac{\partial f_h(y)}{\partial y} = \sum_{i=1}^N \frac{\partial f_h(y)_i}{\partial y_i}
\quad\text{where}\quad
\frac{\partial f_h(y)_i}{\partial y_i} = \frac{1}{\sum_j w_{i,j}(y)} \Big[ w_{i,i}(y) + \sum_j \frac{\partial w_{i,j}(y)}{\partial y_i}\, y_j - f_h(y)_i \sum_j \frac{\partial w_{i,j}(y)}{\partial y_i} \Big].$$

[Figure: noisy image y and f_h(y) for a small, a large and the optimal bandwidth h, with the quadratic cost (risk and SURE) plotted as a function of the bandwidth h.]
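Below is a sketch of this closed-form SURE for the 1D sigma filter, the simplest data-dependent case above, with $w_{i,j}(y) = \exp(-(y_i - y_j)^2 / (2h^2))$ over a small search window; the diagonal Jacobian entries follow the formula just recalled. The test signal, window size and bandwidth grid are assumptions.

```python
# Exact SURE for a 1D sigma (Yaroslavsky) filter, scanning the bandwidth h.
import numpy as np

rng = np.random.default_rng(2)
N, sigma, half_win = 400, 0.4, 10
x0 = np.where(np.arange(N) < N // 2, 0.0, 1.0)           # step signal
y = x0 + sigma * rng.standard_normal(N)

def sigma_filter_sure(y, h, sigma, half_win):
    N = y.size
    fy = np.empty(N)
    div = 0.0
    for i in range(N):
        j = np.arange(max(0, i - half_win), min(N, i + half_win + 1))
        w = np.exp(-(y[i] - y[j]) ** 2 / (2 * h ** 2))    # w_{i,i} = 1
        S = w.sum()
        fy[i] = np.dot(w, y[j]) / S
        # d w_{i,j} / d y_i = -(y_i - y_j) / h^2 * w_{i,j}  (zero for j = i)
        dw = -(y[i] - y[j]) / h ** 2 * w
        dw[j == i] = 0.0
        # diagonal Jacobian entry from the closed form above
        div += (w[j == i].sum() + np.dot(dw, y[j]) - fy[i] * dw.sum()) / S
    sure = np.sum((fy - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * div
    return fy, sure

for h in [0.1, 0.3, 0.6, 1.2]:
    fy, s = sigma_filter_sure(y, h, sigma, half_win)
    print(f"h = {h:4.2f}  SURE = {s:8.2f}  true SE = {np.sum((fy - x0) ** 2):8.2f}")
```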

17 SURE Simple case: soft-thresholding

Example (Soft-thresholding (ST) [Donoho and Johnstone, 1995, Zou et al., 2007]). The ST function is defined as
$$\mathrm{ST}(y, \lambda)_i = \begin{cases} y_i + \lambda & \text{if } y_i \leq -\lambda \\ 0 & \text{if } -\lambda < y_i < \lambda \\ y_i - \lambda & \text{otherwise.} \end{cases}$$
For almost all y, its Jacobian is
$$\Big[\frac{\partial\, \mathrm{ST}(y, \lambda)}{\partial y}\Big]_{i,j} = \begin{cases} 1 & \text{if } i = j \text{ and } |y_i| > \lambda \\ 0 & \text{otherwise,} \end{cases}$$
so $\#\{i : |Y_i| > \lambda\}$ is an unbiased estimator of the degree of freedom.

[Figure: a sparse vector x_0 with the ±σ interval and the threshold λ (values x_i against the position index i), and the quadratic cost (risk and SURE) as a function of the threshold λ.]
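A sketch of SURE-driven selection of the soft-thresholding parameter, using the DoF estimate #{i : |y_i| > λ}; the sparse vector x_0 and the threshold grid are assumptions.

```python
# SURE-based choice of lambda for soft-thresholding, compared to the oracle.
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 1000, 1.0
x0 = np.zeros(N); x0[:50] = rng.uniform(3, 8, 50)        # sparse vector
y = x0 + sigma * rng.standard_normal(N)

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_st(y, lam, sigma):
    # DoF of soft-thresholding = #{i : |y_i| > lambda}
    fy = soft_threshold(y, lam)
    dof = np.count_nonzero(np.abs(y) > lam)
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * dof

lams = np.linspace(0.1, 5.0, 50)
sures = np.array([sure_st(y, lam, sigma) for lam in lams])
errors = np.array([np.sum((soft_threshold(y, lam) - x0) ** 2) for lam in lams])
print("lambda minimizing SURE      :", lams[np.argmin(sures)])
print("lambda minimizing true error:", lams[np.argmin(errors)])
```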

18-20 SURE Non-differentiable case: hard-thresholding

Example (Hard-thresholding (HT)). The HT function is defined as
$$\mathrm{HT}(y, \lambda)_i = \begin{cases} y_i & \text{if } y_i \leq -\lambda \\ 0 & \text{if } -\lambda < y_i < \lambda \\ y_i & \text{otherwise.} \end{cases}$$
The HT is not continuous, so Stein's lemma does not apply and we cannot estimate R(θ) with SURE. Can we build a biased estimator? If yes, how does its bias evolve with the data dimension?

Theorem (Stein COnsistent Risk Estimator (SCORE)). Take $h_N$ such that $\lim_{N\to\infty} h_N = 0$ and $\lim_{N\to\infty} (N h_N)^{-1} = 0$. Then the quantity
$$\#\{i : |Y_i| > \lambda\} + \frac{\lambda \sqrt{\sigma^2 + h_N^2}}{\sqrt{2\pi}\, \sigma\, h_N} \sum_{i=1}^N \Big[ \exp\Big(-\frac{(Y_i + \lambda)^2}{2 h_N^2}\Big) + \exp\Big(-\frac{(Y_i - \lambda)^2}{2 h_N^2}\Big) \Big]$$
is a consistent estimator of the degree of freedom (i.e., convergence in probability).

[Figure: a sparse vector x_0 with the ±σ interval and the threshold λ, and the quadratic cost (risk, SCORE and Jansen's estimator [1]) as a function of the threshold λ.]

[Deledalle et al., 2013] Deledalle, C., Peyré, G., Fadili, J. (2013). Stein COnsistent Risk Estimator (SCORE) for hard thresholding. Signal Processing with Adaptive Sparse Structured Representations (SPARS), Lausanne, July 8-11, 2013.
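The snippet below plugs the SCORE degree-of-freedom expression, as reconstructed above, into the risk estimate for hard thresholding; the bandwidth rule h_N = N^(-1/4) (which satisfies h_N → 0 and (N h_N)^(-1) → 0) and the test signal are assumptions made for illustration.

```python
# SCORE-based choice of lambda for hard thresholding (sketch).
import numpy as np

rng = np.random.default_rng(4)
N, sigma = 4000, 1.0
x0 = np.zeros(N); x0[:200] = 6.0
y = x0 + sigma * rng.standard_normal(N)

def hard_threshold(y, lam):
    return y * (np.abs(y) > lam)

def score_risk(y, lam, sigma, h):
    N = y.size
    fy = hard_threshold(y, lam)
    # kernel terms at +/- lambda, as in the SCORE DoF expression above
    kernel = np.exp(-(y + lam) ** 2 / (2 * h ** 2)) + np.exp(-(y - lam) ** 2 / (2 * h ** 2))
    dof = (np.count_nonzero(np.abs(y) > lam)
           + lam * np.sqrt(sigma ** 2 + h ** 2) / (np.sqrt(2 * np.pi) * sigma * h) * kernel.sum())
    return np.sum((fy - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * dof

h = N ** (-0.25)                    # assumed bandwidth rule: h_N -> 0, (N h_N)^-1 -> 0
lams = np.linspace(0.5, 5.0, 40)
est = [score_risk(y, lam, sigma, h) for lam in lams]
true = [np.sum((hard_threshold(y, lam) - x0) ** 2) for lam in lams]
print("lambda minimizing SCORE     :", lams[np.argmin(est)])
print("lambda minimizing true error:", lams[np.argmin(true)])
```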

21 SURE Beyond simple cases

Example (Favorable cases).
- For ST and HT (separable functions): the Jacobian matrices are diagonal, and their diagonal elements have a simple closed form.
- For some non-separable functions (e.g., the non-local means): the Jacobian matrices are not diagonal, but their diagonal elements (and hence the trace) can still be computed in closed form.

Example (Unfavorable cases). In general (e.g., iterative proximal algorithms used in convex regularization such as total-variation): the Jacobian is not diagonal, and computing its trace would require computing the N × N entries of the Jacobian matrix (not thinkable). Use trace estimators instead.

22 SURE Monte-Carlo SURE

Theorem (Monte-Carlo trace estimator). Let $\Delta \sim \mathcal{N}(0, \mathrm{Id}_N)$ and $A \in \mathbb{R}^{N \times N}$. Then $\mathrm{tr}\, A = \mathbb{E}_\Delta \langle \Delta, A\Delta \rangle$.

Corollary. For N big enough, generate a realization $\delta \in \mathbb{R}^N$ of $\Delta$ and approximate the sample DoF by [Vonesch et al., 2008]
$$\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y} \approx \Big\langle \delta, \frac{\partial f_\theta(y)}{\partial y}\, \delta \Big\rangle,$$
or by its finite-difference approximation (known as Monte-Carlo SURE [Ramani et al., 2008])
$$\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y} \approx \Big\langle \delta, \frac{f_\theta(y + \varepsilon \delta) - f_\theta(y)}{\varepsilon} \Big\rangle, \quad \varepsilon \ll 1.$$

1. The first form requires evaluating only N quantities (compared to N × N), but it is not always easy to compute in closed form.
2. The second form only requires evaluating $f_\theta(\cdot)$ twice, on y and on $y + \varepsilon\delta$, but requires choosing ε.
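A sketch of the finite-difference Monte-Carlo SURE for a black-box denoiser whose Jacobian is not available in closed form; the median filter used as the black box, the test image and the probe scale ε are assumptions.

```python
# Finite-difference Monte-Carlo SURE for a black-box denoiser.
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(5)
N, sigma = 64, 0.2
x0 = np.zeros((N, N)); x0[16:48, 16:48] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

def mc_sure(y, f, sigma, eps=1e-3, rng=rng):
    fy = f(y)
    delta = rng.standard_normal(y.shape)                    # delta ~ N(0, Id)
    dof = np.sum(delta * (f(y + eps * delta) - fy)) / eps   # <delta, (df/dy) delta>
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * dof

for size in [3, 5, 7]:
    f = lambda z, s=size: median_filter(z, size=s)
    print(f"median {size}x{size}: MC-SURE = {mc_sure(y, f, sigma):9.2f}, "
          f"true SE = {np.sum((f(y) - x0) ** 2):9.2f}")
```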

23-26 SURE Case of total-variation

Example (Total-variation [Rudin et al., 1992]). The total-variation solution is given by
$$f_\theta(y) = \operatorname*{argmin}_x\; \underbrace{\tfrac{1}{2}\|x - y\|^2}_{F(x, y)} + \underbrace{\lambda \|\nabla x\|_1}_{G(x, \lambda)} \quad\text{where}\quad \|\nabla x\|_1 = \sum_k \|(\nabla x)_k\|.$$
Reformulation in terms of simple functions, using $z = (x, u)$:
$$(x(y), u(y)) = \operatorname*{argmin}_z\; \underbrace{\tfrac{1}{2}\|x - y\|^2}_{F(z, y)} + \underbrace{\lambda \|u\|_1}_{G_1(z)} + \underbrace{\iota_C(z)}_{G_2(z)} \quad\text{where}\quad C = \{ z = (x, u) \,:\, u = \nabla x \}.$$

Generalized Forward-Backward scheme and derivatives [Raguet et al., 2011]. The following sequence converges to $f_\theta(y)$:
$$x^{(\ell+1)} = \frac{1}{Q} \sum_{i=1}^Q z_i^{(\ell+1)}, \qquad
z_i^{(\ell+1)} = z_i^{(\ell)} - x^{(\ell)} + \mathrm{Prox}_{Q\gamma G_i}\big(Z_i^{(\ell)}, \theta\big), \qquad
Z_i^{(\ell)} = 2 x^{(\ell)} - z_i^{(\ell)} - \gamma \nabla_1 F(x^{(\ell)}, y).$$

The computation of the SURE associated to $x^{(\ell)}(y)$ depends on the directional derivative $D_x^{(\ell)} = \frac{\partial x^{(\ell)}}{\partial y}[\delta]$. Applying the chain rule to the scheme gives
$$D_x^{(\ell+1)} = \frac{1}{Q} \sum_{i=1}^Q D_{z_i}^{(\ell+1)}, \qquad
D_{z_i}^{(\ell+1)} = D_{z_i}^{(\ell)} - D_x^{(\ell)} + \partial G_{i,x}^{(\ell)}\big(D_{Z_i}^{(\ell)}\big), \qquad
D_{Z_i}^{(\ell)} = 2 D_x^{(\ell)} - D_{z_i}^{(\ell)} - \gamma\big(\partial F_x^{(\ell)}(D_x^{(\ell)}) + \partial F_y^{(\ell)}(\delta)\big),$$
where $\partial F_x^{(\ell)}(\cdot) = \partial_1 \nabla_1 F(x^{(\ell)}, y)[\cdot]$, $\partial F_y^{(\ell)}(\cdot) = \partial_2 \nabla_1 F(x^{(\ell)}, y)[\cdot]$ and $\partial G_{i,x}^{(\ell)}(\cdot) = \partial_1 \mathrm{Prox}_{Q\gamma G_i}(Z_i^{(\ell)})[\cdot]$.

27 SURE Case of total-variation

Generalized Forward-Backward scheme and derivatives. The gradient and proximal operators are
$$\nabla_1 F(z, y) = (x - y, 0), \qquad
\mathrm{Prox}_{\tau G_1}(z) = (x, \mathrm{ST}(u, \lambda\tau)), \qquad
\mathrm{Prox}_{\tau G_2}(z) = \Pi_C(z) = (\tilde{x}, \nabla \tilde{x}) \ \text{with}\ \tilde{x} = (\mathrm{Id} + \nabla^* \nabla)^{-1}(x + \nabla^* u).$$
Their derivatives are
$$\partial_1 \nabla_1 F(z, y)[\delta_x, \delta_u] = (\delta_x, 0), \qquad
\partial_2 \nabla_1 F(z, y)[\delta_y] = (-\delta_y, 0),$$
$$\partial_1 \mathrm{Prox}_{\tau G_1}(z)[\delta_x, \delta_u] = \big(\delta_x, \partial_1 \mathrm{ST}(u, \lambda\tau)[\delta_u]\big), \qquad
\partial_1 \mathrm{Prox}_{\tau G_2}(z)[\delta_x, \delta_u] = (\tilde{\delta}_x, \nabla \tilde{\delta}_x) \ \text{with}\ \tilde{\delta}_x = (\mathrm{Id} + \nabla^* \nabla)^{-1}(\delta_x + \nabla^* \delta_u).$$
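To illustrate the idea of differentiating the iterations of an algorithm with respect to y, the sketch below does it for a simpler scheme than the GFB above: plain gradient descent on a smoothed 1D total variation (so every step is differentiable), propagating the directional derivative J^(l) δ alongside x^(l) and feeding it to the Monte-Carlo trace estimator. All parameters (smoothing ε, step size, iteration count) are assumptions, and this is not the authors' GFB derivative scheme.

```python
# Differentiating the iterations of a smoothed-TV gradient descent w.r.t. y.
import numpy as np

rng = np.random.default_rng(6)
N, sigma, lam, eps, tau, n_iter = 100, 0.3, 0.4, 0.1, 0.05, 800
x0 = np.where(np.arange(N) < N // 2, 0.0, 1.0)           # step signal
y = x0 + sigma * rng.standard_normal(N)

D = np.diff(np.eye(N), axis=0)                            # finite differences, shape (N-1, N)

def grad_E(x):
    # gradient of E(x) = 0.5 ||x - y||^2 + lam * sum_k sqrt((Dx)_k^2 + eps^2)
    t = D @ x
    return x - y + lam * D.T @ (t / np.sqrt(t ** 2 + eps ** 2))

def tv_hessian(x):
    # Hessian of the smoothed TV term (without the lam factor)
    t = D @ x
    return D.T @ (np.diag(eps ** 2 / (t ** 2 + eps ** 2) ** 1.5) @ D)

delta = rng.standard_normal(N)        # Monte-Carlo probe
x = y.copy()                          # x^(0) = y  =>  dx^(0)/dy = Id
J_delta = delta.copy()                # J^(0) delta
for _ in range(n_iter):
    # differentiate x^(l+1) = x^(l) - tau * grad_E(x^(l)) along delta
    J_delta = J_delta - tau * (J_delta - delta + lam * tv_hessian(x) @ J_delta)
    x = x - tau * grad_E(x)

dof = np.dot(delta, J_delta)          # Monte-Carlo estimate of tr(dx/dy)
sure = np.sum((x - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * dof
print("MC-SURE:", sure, "   true squared error:", np.sum((x - x0) ** 2))
```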

28-30 SURE Case of total-variation

[Figure: noisy image y and f_θ(y) for a small, a large and the optimal regularization parameter λ, with the quadratic cost (risk and SURE) plotted as a function of λ.]

[Deledalle et al., 2012] Deledalle, C.-A., Vaiter, S., Peyré, G., Fadili, J. and Dossal, C. (2012). Proximal Splitting Derivatives for Risk Estimation. Int. Workshop on New Computational Methods for Inverse Problems (NCMIP), Cachan, France, May 2012.

31 Outline (reminder). Part 2: Local selection with risk estimation and bias reduction.

32-33 Risk estimation: SURE filtering

Motivation: do the same thing, but for each pixel i independently. Assume $y \mapsto f_\theta(y)$ is as in Stein's lemma. Then
$$\mathrm{SURE}(y, \theta)_i = \underbrace{(f_\theta(y)_i - y_i)^2 - \sigma^2}_{\text{Sample fidelity}} + 2\sigma^2 \underbrace{\frac{\partial f_\theta(y)_i}{\partial y_i}}_{\text{Sample DoF}}$$
satisfies $\mathbb{E}_Y[\mathrm{SURE}(Y, \theta)_i] = R(\theta)_i = \mathbb{E}_Y \big(f_\theta(Y)_i - x_{0,i}\big)^2$.

But the law of large numbers does not apply anymore!

[Figure: maps of the risk, the SURE, the sample fidelity and the sample DoF.]

First idea: can we regularize the SURE map?
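A sketch of the per-pixel SURE map for the simplest case, a Gaussian convolution, where ∂f_θ(y)_i/∂y_i is just the central kernel weight; the image, noise level and kernel width are assumptions.

```python
# Per-pixel SURE map for a Gaussian convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(7)
N, sigma, s = 128, 0.2, 2.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

fy = gaussian_filter(y, sigma=s)
# for a convolution, d f(y)_i / d y_i is the kernel's central weight,
# read off the impulse response
impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
w_center = gaussian_filter(impulse, sigma=s)[N // 2, N // 2]

sample_fidelity = (fy - y) ** 2 - sigma ** 2          # per-pixel fidelity term
sample_dof = np.full((N, N), w_center)                # per-pixel d f_i / d y_i
sure_map = sample_fidelity + 2 * sigma ** 2 * sample_dof
true_se_map = (fy - x0) ** 2
print("mean SURE:", sure_map.mean(), "  mean true SE:", true_se_map.mean())
```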

34 Risk estimation: SURE filtering

SURE filtering: the local sample DoF has lower variance than the local sample fidelity, so perform a filtering of the SURE map guided by the sample DoF.

[Figure: maps of the risk, SURE, sample fidelity and sample DoF, and the corresponding filtered SURE, filtered fidelity and filtered DoF maps.]

35 Risk estimation: SURE filtering

Example (Non-local means with oriented patches). The orientation of the patches should be spatially adapted to the image content.

36-41 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).
1. Estimate the map of SURE associated to all estimators. [Figure: noisy SURE maps.]
2. Regularize the estimation, guided by the sample DoF. [Figure: regularized SURE maps.]
3. Combine the estimates using a convex aggregation, e.g. the Exponential Weighted Aggregation [Leung and Barron, 2006].

[Deledalle et al., 2011b] Deledalle, C., Duval, V., and Salmon, J. (2011b). Non-local methods with shape-adaptive patches (NLM-SAP). Journal of Mathematical Imaging and Vision.

42 Risk estimation: SURE filtering

Example (Non-local means with oriented patches) [Deledalle et al., 2011b].
[Figure: (a) the 15 pie-slice patch sizes/shapes, (b) the regularized SURE maps, (c) the selected patch orientations, compared with the Yaroslavsky filter and anisotropic diffusion.]

43 Risk estimation: SURE filtering

Example (Non-local means with oriented patches) [Deledalle et al., 2011b].
[Figure: visual comparison of (a) NL means, (b)-(c) BM3D [Dabov et al., 2007], (d) [Goossens et al., 2008], and (e)-(f) our approach.]

44-46 Risk estimation: SURE filtering

Limits of the SURE-based approach. Local selection relies on a risk filter:
- difficulty to build such a filter,
- difficulty to provide guarantees,
- difficulty to extend to other noise models.

Beyond risk estimation. Recall that
$$R(\theta)_i = \mathbb{E}_Y\big[(f_\theta(Y)_i - Y_i)^2\big] - \sigma^2 + 2\sigma^2\, \underbrace{\mathbb{E}_Y\Big[ \frac{Y_i - x_{0,i}}{\sigma} \cdot \frac{f_\theta(Y)_i - x_{0,i}}{\sigma} \Big]}_{\text{Degree of freedom}}
= \underbrace{\big(\mathbb{E}_Y[f_\theta(Y)_i] - x_{0,i}\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y\big(f_\theta(Y)_i - \mathbb{E}_Y[f_\theta(Y)_i]\big)^2}_{\text{Variance}}.$$
The degree of freedom can be relatively well estimated locally. The variance too, typically using the propagation-of-uncertainty formula:
$$\underbrace{\mathbb{E}_Y\big(f_\theta(Y)_i - \mathbb{E}_Y[f_\theta(Y)_i]\big)^2}_{\text{Variance}} \approx \sigma^2\, \mathbb{E}_Y\Big[ \frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y} \Big(\frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y}\Big)^{\!\top} \Big]_{i,i}.$$
But the fidelity and the bias are difficult to quantify locally (see also [Kervrann and Boulanger, 2008]). Since the bias cannot be estimated properly, can we cancel or reduce it before selection? Modify all estimators $f_\theta$ into $\tilde{f}_\theta$ in order to:
- debias their solutions and then select the one with the smallest variance,
$$R(\theta)_i = \underbrace{\big(\mathbb{E}_Y[\tilde{f}_\theta(Y)_i] - x_{0,i}\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y\big(\tilde{f}_\theta(Y)_i - \mathbb{E}_Y[\tilde{f}_\theta(Y)_i]\big)^2}_{\text{Variance}},$$
- or improve their bias-variance trade-off so that $\text{Bias}^2 \leq \text{Variance}$, hence $R(\theta)_i \leq 2 \times \text{Variance}$, and then select the one with the smallest variance.

Remark: the bias-variance decomposition still holds true for non-Gaussian noises.

[Figure: cost as a function of the parameter θ.]

47-57 Bias reduction-variance estimation approach

Bias detection. Denoisers essentially perform averages: convolutions, anisotropic diffusion, non-local means, TV, BM3D... For y fixed, their solution can be written as
$$f_\theta(y) = W y + b, \qquad f_\theta(y)_i = \sum_j w_{i,j}\, y_j + b_i \quad\text{with}\quad \sum_j w_{i,j} = 1.$$
The weight $w_{i,j}$ attempts to select pixel j if $(x_0)_j = (x_0)_i$ (i.e., $Y_j$ i.i.d. with $Y_i$), hence
$$f_\theta(y)_i = \sum_j w_{i,j}\, y_j \approx \mathbb{E}[Y_i] = (x_0)_i.$$
Hence the weighted variance of these samples should be equal to $\sigma^2$ whatever $(x_0)_i$:
$$\sigma_\theta^2(y)_i = \sum_j w_{i,j}\, y_j^2 - \Big( \sum_j w_{i,j}\, y_j \Big)^2 \approx \mathrm{Var}[Y_i] = \sigma^2.$$
Otherwise, $\sigma_\theta^2(y)_i \gg \sigma^2$ is an indication of bias: some selected pixels satisfy $(x_0)_j \neq (x_0)_i$.

[Figure: noisy image y, the estimate f_θ(y) and the bias indicator σ_θ²(y), shown for a convolution, for the non-local means and for total-variation.]
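A sketch of the bias indicator σ_θ²(y)_i for a Gaussian-convolution denoiser, computed with the same weights as the filter itself; the image, noise level and kernel width are assumptions.

```python
# Bias indicator sigma_theta^2(y)_i = sum_j w_ij y_j^2 - (sum_j w_ij y_j)^2.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(8)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0            # step edge
y = x0 + sigma * rng.standard_normal((N, N))

fy  = gaussian_filter(y, sigma=s)                      # sum_j w_ij y_j
m2  = gaussian_filter(y ** 2, sigma=s)                 # sum_j w_ij y_j^2
var_theta = m2 - fy ** 2                               # sigma_theta^2(y)_i

# far from the edge the indicator stays close to sigma^2; on the edge the
# average mixes pixels with different x0, and the indicator blows up
print("flat area :", var_theta[:, :N // 4].mean(), " vs sigma^2 =", sigma ** 2)
print("edge area :", var_theta[:, N // 2 - 2:N // 2 + 2].mean())
```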

58-64 Bias reduction-variance estimation approach

Bias reduction [Lee, 1980, Kuan et al., 1985]. Assume the selected samples are realizations of $Y = X + N$ with $\mathbb{E}_N[N] = 0$. The optimal linear estimator minimizing $\mathbb{E}_{X,Y}[(aY + b - X)^2]$ (LMMSE) is
$$\mathbb{E}_Y[Y] + \frac{\sigma_Y^2 - \sigma_N^2}{\sigma_Y^2}\, \big(Y - \mathbb{E}_Y[Y]\big).$$
The plug-in LMMSE estimator is
$$\tilde{f}_\theta(y)_i = f_\theta(y)_i + \alpha_i \big(y_i - f_\theta(y)_i\big) \quad\text{where}\quad \alpha_i = \frac{\sigma_\theta^2(y)_i - \sigma^2}{\sigma_\theta^2(y)_i}.$$

[Figure: noisy image y, the estimate f_θ(y), the bias indicator σ_θ²(y) and the bias-reduced estimate f̃_θ(y), shown for a convolution, for the non-local means and for total-variation.]
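A sketch of the plug-in LMMSE correction applied to the Gaussian convolution of the previous snippet; the positive-part clamp on α_i is an extra assumption (the slides state α_i = (σ_θ²(y)_i − σ²)/σ_θ²(y)_i), added here only to keep α_i in [0, 1].

```python
# Plug-in LMMSE bias reduction of a Gaussian convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(9)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

fy = gaussian_filter(y, sigma=s)
var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2
alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta     # assumed clamp
f_debiased = fy + alpha * (y - fy)

print("SE of f_theta(y)        :", np.sum((fy - x0) ** 2))
print("SE of debiased estimate :", np.sum((f_debiased - x0) ** 2))
```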

65-72 Bias reduction-variance estimation approach

Variance estimation [Kervrann and Boulanger, 2008, Salmon and Strozecki, 2010]. The LMMSE aims at reaching a bias-variance trade-off such that $\text{Bias}^2 \leq \text{Variance}$, hence $R(\theta)_i \leq 2 \times \text{Variance}$. What is the residual variance of $\tilde{f}_\theta(y)$?

Recall $f_\theta(y) = W y + b$ and $\tilde{f}_\theta(y)_i = f_\theta(y)_i + \alpha_i (y_i - f_\theta(y)_i)$. Then
$$\tilde{f}_\theta(y)_i = (1 - \alpha_i) \sum_{j \neq i} w_{i,j}\, y_j + \big(w_{i,i}(1 - \alpha_i) + \alpha_i\big)\, y_i + (1 - \alpha_i)\, b_i,$$
hence (using $\mathrm{Var}[\gamma Y] = \gamma^2 \sigma^2$)
$$\mathrm{Var}[\tilde{f}_\theta(Y)_i] = \sigma^2 \Big[ (1 - \alpha_i)^2 \sum_j w_{i,j}^2 + \alpha_i^2 + 2\alpha_i(1 - \alpha_i)\, w_{i,i} \Big].$$
Define the smoothing strength at pixel i as $\sigma^2 / \mathrm{Var}[\tilde{f}_\theta(Y)_i]$ (the greater the better).

[Figure: noisy image y, the estimate f_θ(y), the bias-reduced estimate f̃_θ(y) and the smoothing strength map, shown for a convolution, for the non-local means and for total-variation.]
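A sketch of the residual variance Var[f̃_θ(Y)_i] and the resulting smoothing strength σ²/Var for the same debiased Gaussian convolution; for a convolution, Σ_j w_{i,j}² and w_{i,i} can be read off the discretised kernel. All parameters are assumptions.

```python
# Residual variance and smoothing strength of the debiased convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(10)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

# discretised kernel, via the impulse response
impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
kernel = gaussian_filter(impulse, sigma=s)
w_ii, sum_w2 = kernel[N // 2, N // 2], np.sum(kernel ** 2)

fy = gaussian_filter(y, sigma=s)
var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2
alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta

var_debiased = sigma ** 2 * ((1 - alpha) ** 2 * sum_w2
                             + alpha ** 2 + 2 * alpha * (1 - alpha) * w_ii)
smoothing_strength = sigma ** 2 / var_debiased
print("smoothing strength, flat area:", smoothing_strength[:, :N // 4].mean())
print("smoothing strength, edge area:", smoothing_strength[:, N // 2 - 2:N // 2 + 2].mean())
```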

73-79 Our approach: case of anisotropic Gaussian convolutions

Summary:
- Choose different sets of parameters θ^(1), θ^(2), ...
- For each set θ: compute $f_\theta(y)$; compute $\sum_j w_{i,j} y_j$, $\sum_j w_{i,j} y_j^2$ and $\sum_j w_{i,j}^2$; deduce $\tilde{f}_\theta(y)_i$ and $\mathrm{Var}[\tilde{f}_\theta(Y)_i]$.
- Perform local selection: choose, at each pixel i, the set of parameters such that
$$\tilde{f}(y)_i = \tilde{f}_{\theta_i}(y)_i \quad\text{where}\quad \theta_i = \operatorname*{argmin}_\theta\, \mathrm{Var}[\tilde{f}_\theta(Y)_i].$$
A code sketch of this pipeline is given after this slide.

Example (Gaussian convolutions). Perform several Gaussian convolutions with different anisotropies, sizes and orientations. The weights $w_{i,j}$ are given by the associated Gaussian kernel function.

[Figure: noisy image y, one estimator f_θ(y), the final result f̃(y), and the maps of smoothing strength, scale, orientation and anisotropy selected at each pixel.]
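Here is the sketch of the whole pipeline announced above, for a small bank of isotropic Gaussian convolutions (a simplification of the anisotropic, oriented kernels of the slides); all widths and parameters are assumptions.

```python
# Local selection over a bank of Gaussian convolutions via debiasing + variance.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(11)
N, sigma = 128, 0.2
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

widths = [0.5, 1.0, 2.0, 4.0, 8.0]
best_var = np.full((N, N), np.inf)
final = np.zeros((N, N))
chosen = np.zeros((N, N))

impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
for s in widths:
    kernel = gaussian_filter(impulse, sigma=s)
    w_ii, sum_w2 = kernel[N // 2, N // 2], np.sum(kernel ** 2)

    fy = gaussian_filter(y, sigma=s)                            # f_theta(y)
    var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2      # bias indicator
    alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta
    f_deb = fy + alpha * (y - fy)                               # LMMSE correction
    var_deb = sigma ** 2 * ((1 - alpha) ** 2 * sum_w2
                            + alpha ** 2 + 2 * alpha * (1 - alpha) * w_ii)

    take = var_deb < best_var                    # pixelwise argmin of the variance
    best_var[take], final[take], chosen[take] = var_deb[take], f_deb[take], s

print("SE of adaptive result:", np.sum((final - x0) ** 2))
print("average selected width, flat vs edge:",
      chosen[:, :N // 4].mean(), chosen[:, N // 2 - 2:N // 2 + 2].mean())
```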

80 Our approach: case of non-local means

Example (Non-local means). Perform several non-local means with different patch sizes, window sizes and prefiltering. The weights $w_{i,j}$ are given as the result of the patch comparison.

[Figure: a small sample of estimates obtained with different parameters, their LMMSE corrections and smoothing strengths, and the locally selected result.]

Remarks:
- None of the parameter settings can preserve all kinds of structures on its own.
- Unlike SURE, this approach adapts straightforwardly to gamma or Poisson noises.

81-82 Our approach: case of non-local means

Example (Non-local means on simulated data, at a high and at a low noise level).
[Figure: (a) noisy image; (b) result of the adaptive approach; (c) from left to right, top to bottom: smoothing strength (range [0, 20×20]), search window size (range [0, 20×20]), patch size (range [3×3, 11×11]) and prefiltering strength (range [1, 3]) selected at each pixel.]

83-84 Our approach: case of non-local means

Example (Non-local means on polarimetric SAR data).
[Figure: a high-resolution S-band SAR image (© DLR), the adaptive estimation and the corresponding smoothing strength map.]

[Deledalle et al., 2013] Deledalle, C., Denis, L., Tupin, F., Reigber, A., Jäger, M. (2013). NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising. Technical report HAL.

85-90 Our approach: case of anisotropic total-variation

Example (Anisotropic total-variation). Recall that the solution of anisotropic total-variation is given by
$$f_\theta(y) = \operatorname*{argmin}_x\; \tfrac{1}{2}\|x - y\|^2 + \lambda \|\nabla x\|_1 \quad\text{where}\quad \|\nabla x\|_1 = \sum_k |(\nabla_h x)_k| + |(\nabla_v x)_k|,$$
and can be computed iteratively by a proximal algorithm. Perform several total-variation denoisings with different regularization parameters λ.

[Figure: target image x_0, noisy image y, final result f̃(y), and the maps of the selected parameter λ and of the smoothing strength, shown for several test images.]


More information

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing Emilie Chouzenoux emilie.chouzenoux@univ-mlv.fr Université Paris-Est Lab. d Informatique Gaspard

More information

Inverse Problems meets Statistical Learning

Inverse Problems meets Statistical Learning Inverse Problems meets Statistical Learning Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Adaptive one-bit matrix completion

Adaptive one-bit matrix completion Adaptive one-bit matrix completion Joseph Salmon Télécom Paristech, Institut Mines-Télécom Joint work with Jean Lafond (Télécom Paristech) Olga Klopp (Crest / MODAL X, Université Paris Ouest) Éric Moulines

More information

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Graph Signal Processing for Image Compression & Restoration (Part II)

Graph Signal Processing for Image Compression & Restoration (Part II) ene Cheung, Xianming Liu National Institute of Informatics 11 th July, 016 raph Signal Processing for Image Compression & Restoration (Part II). ICME'16 utorial 07/11/016 1 Outline (Part II) Image Restoration

More information

Inverse problem and optimization

Inverse problem and optimization Inverse problem and optimization Laurent Condat, Nelly Pustelnik CNRS, Gipsa-lab CNRS, Laboratoire de Physique de l ENS de Lyon Decembre, 15th 2016 Inverse problem and optimization 2/36 Plan 1. Examples

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines - October 2014 Big data revolution? A new

More information

Least Squares and Linear Systems

Least Squares and Linear Systems Least Squares and Linear Systems Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized by: Mérouane

More information

NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising

NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising Charles-Alban Deledalle, Loïc Denis, Florence Tupin, Andreas Reigber, Marc Jäger To cite this version: Charles-Alban

More information

Medical Image Analysis

Medical Image Analysis Medical Image Analysis CS 593 / 791 Computer Science and Electrical Engineering Dept. West Virginia University 23rd January 2006 Outline 1 Recap 2 Edge Enhancement 3 Experimental Results 4 The rest of

More information

Image processing and nonparametric regression

Image processing and nonparametric regression Image processing and nonparametric regression Rencontres R BoRdeaux 2012 B. Thieurmel Collaborators : P.A. Cornillon, N. Hengartner, E. Matzner-Løber, B. Wolhberg 2 Juillet 2012 Rencontres R BoRdeaux 2012

More information

Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising

Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising Journal of Machine Learning Research 8 (207) -50 Submitted 5/6; Revised 0/7; Published /7 Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising Jérémie Bigot Institut de

More information

ITK Filters. Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms

ITK Filters. Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms ITK Filters Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms ITCS 6010:Biomedical Imaging and Visualization 1 ITK Filters:

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

LPA-ICI Applications in Image Processing

LPA-ICI Applications in Image Processing LPA-ICI Applications in Image Processing Denoising Deblurring Derivative estimation Edge detection Inverse halftoning Denoising Consider z (x) =y (x)+η (x), wherey is noise-free image and η is noise. assume

More information

BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising

BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising Chris Metzler, Richard Baraniuk Rice University Arian Maleki Columbia University Phase Retrieval Applications: Crystallography Microscopy

More information

A Dual Sparse Decomposition Method for Image Denoising

A Dual Sparse Decomposition Method for Image Denoising A Dual Sparse Decomposition Method for Image Denoising arxiv:1704.07063v1 [cs.cv] 24 Apr 2017 Hong Sun 1 School of Electronic Information Wuhan University 430072 Wuhan, China 2 Dept. Signal and Image Processing

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Image processing and Computer Vision

Image processing and Computer Vision 1 / 1 Image processing and Computer Vision Continuous Optimization and applications to image processing Martin de La Gorce martin.de-la-gorce@enpc.fr February 2015 Optimization 2 / 1 We have a function

More information

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Frank Werner 1 Statistical Inverse Problems in Biophysics Group Max Planck Institute for Biophysical Chemistry,

More information

Generalized greedy algorithms.

Generalized greedy algorithms. Generalized greedy algorithms. François-Xavier Dupé & Sandrine Anthoine LIF & I2M Aix-Marseille Université - CNRS - Ecole Centrale Marseille, Marseille ANR Greta Séminaire Parisien des Mathématiques Appliquées

More information

Advanced Statistics I : Gaussian Linear Model (and beyond)

Advanced Statistics I : Gaussian Linear Model (and beyond) Advanced Statistics I : Gaussian Linear Model (and beyond) Aurélien Garivier CNRS / Telecom ParisTech Centrale Outline One and Two-Sample Statistics Linear Gaussian Model Model Reduction and model Selection

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

arxiv: v2 [math.st] 9 Feb 2017

arxiv: v2 [math.st] 9 Feb 2017 Submitted to the Annals of Statistics PREDICTION ERROR AFTER MODEL SEARCH By Xiaoying Tian Harris, Department of Statistics, Stanford University arxiv:1610.06107v math.st 9 Feb 017 Estimation of the prediction

More information

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Model Selection with Partly Smooth Functions

Model Selection with Partly Smooth Functions Model Selection with Partly Smooth Functions Samuel Vaiter, Gabriel Peyré and Jalal Fadili vaiter@ceremade.dauphine.fr August 27, 2014 ITWIST 14 Model Consistency of Partly Smooth Regularizers, arxiv:1405.1004,

More information

Nonlinear Diffusion. 1 Introduction: Motivation for non-standard diffusion

Nonlinear Diffusion. 1 Introduction: Motivation for non-standard diffusion Nonlinear Diffusion These notes summarize the way I present this material, for my benefit. But everything in here is said in more detail, and better, in Weickert s paper. 1 Introduction: Motivation for

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Degrees of Freedom in Regression Ensembles

Degrees of Freedom in Regression Ensembles Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.

More information

Stochastic gradient descent and robustness to ill-conditioning

Stochastic gradient descent and robustness to ill-conditioning Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,

More information

Lecture 7: Edge Detection

Lecture 7: Edge Detection #1 Lecture 7: Edge Detection Saad J Bedros sbedros@umn.edu Review From Last Lecture Definition of an Edge First Order Derivative Approximation as Edge Detector #2 This Lecture Examples of Edge Detection

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Statistically-Based Regularization Parameter Estimation for Large Scale Problems

Statistically-Based Regularization Parameter Estimation for Large Scale Problems Statistically-Based Regularization Parameter Estimation for Large Scale Problems Rosemary Renaut Joint work with Jodi Mead and Iveta Hnetynkova March 1, 2010 National Science Foundation: Division of Computational

More information

Erkut Erdem. Hacettepe University February 24 th, Linear Diffusion 1. 2 Appendix - The Calculus of Variations 5.

Erkut Erdem. Hacettepe University February 24 th, Linear Diffusion 1. 2 Appendix - The Calculus of Variations 5. LINEAR DIFFUSION Erkut Erdem Hacettepe University February 24 th, 2012 CONTENTS 1 Linear Diffusion 1 2 Appendix - The Calculus of Variations 5 References 6 1 LINEAR DIFFUSION The linear diffusion (heat)

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

A NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar

A NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar A NO-REFERENCE SARPNESS METRIC SENSITIVE TO BLUR AND NOISE Xiang Zhu and Peyman Milanfar Electrical Engineering Department University of California at Santa Cruz, CA, 9564 xzhu@soeucscedu ABSTRACT A no-reference

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Bits of Machine Learning Part 1: Supervised Learning

Bits of Machine Learning Part 1: Supervised Learning Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification

More information

Design of Image Adaptive Wavelets for Denoising Applications

Design of Image Adaptive Wavelets for Denoising Applications Design of Image Adaptive Wavelets for Denoising Applications Sanjeev Pragada and Jayanthi Sivaswamy Center for Visual Information Technology International Institute of Information Technology - Hyderabad,

More information

13. Parameter Estimation. ECE 830, Spring 2014

13. Parameter Estimation. ECE 830, Spring 2014 13. Parameter Estimation ECE 830, Spring 2014 1 / 18 Primary Goal General problem statement: We observe X p(x θ), θ Θ and the goal is to determine the θ that produced X. Given a collection of observations

More information

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material A Generative Perspective on MRFs in Low-Level Vision Supplemental Material Uwe Schmidt Qi Gao Stefan Roth Department of Computer Science, TU Darmstadt 1. Derivations 1.1. Sampling the Prior We first rewrite

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Least squares problems

Least squares problems Least squares problems How to state and solve them, then evaluate their solutions Stéphane Mottelet Université de Technologie de Compiègne 30 septembre 2016 Stéphane Mottelet (UTC) Least squares 1 / 55

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 10)

EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 10) EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 0) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard

More information

Batch, Stochastic and Mirror Gradient Descents

Batch, Stochastic and Mirror Gradient Descents Batch, Stochastic and Mirror Gradient Descents Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Fundamentals of Non-local Total Variation Spectral Theory

Fundamentals of Non-local Total Variation Spectral Theory Fundamentals of Non-local Total Variation Spectral Theory Jean-François Aujol 1,2, Guy Gilboa 3, Nicolas Papadakis 1,2 1 Univ. Bordeaux, IMB, UMR 5251, F-33400 Talence, France 2 CNRS, IMB, UMR 5251, F-33400

More information

Estimating network degree distributions from sampled networks: An inverse problem

Estimating network degree distributions from sampled networks: An inverse problem Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree

More information

Covariance Matrix Simplification For Efficient Uncertainty Management

Covariance Matrix Simplification For Efficient Uncertainty Management PASEO MaxEnt 2007 Covariance Matrix Simplification For Efficient Uncertainty Management André Jalobeanu, Jorge A. Gutiérrez PASEO Research Group LSIIT (CNRS/ Univ. Strasbourg) - Illkirch, France *part

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Statistical Measures of Uncertainty in Inverse Problems

Statistical Measures of Uncertainty in Inverse Problems Statistical Measures of Uncertainty in Inverse Problems Workshop on Uncertainty in Inverse Problems Institute for Mathematics and Its Applications Minneapolis, MN 19-26 April 2002 P.B. Stark Department

More information

Physics-based Prior modeling in Inverse Problems

Physics-based Prior modeling in Inverse Problems Physics-based Prior modeling in Inverse Problems MURI Meeting 2013 M Usman Sadiq, Purdue University Charles A. Bouman, Purdue University In collaboration with: Jeff Simmons, AFRL Venkat Venkatakrishnan,

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

1-bit Matrix Completion. PAC-Bayes and Variational Approximation

1-bit Matrix Completion. PAC-Bayes and Variational Approximation : PAC-Bayes and Variational Approximation (with P. Alquier) PhD Supervisor: N. Chopin Bayes In Paris, 5 January 2017 (Happy New Year!) Various Topics covered Matrix Completion PAC-Bayesian Estimation Variational

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

arxiv: v3 [stat.me] 12 Jul 2015

arxiv: v3 [stat.me] 12 Jul 2015 Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models Arnaud Doucet 1, Pierre E. Jacob and Sylvain Rubenthaler 3 1 Department of Statistics,

More information

NONLINEAR DIFFUSION PDES

NONLINEAR DIFFUSION PDES NONLINEAR DIFFUSION PDES Erkut Erdem Hacettepe University March 5 th, 0 CONTENTS Perona-Malik Type Nonlinear Diffusion Edge Enhancing Diffusion 5 References 7 PERONA-MALIK TYPE NONLINEAR DIFFUSION The

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Lecture 24 May 30, 2018

Lecture 24 May 30, 2018 Stats 3C: Theory of Statistics Spring 28 Lecture 24 May 3, 28 Prof. Emmanuel Candes Scribe: Martin J. Zhang, Jun Yan, Can Wang, and E. Candes Outline Agenda: High-dimensional Statistical Estimation. Lasso

More information

Sparse Regularization via Convex Analysis

Sparse Regularization via Convex Analysis Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which

More information