Adaptive selection of parameters for image denoising
1 Journées SIERRA 2014, Saint-Etienne, France, 25 March 2014
Sélection adaptative des paramètres pour le débruitage des images
Adaptive selection of parameters for image denoising
Charles Deledalle (1)
Joint work with: Loïc Denis (2), Charles Dossal (1), Vincent Duval (3), Jalal Fadili (4), Gabriel Peyré (3), Joseph Salmon (5), Florence Tupin (5) and Samuel Vaiter (3)
(1) Institut de Mathématiques de Bordeaux, CNRS-Université Bordeaux 1, France
(2) Laboratoire Hubert-Curien, Univ. Jean Monnet, Univ. Lyon, France
(3) CEREMADE, CNRS-Paris Dauphine, France
(4) GREYC, CNRS-ENSICAEN, France
(5) Institut Mines-Télécom, Télécom-ParisTech, CNRS LTCI, France
C. Deledalle (CNRS/IMB), Risk estimation, 25 March 2014, 29 slides
2 Motivations: model/parameter selection
[Figure: an input image denoised under a smooth model and under a piecewise-constant model.]
Goal: pick the suitable model/parameters for a given image.
3 Motivations: risk based selection
Risk definition:
- evaluates a cost for each parameter of a given filter on a given image,
- usually task specific,
- usually defined w.r.t. the unknown image that we attempt to recover.
Goal: find the least risky model / set of parameters θ.
4 Motivations: square error risk

Square error risk. Define the risk as the expected square error:

    R(θ) = E_Y ||f_θ(Y) − x_0||²,   where Y = x_0 + W,

f_θ : R^N → R^N is a denoiser with parameter θ, x_0 ∈ R^N and W ~ N(0, σ² Id).

Bias-variance decomposition:

    E_Y ||f_θ(Y) − x_0||² = ||E_Y[f_θ(Y)] − x_0||²  +  E_Y ||f_θ(Y) − E_Y[f_θ(Y)]||²
                              (Bias²)                   (Variance)

Fidelity-complexity decomposition [Mallows, 1973, Efron, 1986]:

    E_Y ||f_θ(Y) − x_0||² = E_Y ||f_θ(Y) − Y||² − Nσ²  +  2σ² E_Y ⟨(Y − x_0)/σ, (f_θ(Y) − x_0)/σ⟩
                              (Fidelity)                  (Degree of freedom)
5 Motivations: risk interpretation

    R(θ) = E_Y ||f_θ(Y) − x_0||²
         = ||E_Y[f_θ(Y)] − x_0||² + E_Y ||f_θ(Y) − E_Y[f_θ(Y)]||²                 (Bias² + Variance)
         = E_Y ||f_θ(Y) − Y||² − Nσ² + 2σ² E_Y ⟨(Y − x_0)/σ, (f_θ(Y) − x_0)/σ⟩   (Fidelity + Complexity)

Examples (complexity = 2σ² × degree of freedom):

    name     | f_θ(y) | R(θ)     | Bias²    | Variance | Fidelity | Complexity
    identity | y      | Nσ²      | 0        | Nσ²      | −Nσ²     | 2Nσ²
    oracle   | x_0    | 0        | 0        | 0        | 0        | 0
    null     | 0      | ||x_0||² | ||x_0||² | 0        | ||x_0||² | 0

For an orthogonal projector, the degree of freedom is the dimension of the target space.

[Figure: cost versus parameter θ, showing the risk, variance, bias², fidelity and complexity curves.]
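The two decompositions above can be checked numerically. The sketch below uses a hypothetical linear shrinkage denoiser f_θ(y) = θy (not one of the filters discussed in the talk), for which the degree of freedom is known exactly: tr(θ Id) = θN.

```python
import numpy as np

# Monte-Carlo check of the bias-variance and fidelity-complexity
# decompositions for the toy shrinkage denoiser f_theta(y) = theta * y.
rng = np.random.default_rng(0)
N, sigma, theta = 500, 1.0, 0.7
x0 = rng.uniform(-2, 2, N)                  # arbitrary fixed clean signal

K = 5000                                    # number of noise realizations
Y = x0 + sigma * rng.standard_normal((K, N))
F = theta * Y                               # denoised realizations

risk = np.mean(np.sum((F - x0) ** 2, axis=1))
bias2 = np.sum((F.mean(axis=0) - x0) ** 2)
variance = np.mean(np.sum((F - F.mean(axis=0)) ** 2, axis=1))
fidelity = np.mean(np.sum((F - Y) ** 2, axis=1)) - N * sigma ** 2
complexity = 2 * sigma ** 2 * theta * N     # 2 sigma^2 * DoF, DoF = theta*N

print(risk, bias2 + variance, fidelity + complexity)
```

All three printed quantities agree, and match the closed form (1 − θ)²||x_0||² + θ²Nσ².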
6 Motivations: limits of the risk based selection
[Figure: an input image with regions better suited to the smooth model and others to the piecewise-constant model.]
What if none of the models/sets of parameters is suitable for the whole image? Selection should then be performed locally.
A map of risk can be defined similarly for each pixel, as well as: a map of bias, a map of variance, a map of fidelity, a map of complexity.
7 Motivations: local bias-variance trade-off
[Figure: input image and estimates, with the corresponding variance and bias² maps.]
8 Goal: risk estimation

Recall

    R(θ) = E_Y ||f_θ(Y) − x_0||²
         = ||E_Y[f_θ(Y)] − x_0||² + E_Y ||f_θ(Y) − E_Y[f_θ(Y)]||²                 (Bias² + Variance)
         = E_Y ||f_θ(Y) − Y||² − Nσ² + 2σ² E_Y ⟨(Y − x_0)/σ, (f_θ(Y) − x_0)/σ⟩   (Fidelity + Complexity)

Challenge:
- All quantities are defined as an expectation over Y, but we only know a single realization y of Y.
- Some quantities depend on the image x_0, but this is the image that we attempt to recover, hence it is unknown.

Goal:
1. choose several models/sets of parameters: f_θ1, f_θ2, ...
2. estimate the risk for each of them: R(θ1), R(θ2), ... (without knowing x_0, from the single observed realization y),
3. select, globally or locally, the least risky set of parameters.
9 Outline
1. Global selection based on Stein's Unbiased Risk Estimator
   - Stein's Unbiased Risk Estimator (SURE)
   - Simple cases: non-local means, soft-thresholding
   - Non-differentiable cases: hard-thresholding
   - Monte-Carlo SURE
   - Case of total-variation
2. Local selection with risk estimation and bias reduction
   - Risk estimation: SURE filtering
   - Case of anisotropic non-local means
   - Bias reduction-variance estimation approach
   - Case of anisotropic Gaussian convolutions
   - Case of non-local means
   - Case of anisotropic total-variation
3. Conclusion
11 Stein's Unbiased Risk Estimator (SURE)

Motivation. Recall

    R(θ) = E_Y ||f_θ(Y) − Y||² − Nσ² + 2σ² E_Y ⟨(Y − x_0)/σ, (f_θ(Y) − x_0)/σ⟩   (Fidelity + Degree of freedom)

Can we express the degree of freedom w.r.t. Y only, independently of x_0?

Lemma (Stein's Lemma [Stein, 1981]). Assume y ↦ f_θ(y) is continuous and almost everywhere differentiable (+ some mild conditions); then

    E_Y ⟨(Y − x_0)/σ, (f_θ(Y) − x_0)/σ⟩ = E_Y [ tr ∂f_θ(y)/∂y |_{y=Y} ] = E_Y [ div_y f_θ(y) |_{y=Y} ].

Proof (for N = 1).

    E_Y [(Y − x_0)(f_θ(Y) − x_0)] = ∫ (f_θ(y) − x_0) (y − x_0) G_σ(y − x_0) dy

where (y − x_0) G_σ(y − x_0) = −σ² (∂/∂y) G_σ(y − x_0) (derivative of the Gaussian). Integrating by parts:

    = σ² [ −(f_θ(y) − x_0) G_σ(y − x_0) ]_{−∞}^{+∞} + σ² ∫ (∂f_θ(y)/∂y) G_σ(y − x_0) dy
    = 0 + σ² E_Y [ ∂f_θ(y)/∂y |_{y=Y} ].
12 Stein's Unbiased Risk Estimator (SURE)

Theorem (Stein's unbiased risk estimator (SURE)). Assume y ↦ f_θ(y) is as in Stein's lemma; then

    SURE(y, θ) = ||f_θ(y) − y||² − Nσ² + 2σ² tr ∂f_θ(y)/∂y
                  (Sample fidelity)        (Sample DoF)

is an unbiased risk estimator: E_Y [SURE(Y, θ)] = R(θ).

Remark: by the law of large numbers, SURE(y, θ) ≈ R(θ) already for a single realization y when N is large.
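The unbiasedness can be verified on a case where the trace is exact. The sketch below uses a hypothetical linear smoother f_θ(y) = Wy (a 1D moving average, not a filter from the talk), whose Jacobian is W itself, so the sample DoF is simply tr(W).

```python
import numpy as np

# Check E[SURE] ~ R(theta) for a linear smoother f(y) = W y,
# averaging SURE and the actual squared loss over noise realizations.
rng = np.random.default_rng(1)
N, sigma = 400, 1.0
x0 = np.repeat([0.0, 4.0], N // 2)          # piecewise-constant clean signal

r = 3                                       # moving-average half-width
W = np.zeros((N, N))
for i in range(N):
    lo, hi = max(0, i - r), min(N, i + r + 1)
    W[i, lo:hi] = 1.0 / (hi - lo)           # rows sum to 1

dof = np.trace(W)                           # exact degree of freedom

sures, losses = [], []
for _ in range(2000):
    y = x0 + sigma * rng.standard_normal(N)
    f = W @ y
    sures.append(np.sum((f - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * dof)
    losses.append(np.sum((f - x0) ** 2))

print(np.mean(sures), np.mean(losses))      # both approximate R(theta)
```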
13 SURE — Simple case: non-local means

Example (Averaging filters [Van De Ville and Kocher, 2009, Duval et al., 2011]). Consider an averaging filter of the form:

    f_h(y)_i = Σ_j w_{i,j}(y) y_j / Σ_j w_{i,j}(y)

where w_{i,j} : R^N → R_+ is a weight function, for instance:
- isotropic convolution: w_{i,j}(y) = φ_h(||i − j||),
- sigma filter: w_{i,j}(y) = φ_h(|y_i − y_j|)   [Lee, 1983, Yaroslavsky, 1985],
- non-local means: w_{i,j}(y) = φ_h(||P_i y − P_j y||)   [Buades et al., 2005].
14 SURE — Simple case: non-local means

An estimator of its DoF is then given by:

    tr ∂f_h(y)/∂y = Σ_{i=1}^N ∂f_h(y)_i/∂y_i

where

    ∂f_h(y)_i/∂y_i = 1/(Σ_j w_{i,j}(y)) [ w_{i,i}(y) + Σ_j (∂w_{i,j}(y)/∂y_i) y_j − f_h(y)_i Σ_j ∂w_{i,j}(y)/∂y_i ].

[Figure: (a) noisy image y; (b)-(d) f_h(y) with a small, a large and the SURE-optimal bandwidth h; quadratic cost (risk and SURE) versus h.]
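The diagonal-of-the-Jacobian formula above can be exercised on a small 1D example. The sketch below implements a 1D sigma filter with Gaussian weights on a window of half-width r (an illustrative stand-in for the 2D filters of the talk) and scans the bandwidth h with SURE.

```python
import numpy as np

# SURE-driven bandwidth selection for a 1D sigma filter
# w_ij = exp(-(y_i - y_j)^2 / (2 h^2)), |i - j| <= r,
# using the closed-form diagonal Jacobian term from the slide.
rng = np.random.default_rng(9)
N, sigma, r = 2000, 1.0, 5
x0 = np.where((np.arange(N) // 250) % 2 == 0, 0.0, 5.0)
y = x0 + sigma * rng.standard_normal(N)

def sigma_filter_sure(y, h, sigma, r):
    N = y.size
    f = np.empty(N)
    diag = np.empty(N)
    for i in range(N):
        j = np.arange(max(0, i - r), min(N, i + r + 1))
        d = y[i] - y[j]
        w = np.exp(-d ** 2 / (2 * h ** 2))   # note w_ii = 1
        S = w.sum()
        f[i] = (w * y[j]).sum() / S
        dw = -(d / h ** 2) * w               # d w_ij / d y_i (zero at j = i)
        diag[i] = (1.0 + (dw * y[j]).sum() - f[i] * dw.sum()) / S
    sure = np.sum((f - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * diag.sum()
    return f, sure

hs = [0.5, 1.0, 2.0, 4.0]
fs, sures = zip(*(sigma_filter_sure(y, h, sigma, r) for h in hs))
best = hs[int(np.argmin(sures))]
print(best, [round(s, 1) for s in sures])
```

For each h, SURE tracks the (normally unobservable) squared loss ||f_h(y) − x_0||².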
17 SURE — Simple case: soft-thresholding

Example (Soft-Thresholding (ST) [Donoho and Johnstone, 1995, Zou et al., 2007]). The ST function is defined as:

    ST(y, λ)_i =  y_i + λ   if y_i ≤ −λ
                  0         if −λ < y_i < λ
                  y_i − λ   otherwise.

For almost all y, its Jacobian is:

    [∂ST(y, λ)/∂y]_{i,j} = 1 if i = j and |y_i| > λ, 0 otherwise.

⇒ #{i : |Y_i| > λ} is an unbiased estimator of the degree of freedom.

[Figure: left, the values x_i of the vector x_0 with the noise interval ±σ and the threshold λ, versus the position index i; right, quadratic cost (risk and SURE) versus the threshold λ.]
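In this case SURE is a one-liner: the sample DoF is just the number of coefficients above the threshold. A minimal sketch, on a synthetic sparse vector of my own choosing:

```python
import numpy as np

# SURE for soft-thresholding: DoF = #{ |y_i| > lambda }.
rng = np.random.default_rng(2)
N, sigma = 100_000, 1.0
x0 = np.zeros(N)
x0[:N // 100] = 8.0                         # sparse clean vector (toy setup)
y = x0 + sigma * rng.standard_normal(N)

def sure_st(y, lam, sigma):
    st = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)   # soft-thresholding
    dof = np.count_nonzero(np.abs(y) > lam)              # sample DoF
    sure = np.sum((st - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * dof
    return st, sure

lams = np.linspace(0.5, 5.0, 10)
sures = [sure_st(y, l, sigma)[1] for l in lams]
best = lams[int(np.argmin(sures))]
print(best)
```

The SURE-selected threshold beats a naive small threshold in actual squared error, without ever using x_0.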
18 SURE — Non-differentiable case: hard-thresholding

Example (Hard-Thresholding (HT)). The HT function is defined as:

    HT(y, λ)_i =  y_i  if y_i ≤ −λ
                  0    if −λ < y_i < λ
                  y_i  otherwise.

The HT is not continuous ⇒ Stein's Lemma does not apply and we cannot estimate R(θ) with SURE.
Can we build a biased estimator? If yes, how does the bias evolve with the data dimension?
19 SURE — Non-differentiable case: hard-thresholding

Theorem (Stein COnsistent Risk Estimator (SCORE)). Take a bandwidth h_N such that lim_{N→∞} h_N = 0 and lim_{N→∞} 1/(N h_N) = 0. Then the quantity

    #{i : |Y_i| > λ} + (λ √(σ² + h_N²) / (√(2π) σ h_N)) Σ_{i=1}^N [ exp(−(Y_i + λ)² / (2 h_N²)) + exp(−(Y_i − λ)² / (2 h_N²)) ]

is a consistent estimator of the degree of freedom (i.e., convergence in probability).

[Deledalle et al., 2013] Deledalle, C., Peyré, G., Fadili, J. (2013). Stein COnsistent Risk Estimator (SCORE) for hard thresholding. Signal Processing with Adaptive Sparse Structured Representations (SPARS), Lausanne, July 8-11, 2013.
20 SURE — Non-differentiable case: hard-thresholding

[Figure: left, the values x_i of the vector x_0 with the noise interval ±σ and the threshold λ, versus the position index i; right, quadratic cost versus the threshold λ, comparing the risk, SCORE and Jansen's estimator [1].]
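A literal reading of the SCORE formula above can be sketched as follows; the exact kernel normalization is an assumption of this transcription (the slide's typesetting is partly garbled), so treat this as illustrative only.

```python
import numpy as np

# Sketch of the SCORE degree-of-freedom estimator for hard-thresholding,
# as transcribed above. The Gaussian-kernel correction accounts for the
# discontinuity of HT at +/- lambda; it is positive, so the SCORE DoF is
# never below the naive count #{|y_i| > lambda}.
rng = np.random.default_rng(3)
N, sigma, lam = 10_000, 1.0, 2.0
x0 = np.zeros(N); x0[:100] = 6.0
y = x0 + sigma * rng.standard_normal(N)

h = 1.0 / N ** 0.25                          # h_N -> 0 and 1/(N h_N) -> 0

count = np.count_nonzero(np.abs(y) > lam)    # naive (discontinuous) DoF term
kernel = (np.exp(-(y + lam) ** 2 / (2 * h ** 2))
          + np.exp(-(y - lam) ** 2 / (2 * h ** 2)))
correction = (lam * np.sqrt(sigma ** 2 + h ** 2)
              / (np.sqrt(2 * np.pi) * sigma * h)) * np.sum(kernel)
score_dof = count + correction
print(count, score_dof)
```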
21 SURE — Beyond simple cases

Example (Favorable examples).
- For the ST and the HT (separable functions): the Jacobian matrices are diagonal ⇒ simple derivation of the diagonal elements in closed form.
- For some functions (e.g., the non-local means): the Jacobian matrices are not diagonal, but the diagonal elements (and the trace) can still be computed in closed form.

Example (Unfavorable examples). In general (e.g., iterative proximal algorithms used in convex regularization such as total-variation): the Jacobian is not diagonal, and computing its trace would require computing all N × N entries of the Jacobian matrix (not feasible) ⇒ use trace estimators.
22 SURE — Monte-Carlo SURE

Theorem (Monte-Carlo trace estimator). Let δ ~ N(0, Id_N) and A ∈ R^{N×N}; then tr A = E_δ ⟨δ, A δ⟩.

Corollary. For N big enough, generate δ ∈ R^N and approach the sample DoF by [Vonesch et al., 2008]

    tr ∂f_θ(y)/∂y ≈ ⟨δ, ∂f_θ(y)/∂y [δ]⟩

or by its finite-difference approximation (known as Monte-Carlo SURE [Ramani et al., 2008])

    tr ∂f_θ(y)/∂y ≈ ⟨δ, (f_θ(y + ε δ) − f_θ(y)) / ε⟩,   ε ≪ 1.

1. Pro: requires evaluating only N quantities (compared to N × N). Con: not always easy to compute in closed form.
2. Pro: requires only evaluating f_θ twice, on y and y + ε δ. Con: choice of ε.
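Both variants can be sketched on a toy linear "denoiser" where the exact trace is known. Rademacher (±1) probes are used instead of Gaussian ones here — an assumption of this sketch, chosen because they lower the estimator's variance.

```python
import numpy as np

# Monte-Carlo estimation of tr(A), and of the sample DoF of a black-box
# map f via the finite-difference trick. f is a toy linear map so that
# the exact answer tr(A) is available for comparison.
rng = np.random.default_rng(4)
N = 50
A = rng.standard_normal((N, N)) + N * np.eye(N)   # strong diagonal

# 1) Hutchinson-style trace estimator: tr A = E <delta, A delta>
K = 2000
deltas = rng.choice([-1.0, 1.0], size=(K, N))     # Rademacher probes
tr_est = np.mean(np.einsum('kn,kn->k', deltas, deltas @ A.T))

# 2) Finite-difference Monte-Carlo DoF for a black-box f
f = lambda y: A @ y                                # stand-in for a denoiser
y = rng.standard_normal(N)
eps = 1e-6
dof_fd = np.mean([(d @ (f(y + eps * d) - f(y))) / eps for d in deltas])

print(np.trace(A), tr_est, dof_fd)
```

Since f is linear here, the finite-difference estimate coincides with the direct one up to floating-point error; for a nonlinear denoiser only variant 2 remains cheap.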
23 SURE — Case of total-variation

Example (Total-Variation [Rudin et al., 1992]). The total-variation solution is given by

    f_θ(y) = argmin_x  (1/2) ||x − y||² + λ ||∇x||_1
                        (F(x, y))         (G(x, λ))

where ||∇x||_1 = Σ_k |(∇x)_k|.

Reformulation using z = (x, u) in terms of simple functions:

    (x(y), u(y)) = argmin_z  (1/2) ||x − y||² + λ ||u||_1 + ι_C(z)
                              (F(z, y))        (G_1(z))    (G_2(z))

where C = {z = (x, u) : u = ∇x}.
24 SURE — Case of total-variation

Generalized Forward-Backward scheme and derivatives [Raguet et al., 2011]. The following sequence converges to f_θ(y):

    x^(l+1)   = (1/Q) Σ_{i=1}^Q z_i^(l+1)
    z_i^(l+1) = z_i^(l) − x^(l) + Prox_{Qγ G_i}(Z_i^(l), θ)
    Z_i^(l)   = 2 x^(l) − z_i^(l) − γ ∇_1 F(x^(l), y).
25 SURE — Case of total-variation

The computation of the SURE associated to x^(l)(y) depends on the directional derivative D_x^(l) = ∂x^(l)/∂y [δ].
26 SURE — Case of total-variation

Applying the chain rule to the scheme:

    D_x^(l+1)     = (1/Q) Σ_{i=1}^Q D_{z_i}^(l+1)
    D_{z_i}^(l+1) = D_{z_i}^(l) − D_x^(l) + G_{i,x}^(l)(D_{Z_i}^(l))
    D_{Z_i}^(l)   = 2 D_x^(l) − D_{z_i}^(l) − γ (F_x^(l)(D_x^(l)) + F_y^(l)(δ))

where F_x^(l)(·) = ∂_1 ∇_1 F(x^(l), y)[·], F_y^(l)(·) = ∂_2 ∇_1 F(x^(l), y)[·] and G_{i,x}^(l)(·) = ∂_1 Prox_{Qγ G_i}(Z_i^(l))[·].
27 SURE — Case of total-variation

Generalized Forward-Backward scheme and derivatives. The gradient and proximal operators are

    ∇_1 F(z, y)     = (x − y, 0)
    Prox_{τ G_1}(z) = (x, ST(u, λτ))
    Prox_{τ G_2}(z) = ((Id + ∇*∇)^{-1}(x + ∇*u), ∇ (Id + ∇*∇)^{-1}(x + ∇*u))

and their derivatives are

    ∂_1 ∇_1 F(z, y)[δ_x, δ_u]     = (δ_x, 0)
    ∂_2 ∇_1 F(z, y)[δ_y]          = (−δ_y, 0)
    ∂_1 Prox_{τ G_1}(z)[δ_x, δ_u] = (δ_x, ∂_1 ST(u, λτ)[δ_u])
    ∂_1 Prox_{τ G_2}(z)[δ_x, δ_u] = ((Id + ∇*∇)^{-1}(δ_x + ∇*δ_u), ∇ (Id + ∇*∇)^{-1}(δ_x + ∇*δ_u)).
28 SURE — Case of total-variation

[Figure: (a) noisy image y; (b)-(d) the TV solution with a small, a large and the SURE-optimal regularization parameter λ; quadratic cost (risk and SURE) versus λ.]

[Deledalle et al., 2012] Deledalle, C.-A., Vaiter, S., Peyré, G., Fadili, J. and Dossal, C. Proximal Splitting Derivatives for Risk Estimation. Int. Workshop on New Computational Methods for Inverse Problems (NCMIP), Cachan, France, May 2012.
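The derivative-propagation idea can be sketched on a simpler analogue than TV: ISTA iterations for the ℓ1 problem argmin_x ½||x − y||² + λ||x||_1, whose fixed point is soft-thresholding. The iterate and its directional derivative are run jointly, mirroring the chain-rule recursion on the slides; this is an illustrative stand-in, not the authors' TV code.

```python
import numpy as np

# Jointly iterate x_l and D_l = (dx_l/dy)[delta] for ISTA on the l1
# problem, then read off the sample DoF as <delta, D>.
rng = np.random.default_rng(5)
N, lam, gamma = 1000, 1.5, 0.5
x0 = np.zeros(N); x0[:50] = 6.0
y = x0 + rng.standard_normal(N)
delta = rng.choice([-1.0, 1.0], N)          # Rademacher probe

x = np.zeros(N)
D = np.zeros(N)
for _ in range(200):
    v = x - gamma * (x - y)                 # gradient step on F(x, y)
    active = np.abs(v) > gamma * lam        # where ST is locally the identity
    x = np.where(active, v - gamma * lam * np.sign(v), 0.0)
    # chain rule: derivative of the gradient step, masked by ST's derivative
    D = np.where(active, D - gamma * (D - delta), 0.0)

dof_est = delta @ D                          # sample DoF via the probe
print(dof_est, np.count_nonzero(x))
```

Here the converged Jacobian is diagonal, so with a ±1 probe the estimate ⟨δ, D⟩ recovers the support size exactly, matching the soft-thresholding DoF of the earlier slide.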
31 Outline (part 2) — Local selection with risk estimation and bias reduction: risk estimation by SURE filtering; case of anisotropic non-local means; bias reduction-variance estimation approach; cases of anisotropic Gaussian convolutions, non-local means, and anisotropic total-variation.
32 Risk estimation: SURE filtering

Motivation: do the same thing, but for each pixel i independently. Assume y ↦ f_θ(y) is as in Stein's lemma; then

    SURE(y, θ)_i = (f_θ(y)_i − y_i)² − σ² + 2σ² ∂f_θ(y)_i/∂y_i
                    (Sample fidelity)      (Sample DoF)

We have E_Y [SURE(Y, θ)_i] = R(θ)_i = E_Y (f_θ(Y)_i − x_{0,i})².
33 Risk estimation: SURE filtering

But the law of large numbers does not apply anymore: a single-pixel SURE is an unbiased yet very noisy estimate of R(θ)_i.

[Figure: maps of (a) the risk, (b) SURE, (c) the sample fidelity, (d) the sample DoF.]

First idea: can we regularize the SURE map?
34 Risk estimation: SURE filtering

SURE filtering: the local sample DoF has lower variance than the local sample fidelity ⇒ perform a filtering of the SURE map guided by the sample DoF.

[Figure: maps of (a) the risk, (b) SURE, (c) the sample fidelity, (d) the sample DoF; below, the filtered SURE, filtered fidelity and filtered DoF maps.]
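A per-pixel SURE map and its regularization can be sketched in 1D. The smoothing below is a crude local average, a stand-in for the DoF-guided filtering described above (whose exact form is not specified on the slides).

```python
import numpy as np

# Per-pixel SURE map for a linear moving-average smoother, then a crude
# regularization of that map by local averaging.
rng = np.random.default_rng(6)
N, sigma = 100_000, 1.0
x0 = np.where(np.arange(N) % 2000 < 1000, 0.0, 5.0)   # piecewise constant
y = x0 + sigma * rng.standard_normal(N)

r = 4                                        # moving-average half-width
k = np.ones(2 * r + 1) / (2 * r + 1)
f = np.convolve(y, k, mode='same')
diag_jac = k[r]                              # d f_i / d y_i = central weight

sure_map = (f - y) ** 2 - sigma ** 2 + 2 * sigma ** 2 * diag_jac
smoothed = np.convolve(sure_map, np.ones(51) / 51, mode='same')
print(sure_map.mean(), smoothed.mean(), ((f - x0) ** 2).mean())
```

The raw map is unbiased but extremely noisy pixelwise; smoothing preserves its mean while cutting its pixel-to-pixel variance.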
35 Risk estimation: SURE filtering

Example (Non-local means with oriented patches). The orientation of the patches should be spatially adapted to the image content.
36 Risk estimation: SURE filtering

Example (Non-local means with oriented patches). Estimate the map of SURE associated to all estimators.

[Figure: noisy SURE maps.]

[Deledalle et al., 2011b] Deledalle, C., Duval, V., and Salmon, J. (2011b). Non-local methods with shape-adaptive patches (NLM-SAP). Journal of Mathematical Imaging and Vision.
40 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).
- Estimate the map of SURE associated to all estimators,
- regularize the estimation, guided by the sample DoF.

[Figure: regularized SURE maps.]
41 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).
- Estimate the map of SURE associated to all estimators,
- regularize the estimation, guided by the sample DoF,
- combine the estimates using a convex aggregation, e.g., the Exponential Weighted Aggregation [Leung and Barron, 2006].

[Figure: regularized SURE maps.]
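The three steps above can be sketched in 1D with moving averages of three widths standing in for the oriented-patch estimators; the temperature T and the smoothing of the SURE maps are assumptions of this sketch, not values from the paper.

```python
import numpy as np

# Pixelwise Exponentially Weighted Aggregation of several denoised
# estimates, scored by their (regularized) per-pixel SURE maps.
rng = np.random.default_rng(7)
N, sigma, T = 5000, 1.0, 2.0                # T ~ 2*sigma^2 (assumed)
x0 = np.where(np.arange(N) % 1000 < 500, 0.0, 4.0)
y = x0 + sigma * rng.standard_normal(N)

estimates, sure_maps = [], []
for r in (1, 4, 16):                        # three moving-average widths
    k = np.ones(2 * r + 1) / (2 * r + 1)
    f = np.convolve(y, k, mode='same')
    s = (f - y) ** 2 - sigma ** 2 + 2 * sigma ** 2 * k[r]
    s = np.convolve(s, np.ones(101) / 101, mode='same')   # regularized SURE
    estimates.append(f); sure_maps.append(s)

E, S = np.stack(estimates), np.stack(sure_maps)
W = np.exp(-(S - S.min(axis=0)) / T)        # stabilized exponential weights
W /= W.sum(axis=0)                          # convex weights per pixel
aggregate = np.sum(W * E, axis=0)
print(np.mean((aggregate - x0) ** 2))
```

Per pixel the aggregate leans toward wide averaging in flat regions and narrow averaging near the jumps, which is the point of the local selection.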
42 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).

[Figure: (a) the 15 pie-shaped patch sizes/shapes; (b) the regularized SURE maps; (c) the selected patch orientations; with comparisons to the Yaroslavsky filter and anisotropic diffusion.]
43 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).

[Figure: visual comparison of (a) NL Means, (b)-(c) BM3D [Dabov et al., 2007], (d) [Goossens et al., 2008], (e)-(f) our approach.]
44 Risk estimation: SURE filtering

Limits of the SURE-based approach. Local selection relies on a risk filter:
- difficulty to build such a filter,
- difficulty to provide guarantees,
- difficulty to extend to other noise models.

Beyond risk estimation:

    R(θ)_i = E_Y [(f_θ(Y)_i − Y_i)²] − σ² + 2σ² E_Y [ ((Y_i − x_{0,i})/σ) ((f_θ(Y)_i − x_{0,i})/σ) ]
              (Fidelity)                   (Degree of freedom)
           = (E_Y [f_θ(Y)_i] − x_{0,i})² + E_Y (f_θ(Y)_i − E_Y [f_θ(Y)_i])²
              (Bias²)                       (Variance)

The degree of freedom can be estimated relatively well locally. The variance as well, typically using the propagation of uncertainty formula with J = ∂f_θ(y)/∂y:

    E_Y (f_θ(Y)_i − E_Y [f_θ(Y)_i])²  ≈  σ² E_Y [ J J^t |_{y=Y} ]_{i,i}.
45 Risk estimation: SURE filtering

But the fidelity and the bias are difficult to quantify locally (see also [Kervrann & Boulanger, 2008]). Since the bias cannot be estimated properly, can we cancel or reduce it before selection?

Modify all estimators f_θ into f̃_θ in order to:
- debias their solutions, and then select the one with the smallest variance:

      R(θ)_i = (E_Y [f̃_θ(Y)_i] − x_{0,i})² + E_Y (f̃_θ(Y)_i − E_Y [f̃_θ(Y)_i])²,  with Bias² ≈ 0,
                 (Bias²)                       (Variance)

- or improve their bias-variance trade-off and select the one with the smallest variance: if Bias² ≲ Variance, then R(θ)_i ≲ 2 × Variance.

[Figure: cost versus parameter θ.]
46 Risk estimation: SURE filtering

Remark: the bias-variance decomposition still holds true for non-Gaussian noises.
47 Bias reduction-variance estimation approach

Bias detection. Denoisers essentially perform averages: convolutions, anisotropic diffusion, non-local means, TV, BM3D... For y fixed, their solution can be written as

    f_θ(y) = W y + b,   i.e.,   f_θ(y)_i = Σ_j w_{i,j} y_j + b_i   with Σ_j w_{i,j} = 1.

The weight w_{i,j} attempts to select pixel j if (x_0)_j = (x_0)_i (i.e., Y_j i.i.d. with Y_i), hence

    f_θ(y)_i = Σ_j w_{i,j} y_j ≈ E[Y_i] = (x_0)_i.
48 Bias reduction-variance estimation approach

Hence, the weighted empirical variance of the selected samples should be equal to σ², whatever (x_0)_i:

    σ_θ²(y)_i = Σ_j w_{i,j} y_j² − (Σ_j w_{i,j} y_j)² ≈ Var[Y_i] = σ².
50 Bias reduction-variance estimation approach

Otherwise, σ_θ²(y)_i ≫ σ² is an indication of bias: some selected pixels satisfy (x_0)_j ≠ (x_0)_i.
51 Bias reduction-variance estimation approach

[Figure: (a) noisy image y; (b) f_θ(y) by convolution; (c) the bias indicator σ_θ²(y).]
53 Bias reduction-variance estimation approach

[Figure: (a) noisy image y; (b) f_θ(y) by non-local means; (c) the bias indicator σ_θ²(y).]
55 Bias reduction-variance estimation approach Bias detection Denoiser essentially performs averages: convolutions, anisotropic diffusion, non-local means, tv, bm3d... For y fixed, their solution can be written as f θ (y) = W y + b = w i,jy j + b i where w i,j = 1 j j Weight w i,j attempts to select pixel j if (x 0) j = (x 0) i (i.e. Y j i.i.d. with Y i), hence f θ (y) i = w i,jy j E[Y i] = (x 0) i Hence, the variance of these samples should be equal to σ 2 whatever (x 0) i: σ 2 θ (y)i = ( w i,jy 2 j ) 2 wi,jy j Var[Yi] = σ 2 Otherwise: σ 2 θ (y)i σ2 indication of bias: (x 0) j (x 0) i (a) Noisy image y (b) f θ (y): total-variation (c) Bias indicator: σ 2 θ (y) C. Deledalle (CNRS/IMB) Risk estimation 25 mars, / 29
56 Bias reduction-variance estimation approach Bias detection Denoiser essentially performs averages: convolutions, anisotropic diffusion, non-local means, tv, bm3d... For y fixed, their solution can be written as f θ (y) = W y + b = w i,jy j + b i where w i,j = 1 j j Weight w i,j attempts to select pixel j if (x 0) j = (x 0) i (i.e. Y j i.i.d. with Y i), hence f θ (y) i = w i,jy j E[Y i] = (x 0) i Hence, the variance of these samples should be equal to σ 2 whatever (x 0) i: σ 2 θ (y)i = ( w i,jy 2 j ) 2 wi,jy j Var[Yi] = σ 2 Otherwise: σ 2 θ (y)i σ2 indication of bias: (x 0) j (x 0) i (a) Noisy image y (b) f θ (y): total-variation (c) Bias indicator: σ 2 θ (y) C. Deledalle (CNRS/IMB) Risk estimation 25 mars, / 29
57 Bias reduction-variance estimation approach Bias detection Denoiser essentially performs averages: convolutions, anisotropic diffusion, non-local means, tv, bm3d... For y fixed, their solution can be written as f θ (y) = W y + b = w i,jy j + b i where w i,j = 1 j j Weight w i,j attempts to select pixel j if (x 0) j = (x 0) i (i.e. Y j i.i.d. with Y i), hence f θ (y) i = w i,jy j E[Y i] = (x 0) i Hence, the variance of these samples should be equal to σ 2 whatever (x 0) i: σ 2 θ (y)i = ( w i,jy 2 j ) 2 wi,jy j Var[Yi] = σ 2 Otherwise: σ 2 θ (y)i σ2 indication of bias: (x 0) j (x 0) i (a) Noisy image y (b) f θ (y): total-variation (c) Bias indicator: σ 2 θ (y) C. Deledalle (CNRS/IMB) Risk estimation 25 mars, / 29
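The bias indicator needs only two filtered images: the weighted first and second moments. Here is a minimal NumPy sketch for a box-kernel smoother — an illustrative choice of linear denoiser and test images, not taken from the talk:

```python
import numpy as np

def box_filter(y, size):
    """Normalized box filtering: weights w_ij = 1/size^2 on a size x size window."""
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, y)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)

def bias_indicator(y, size):
    """Weighted variance of the selected samples:
    sum_j w_ij y_j^2 - (sum_j w_ij y_j)^2."""
    m1 = box_filter(y, size)       # sum_j w_ij y_j
    m2 = box_filter(y ** 2, size)  # sum_j w_ij y_j^2
    return m2 - m1 ** 2

rng = np.random.default_rng(0)
sigma = 0.1
# Flat area: the indicator stays close to sigma^2.
flat = 1.0 + sigma * rng.standard_normal((64, 64))
# Step edge: the indicator blows up near the discontinuity, flagging bias.
edge = flat.copy()
edge[:, 32:] += 1.0
```

On `flat` the indicator hovers around σ² = 0.01, while along the column-32 edge it rises towards the squared step height, exactly the behavior the slide describes.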
Bias reduction-variance estimation approach: Bias reduction [Lee, 1980, Kuan et al., 1985]

Assume the selected samples are realizations of Y = X + N with E[N] = 0. The optimal linear estimator minimizing E_{X,Y}[(aY + b − X)²] (LMMSE) is

E_Y[Y] + ((σ_Y² − σ_N²) / σ_Y²) (Y − E_Y[Y]).

The plug-in LMMSE estimator is

f̃_θ(y)_i = f_θ(y)_i + α_i (y_i − f_θ(y)_i) where α_i = (σ²_θ(y)_i − σ²) / σ²_θ(y)_i.

[Figures shown for convolution, non-local means and total-variation: (a) noisy image y, (b) denoised image f_θ(y), (c) bias indicator σ²_θ(y), (d) bias-reduced f̃_θ(y).]
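The plug-in correction is a one-liner once the bias indicator is available. In this sketch α is additionally clipped to [0, 1] — a common practical safeguard that the slide does not state — and the arrays y, f and s2 are placeholders for the noisy image, the denoiser output and the indicator:

```python
import numpy as np

def lmmse_correction(y, f, s2, sigma2):
    """Plug-in LMMSE bias reduction of a denoiser output.

    y      : noisy image
    f      : denoised image f_theta(y)
    s2     : bias indicator sigma^2_theta(y) (weighted variance)
    sigma2 : noise variance sigma^2
    """
    # alpha_i = (s2_i - sigma^2) / s2_i, clipped to [0, 1] for safety.
    alpha = np.clip((s2 - sigma2) / np.maximum(s2, 1e-12), 0.0, 1.0)
    # Reinject the noisy value where bias is detected.
    return f + alpha * (y - f)
```

Where s2 ≈ σ² (no bias detected) the output stays at f; where s2 ≫ σ² (strong bias) α ≈ 1 and the noisy value is restored.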
Bias reduction-variance estimation approach: Variance estimation [Kervrann and Boulanger, 2008, Salmon and Strozecki, 2010]

The LMMSE aims at reaching a bias-variance trade-off such that Bias² ≈ Variance, so that R(θ)_i ≈ 2 × Variance.

What is the residual variance of f̃_θ(y)? Recall f_θ(y) = W y + b and f̃_θ(y)_i = f_θ(y)_i + α_i (y_i − f_θ(y)_i). Then

f̃_θ(y)_i = (1 − α_i) Σ_{j ≠ i} w_{i,j} y_j + (w_{i,i} (1 − α_i) + α_i) y_i + (1 − α_i) b_i.

Hence, using Var[Σ_j γ_j Y_j] = σ² Σ_j γ_j² for independent Y_j,

Var[f̃_θ(Y)_i] = σ² [ (1 − α_i)² Σ_j w_{i,j}² + α_i² + 2 α_i (1 − α_i) w_{i,i} ].

Define the smoothing strength at i as σ² / Var[f̃_θ(Y)_i] (the greater, the better).

[Figures shown for convolution, non-local means and total-variation: (a) noisy image y, (b) denoised image f_θ(y), (c) bias-reduced f̃_θ(y), (d) smoothing strength.]
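The residual-variance formula is elementary arithmetic once α_i, Σ_j w_{i,j}² and w_{i,i} are known. A small sketch with two sanity checks: α = 0 recovers the variance of the plain linear filter, and α = 1 recovers the noise variance of the identity (no smoothing):

```python
import numpy as np

def residual_variance(alpha, sum_w2, w_ii, sigma2):
    """Var[f~_theta(Y)_i] of the bias-reduced estimator:
    sigma^2 [ (1-a)^2 sum_j w_ij^2 + a^2 + 2 a (1-a) w_ii ]."""
    return sigma2 * ((1 - alpha) ** 2 * sum_w2
                     + alpha ** 2
                     + 2 * alpha * (1 - alpha) * w_ii)

def smoothing_strength(alpha, sum_w2, w_ii, sigma2):
    """sigma^2 / Var[f~_theta(Y)_i]: the greater, the stronger the smoothing."""
    return sigma2 / residual_variance(alpha, sum_w2, w_ii, sigma2)
```

For a normalized s x s box kernel, Σ_j w_{i,j}² = w_{i,i} = 1/s², so with α = 0 the smoothing strength is exactly s², the number of averaged pixels.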
Our approach: Case of anisotropic Gaussian convolutions

Summary:
- Choose different sets of parameters θ^(1), θ^(2), ...
- For each set θ: compute f_θ(y); compute Σ_j w_{i,j} y_j, Σ_j w_{i,j} y_j² and Σ_j w_{i,j}²; deduce f̃_θ(y)_i and Var[f̃_θ(Y)_i].
- Perform local selection: choose, at each pixel i, the set of parameters with the smallest residual variance:

f̃(y)_i = f̃_{θ_i}(y)_i where θ_i = argmin_θ Var[f̃_θ(Y)_i].

Example (Gaussian convolutions): perform several Gaussian convolutions with different anisotropies, sizes and orientations. The weights w_{i,j} are given by the associated Gaussian kernel function.

[Figures: (a) noisy image y, (b) one estimator f̃_θ(y), (c) final result f̃(y), (d) maps of the locally selected smoothing strength, scale, orientation and anisotropy.]
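The whole pipeline — several linear filters, bias reduction, residual variance, per-pixel argmin — fits in a short script. This sketch uses isotropic box kernels of a few sizes instead of the talk's anisotropic Gaussians, purely so that the weight statistics (w_{i,j} = 1/s², hence Σ_j w_{i,j}² = w_{i,i} = 1/s²) are trivial; the sizes and the clipping of α are illustrative choices:

```python
import numpy as np

def box_filter(y, size):
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, y)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)

def adaptive_box_denoise(y, sigma2, sizes=(3, 5, 9)):
    """Per-pixel selection of the box size minimizing the residual variance."""
    estimates, variances = [], []
    for s in sizes:
        m1 = box_filter(y, s)                  # sum_j w_ij y_j
        m2 = box_filter(y ** 2, s)             # sum_j w_ij y_j^2
        s2 = np.maximum(m2 - m1 ** 2, 1e-12)   # bias indicator
        alpha = np.clip((s2 - sigma2) / s2, 0.0, 1.0)
        estimates.append(m1 + alpha * (y - m1))  # plug-in LMMSE correction
        w2 = 1.0 / s ** 2                        # sum_j w_ij^2 = w_ii for a box
        variances.append(sigma2 * ((1 - alpha) ** 2 * w2
                                   + alpha ** 2
                                   + 2 * alpha * (1 - alpha) * w2))
    F, V = np.stack(estimates), np.stack(variances)
    idx = np.argmin(V, axis=0)                   # locally least risky size
    return np.take_along_axis(F, idx[None], axis=0)[0], idx
```

On a constant noisy image the largest window wins almost everywhere (smallest residual variance), while near edges the bias indicator pushes the selection towards smaller windows or towards reinjecting y.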
Our approach: Case of non-local means

Example (Non-local means): perform several non-local means with different patch sizes, window sizes and prefiltering strengths. The weights w_{i,j} are given as the result of patch comparison.

[Figure: a small sample of estimates obtained with different parameters, their LMMSE-corrected versions, and the local selection.]

Remark: no single setting of the parameters can preserve all kinds of structures. Unlike SURE, this approach adapts straightforwardly to gamma or Poisson noise.
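For intuition about the patch-comparison weights, here is a deliberately naive non-local means in pure NumPy (periodic boundaries via np.roll, a box patch kernel, illustrative parameter values — not the optimized implementation of the talk):

```python
import numpy as np

def box_filter(y, size):
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, y)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)

def nl_means(y, half_search=3, patch=3, h=0.15):
    """Naive non-local means: weights from mean squared patch differences."""
    acc = np.zeros_like(y)
    wsum = np.zeros_like(y)
    for dy in range(-half_search, half_search + 1):
        for dx in range(-half_search, half_search + 1):
            shifted = np.roll(np.roll(y, dy, axis=0), dx, axis=1)
            d2 = box_filter((y - shifted) ** 2, patch)  # mean sq. patch distance
            w = np.exp(-d2 / h ** 2)                    # w_ij from patch comparison
            acc += w * shifted
            wsum += w
    return acc / wsum
```

The parameters half_search, patch and h are exactly the window size, patch size and smoothing strength that the adaptive approach selects locally.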
Our approach: Case of non-local means

Example (Non-local means on simulated data, high noise level): (a) noisy image; (b) result of the adaptive approach; (c) from left to right, top to bottom: smoothing strength (range: [0, 20 × 20]), search window size (range: [0, 20 × 20]), patch size (range: [3 × 3, 11 × 11]), prefiltering strength (range: [1, 3]).
Our approach: Case of non-local means

Example (Non-local means on simulated data, low noise level): (a) noisy image; (b) result of the adaptive approach; (c) from left to right, top to bottom: smoothing strength (range: [0, 20 × 20]), search window size (range: [0, 20 × 20]), patch size (range: [3 × 3, 11 × 11]), prefiltering strength (range: [1, 3]).
Our approach: Case of non-local means

Example (Non-local means on polarimetric data): (a) high-resolution S-band SAR image (© DLR); (b) adaptive estimation; (c) smoothing strength.

[Deledalle et al., 2013] Deledalle, C., Denis, L., Tupin, F., Reigber, A., Jäger, M. (2013). NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising, Technical report HAL.
Our approach: Case of anisotropic total-variation

Example (Anisotropic Total-Variation): recall that the solution of anisotropic total-variation denoising is given by

f_θ(y) = argmin_x (1/2) ‖x − y‖² + λ ‖∇x‖₁, where ‖∇x‖₁ = Σ_k |(∇_h x)_k| + |(∇_v x)_k|,

and can be computed iteratively by a proximal algorithm. Perform several total-variation denoisings with different regularization parameters λ.

[Figures: (a) target image x_0, (b) noisy image y, (c) final result f̃(y), (d) maps of the locally selected parameter λ and of the resulting smoothing strength.]
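The "proximal algorithm" admits a compact primal-dual (Chambolle-Pock) implementation for this anisotropic model. The sketch below (forward differences, Neumann boundaries, step sizes τ = σ = 0.35 < 1/√8) is one standard choice among several, not necessarily the solver used in the talk:

```python
import numpy as np

def grad(x):
    """Forward differences with Neumann boundary conditions."""
    gh = np.zeros_like(x); gh[:, :-1] = x[:, 1:] - x[:, :-1]
    gv = np.zeros_like(x); gv[:-1, :] = x[1:, :] - x[:-1, :]
    return gh, gv

def div(gh, gv):
    """Negative adjoint of grad (discrete divergence)."""
    dh = np.zeros_like(gh)
    dh[:, 0] = gh[:, 0]; dh[:, 1:-1] = gh[:, 1:-1] - gh[:, :-2]; dh[:, -1] = -gh[:, -2]
    dv = np.zeros_like(gv)
    dv[0, :] = gv[0, :]; dv[1:-1, :] = gv[1:-1, :] - gv[:-2, :]; dv[-1, :] = -gv[-2, :]
    return dh + dv

def tv_denoise(y, lam, n_iter=200, tau=0.35, sig=0.35):
    """Primal-dual iterations for argmin_x 0.5 ||x - y||^2 + lam ||grad x||_1."""
    x = y.copy(); x_bar = y.copy()
    ph = np.zeros_like(y); pv = np.zeros_like(y)
    for _ in range(n_iter):
        gh, gv = grad(x_bar)
        # Dual ascent + projection onto the l_inf ball of radius lam (anisotropic TV).
        ph = np.clip(ph + sig * gh, -lam, lam)
        pv = np.clip(pv + sig * gv, -lam, lam)
        # Primal descent + prox of the quadratic data term.
        x_new = (x + tau * div(ph, pv) + tau * y) / (1 + tau)
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

Running this for several values of λ gives the family of estimates f_θ(y) over which the local selection is then performed.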
More informationAdvanced Statistics I : Gaussian Linear Model (and beyond)
Advanced Statistics I : Gaussian Linear Model (and beyond) Aurélien Garivier CNRS / Telecom ParisTech Centrale Outline One and Two-Sample Statistics Linear Gaussian Model Model Reduction and model Selection
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationarxiv: v2 [math.st] 9 Feb 2017
Submitted to the Annals of Statistics PREDICTION ERROR AFTER MODEL SEARCH By Xiaoying Tian Harris, Department of Statistics, Stanford University arxiv:1610.06107v math.st 9 Feb 017 Estimation of the prediction
More informationECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form
ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t
More informationAlgebra of Random Variables: Optimal Average and Optimal Scaling Minimising
Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal
More informationModel Selection with Partly Smooth Functions
Model Selection with Partly Smooth Functions Samuel Vaiter, Gabriel Peyré and Jalal Fadili vaiter@ceremade.dauphine.fr August 27, 2014 ITWIST 14 Model Consistency of Partly Smooth Regularizers, arxiv:1405.1004,
More informationNonlinear Diffusion. 1 Introduction: Motivation for non-standard diffusion
Nonlinear Diffusion These notes summarize the way I present this material, for my benefit. But everything in here is said in more detail, and better, in Weickert s paper. 1 Introduction: Motivation for
More informationAlgebra of Random Variables: Optimal Average and Optimal Scaling Minimising
Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal
More informationDegrees of Freedom in Regression Ensembles
Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.
More informationStochastic gradient descent and robustness to ill-conditioning
Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,
More informationLecture 7: Edge Detection
#1 Lecture 7: Edge Detection Saad J Bedros sbedros@umn.edu Review From Last Lecture Definition of an Edge First Order Derivative Approximation as Edge Detector #2 This Lecture Examples of Edge Detection
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationStatistically-Based Regularization Parameter Estimation for Large Scale Problems
Statistically-Based Regularization Parameter Estimation for Large Scale Problems Rosemary Renaut Joint work with Jodi Mead and Iveta Hnetynkova March 1, 2010 National Science Foundation: Division of Computational
More informationErkut Erdem. Hacettepe University February 24 th, Linear Diffusion 1. 2 Appendix - The Calculus of Variations 5.
LINEAR DIFFUSION Erkut Erdem Hacettepe University February 24 th, 2012 CONTENTS 1 Linear Diffusion 1 2 Appendix - The Calculus of Variations 5 References 6 1 LINEAR DIFFUSION The linear diffusion (heat)
More informationInverse problems in statistics
Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).
More informationA NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar
A NO-REFERENCE SARPNESS METRIC SENSITIVE TO BLUR AND NOISE Xiang Zhu and Peyman Milanfar Electrical Engineering Department University of California at Santa Cruz, CA, 9564 xzhu@soeucscedu ABSTRACT A no-reference
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationDesign of Image Adaptive Wavelets for Denoising Applications
Design of Image Adaptive Wavelets for Denoising Applications Sanjeev Pragada and Jayanthi Sivaswamy Center for Visual Information Technology International Institute of Information Technology - Hyderabad,
More information13. Parameter Estimation. ECE 830, Spring 2014
13. Parameter Estimation ECE 830, Spring 2014 1 / 18 Primary Goal General problem statement: We observe X p(x θ), θ Θ and the goal is to determine the θ that produced X. Given a collection of observations
More informationA Generative Perspective on MRFs in Low-Level Vision Supplemental Material
A Generative Perspective on MRFs in Low-Level Vision Supplemental Material Uwe Schmidt Qi Gao Stefan Roth Department of Computer Science, TU Darmstadt 1. Derivations 1.1. Sampling the Prior We first rewrite
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationGeneralized Concomitant Multi-Task Lasso for sparse multimodal regression
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort
More informationMath 273a: Optimization Overview of First-Order Optimization Algorithms
Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization
More informationLeast squares problems
Least squares problems How to state and solve them, then evaluate their solutions Stéphane Mottelet Université de Technologie de Compiègne 30 septembre 2016 Stéphane Mottelet (UTC) Least squares 1 / 55
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More information2 Statistical Estimation: Basic Concepts
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 10)
EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 0) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationA memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration
A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard
More informationBatch, Stochastic and Mirror Gradient Descents
Batch, Stochastic and Mirror Gradient Descents Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationFundamentals of Non-local Total Variation Spectral Theory
Fundamentals of Non-local Total Variation Spectral Theory Jean-François Aujol 1,2, Guy Gilboa 3, Nicolas Papadakis 1,2 1 Univ. Bordeaux, IMB, UMR 5251, F-33400 Talence, France 2 CNRS, IMB, UMR 5251, F-33400
More informationEstimating network degree distributions from sampled networks: An inverse problem
Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree
More informationCovariance Matrix Simplification For Efficient Uncertainty Management
PASEO MaxEnt 2007 Covariance Matrix Simplification For Efficient Uncertainty Management André Jalobeanu, Jorge A. Gutiérrez PASEO Research Group LSIIT (CNRS/ Univ. Strasbourg) - Illkirch, France *part
More informationInverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1
Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is
More informationStatistical Measures of Uncertainty in Inverse Problems
Statistical Measures of Uncertainty in Inverse Problems Workshop on Uncertainty in Inverse Problems Institute for Mathematics and Its Applications Minneapolis, MN 19-26 April 2002 P.B. Stark Department
More informationPhysics-based Prior modeling in Inverse Problems
Physics-based Prior modeling in Inverse Problems MURI Meeting 2013 M Usman Sadiq, Purdue University Charles A. Bouman, Purdue University In collaboration with: Jeff Simmons, AFRL Venkat Venkatakrishnan,
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More information1-bit Matrix Completion. PAC-Bayes and Variational Approximation
: PAC-Bayes and Variational Approximation (with P. Alquier) PhD Supervisor: N. Chopin Bayes In Paris, 5 January 2017 (Happy New Year!) Various Topics covered Matrix Completion PAC-Bayesian Estimation Variational
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationarxiv: v3 [stat.me] 12 Jul 2015
Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models Arnaud Doucet 1, Pierre E. Jacob and Sylvain Rubenthaler 3 1 Department of Statistics,
More informationNONLINEAR DIFFUSION PDES
NONLINEAR DIFFUSION PDES Erkut Erdem Hacettepe University March 5 th, 0 CONTENTS Perona-Malik Type Nonlinear Diffusion Edge Enhancing Diffusion 5 References 7 PERONA-MALIK TYPE NONLINEAR DIFFUSION The
More informationSemi-Parametric Importance Sampling for Rare-event probability Estimation
Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationLecture 24 May 30, 2018
Stats 3C: Theory of Statistics Spring 28 Lecture 24 May 3, 28 Prof. Emmanuel Candes Scribe: Martin J. Zhang, Jun Yan, Can Wang, and E. Candes Outline Agenda: High-dimensional Statistical Estimation. Lasso
More informationSparse Regularization via Convex Analysis
Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which
More information