Adaptive selection of parameters for image denoising (Sélection adaptative des paramètres pour le débruitage des images)


1 Journées SIERRA 2014, Saint-Etienne, France, 25 March 2014
Sélection adaptative des paramètres pour le débruitage des images
Adaptive selection of parameters for image denoising
Charles Deledalle (1)
Joint work with: Loïc Denis (2), Charles Dossal (1), Vincent Duval (3), Jalal Fadili (4), Gabriel Peyré (3), Joseph Salmon (5), Florence Tupin (5) and Samuel Vaiter (3)
(1) Institut de Mathématiques de Bordeaux, CNRS-Université Bordeaux 1, France
(2) Laboratoire Hubert-Curien, Univ. Jean Monnet, Univ. Lyon, France
(3) CEREMADE, CNRS-Paris Dauphine, France
(4) GREYC, CNRS-ENSICAEN, France
(5) Institut Mines-Télécom, Télécom-ParisTech, CNRS LTCI, France

2 Motivations: model/parameter selection

[Figure: an input image compared with a smooth model and a piecewise-constant model.]

Goal: pick the suitable model/parameters for a given image.

3 Motivations: risk based selection

Risk definition:
- evaluates a cost for each parameter of a given filter, for a given image,
- usually task specific,
- usually defined with respect to the unknown image that we attempt to recover.

Goal: find the least risky model / set of parameters θ.

4 Motivations: square error risk

Square error risk. Define the risk as the squared error
$$R(\theta) = \mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 \quad\text{where}\quad Y = x_0 + W,$$
where $f_\theta : \mathbb{R}^N \to \mathbb{R}^N$ is a denoiser with parameter $\theta$, $x_0 \in \mathbb{R}^N$ and $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id})$.

Bias-variance decomposition:
$$\mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\| \mathbb{E}_Y[f_\theta(Y)] - x_0 \|^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y \| f_\theta(Y) - \mathbb{E}_Y[f_\theta(Y)] \|^2}_{\text{Variance}}$$

Fidelity-complexity decomposition [Mallows, 1973, Efron, 1986]:
$$\mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\mathbb{E}_Y \| f_\theta(Y) - Y \|^2 - N\sigma^2}_{\text{Fidelity}} + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}}$$

5 Motivations: risk interpretation

$$R(\theta) = \mathbb{E}_Y \| f_\theta(Y) - x_0 \|^2 = \underbrace{\| \mathbb{E}_Y[f_\theta(Y)] - x_0 \|^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y \| f_\theta(Y) - \mathbb{E}_Y[f_\theta(Y)] \|^2}_{\text{Variance}} = \underbrace{\mathbb{E}_Y \| f_\theta(Y) - Y \|^2 - N\sigma^2}_{\text{Fidelity}} + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}}$$

Example (Complexity = 2σ² × degree of freedom):

name       f_θ(y)   R(θ)     Bias²    Variance   Fidelity   Complexity
identity   y        Nσ²      0        Nσ²        −Nσ²       2Nσ²
oracle     x₀       0        0        0          0          0
null       0        ‖x₀‖²    ‖x₀‖²    0          ‖x₀‖²      0

For an orthogonal projector, the degree of freedom is the dimension of the target space.

[Plot: risk, squared bias, variance, fidelity and complexity (cost) as a function of the parameter θ.]
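To make the two decompositions concrete, here is a minimal Monte-Carlo check in Python for a simple linear smoother (a circular 3-tap moving average). The signal, filter and sample sizes are illustrative assumptions, not taken from the slides.

```python
# Numerical check: risk = Bias^2 + Variance = Fidelity + 2 sigma^2 DoF
# for a linear denoiser f(y) = W y (3-tap circular moving average).
import numpy as np

rng = np.random.default_rng(0)
N, sigma, n_draws = 256, 0.5, 2000
x0 = np.sign(np.sin(2 * np.pi * (np.arange(N) + 0.5) / 64.0))  # piecewise-constant signal

def f(y):
    # circular 3-tap moving average: a linear denoiser with tr(W) = N / 3
    return (np.roll(y, -1) + y + np.roll(y, 1)) / 3.0

dof = N / 3.0                                  # DoF of a linear filter = tr(W)

risk = fid = 0.0
samples = np.empty((n_draws, N))
for k in range(n_draws):
    y = x0 + sigma * rng.standard_normal(N)
    fy = f(y)
    samples[k] = fy
    risk += np.sum((fy - x0) ** 2) / n_draws
    fid += (np.sum((fy - y) ** 2) - N * sigma ** 2) / n_draws

bias2 = np.sum((samples.mean(axis=0) - x0) ** 2)
var = np.sum(samples.var(axis=0))

print("risk E||f(Y) - x0||^2    :", risk)
print("bias^2 + variance        :", bias2 + var)
print("fidelity + 2 sigma^2 DoF :", fid + 2 * sigma ** 2 * dof)
```

All three printed quantities agree up to Monte-Carlo error.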

6 Motivations: limits of the risk based selection

[Figure: input image compared with a smooth model and a piecewise-constant model.]

What if none of the models/sets of parameters is suitable for the whole image? Selection should then be performed locally. A map of risk can be defined similarly for each pixel, as well as a map of bias, a map of variance, a map of fidelity and a map of complexity.

7 Motivations: local bias-variance trade-off

[Figure: input image and estimates, with the corresponding maps of variance and squared bias.]

8 Goal: risk estimation

Recall $R(\theta) = \mathbb{E}_Y \|f_\theta(Y) - x_0\|^2 = \text{Bias}^2 + \text{Variance} = \mathbb{E}_Y\|f_\theta(Y) - Y\|^2 - N\sigma^2 + 2\sigma^2\,\mathbb{E}_Y \big\langle \tfrac{Y - x_0}{\sigma}, \tfrac{f_\theta(Y) - x_0}{\sigma} \big\rangle$ (fidelity plus complexity).

Challenge:
- All quantities are defined as an expectation over Y, but we only know a single realization y of Y.
- Some quantities depend on the image x_0, but this is the image that we attempt to recover, hence it is unknown.

Goal:
1. choose several models/sets of parameters: $f_{\theta_1}, f_{\theta_2}, \ldots$
2. estimate the risk for each of them: $R(\theta_1), R(\theta_2), \ldots$ (without knowing x_0 and from the single observed realization y)
3. select, globally or locally, the least risky set of parameters.

9 Outline

1. Global selection based on Stein's Unbiased Risk Estimator
   - Stein's Unbiased Risk Estimator (SURE)
   - Simple cases: non-local means, soft-thresholding
   - Non-differentiable cases: hard-thresholding
   - Monte-Carlo SURE
   - Case of total-variation
2. Local selection with risk estimation and bias reduction
   - Risk estimation: SURE filtering
   - Case of anisotropic non-local means
   - Bias reduction-variance estimation approach
   - Case of anisotropic Gaussian convolutions
   - Case of non-local means
   - Case of anisotropic total-variation
3. Conclusion


11 Stein's Unbiased Risk Estimator (SURE)

Motivation. In
$$R(\theta) = \mathbb{E}_Y \|f_\theta(Y) - Y\|^2 - N\sigma^2 + 2\sigma^2 \underbrace{\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle}_{\text{Degree of freedom}},$$
can we express the degree of freedom with respect to Y only, independently of x_0?

Lemma (Stein's lemma [Stein, 1981]). Assume $y \mapsto f_\theta(y)$ is continuous and almost everywhere differentiable (plus some mild conditions). Then
$$\mathbb{E}_Y \Big\langle \frac{Y - x_0}{\sigma}, \frac{f_\theta(Y) - x_0}{\sigma} \Big\rangle = \mathbb{E}_Y \Big[ \mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y} \Big] = \mathbb{E}_Y \big[ \mathrm{div}_y\, f_\theta(y)\big|_{y=Y} \big].$$

Proof (for N = 1).
$$\mathbb{E}_Y[(Y - x_0)(f_\theta(Y) - x_0)] = \int_{-\infty}^{+\infty} (f_\theta(y) - x_0)\,\underbrace{(y - x_0)\, G_\sigma(y - x_0)}_{-\sigma^2 \frac{\partial}{\partial y} G_\sigma(y - x_0)\ \text{(DOG)}}\, dy$$
$$\stackrel{\text{(IBP)}}{=} -\sigma^2 \underbrace{\big[(f_\theta(y) - x_0)\, G_\sigma(y - x_0)\big]_{-\infty}^{+\infty}}_{=0} + \sigma^2 \int_{-\infty}^{+\infty} \frac{\partial f_\theta(y)}{\partial y}\, G_\sigma(y - x_0)\, dy = \sigma^2\, \mathbb{E}_Y\Big[\frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y}\Big].$$

12 Stein's Unbiased Risk Estimator (SURE)

Theorem (Stein's unbiased risk estimator (SURE)). Assume $y \mapsto f_\theta(y)$ is as in Stein's lemma. Then
$$\mathrm{SURE}(y, \theta) = \underbrace{\|f_\theta(y) - y\|^2 - N\sigma^2}_{\text{Sample fidelity}} + 2\sigma^2 \underbrace{\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y}}_{\text{Sample DoF}}$$
is an unbiased risk estimator: $\mathbb{E}_Y[\mathrm{SURE}(Y, \theta)] = R(\theta)$.

Remark: by the law of large numbers, $\mathrm{SURE}(y, \theta) \approx R(\theta)$ from a single realization y when N is large.
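As a minimal illustration of the theorem, the sketch below evaluates SURE for a denoiser whose divergence is known in closed form (a circular 5-tap moving average, so tr ∂f/∂y = N/5) and compares it with the true squared error. The test signal and noise level are assumptions.

```python
# SURE is computed from y alone (no access to x0) and matches the true error.
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 512, 0.3
x0 = np.cos(2 * np.pi * np.arange(N) / 128.0)

def f(y):
    # circular 5-tap moving average, so tr(df/dy) = N / 5
    k = 5
    return sum(np.roll(y, s) for s in range(-(k // 2), k // 2 + 1)) / k

def sure(y, fy, div_f, sigma):
    # SURE(y, theta) = ||f(y) - y||^2 - N sigma^2 + 2 sigma^2 tr(df/dy)
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * div_f

y = x0 + sigma * rng.standard_normal(N)
fy = f(y)
print("SURE (from y alone):", sure(y, fy, N / 5.0, sigma))
print("true squared error :", np.sum((fy - x0) ** 2))
```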

13-16 SURE Simple case: non-local means

Example (Averaging filters [Van De Ville and Kocher, 2009, Duval et al., 2011]). Consider an averaging filter of the form
$$f_h(y)_i = \frac{\sum_j w_{i,j}(y)\, y_j}{\sum_j w_{i,j}(y)}$$
where $w_{i,j} : \mathbb{R}^N \to \mathbb{R}_+$ is a weight function, for instance:
- isotropic convolution: $w_{i,j}(y) = \varphi_h(\|i - j\|)$,
- sigma filter: $w_{i,j}(y) = \varphi_h(|y_i - y_j|)$ [Lee, 1983, Yaroslavsky, 1985],
- non-local means: $w_{i,j}(y) = \varphi_h(\|P_i y - P_j y\|)$ [Buades et al., 2005].

An estimator of its DoF is then given by
$$\mathrm{tr}\, \frac{\partial f_h(y)}{\partial y} = \sum_{i=1}^N \frac{\partial f_h(y)_i}{\partial y_i}
\quad\text{where}\quad
\frac{\partial f_h(y)_i}{\partial y_i} = \frac{1}{\sum_j w_{i,j}(y)} \Big[ w_{i,i}(y) + \sum_j \frac{\partial w_{i,j}(y)}{\partial y_i}\, y_j - f_h(y)_i \sum_j \frac{\partial w_{i,j}(y)}{\partial y_i} \Big].$$

[Figure: noisy image y and f_h(y) for a small, a large and the optimal bandwidth h, with the quadratic cost (risk and SURE) plotted as a function of the bandwidth h.]
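Below is a sketch of this closed-form SURE for the 1D sigma filter, the simplest data-dependent case above, with $w_{i,j}(y) = \exp(-(y_i - y_j)^2 / (2h^2))$ over a small search window; the diagonal Jacobian entries follow the formula just recalled. The test signal, window size and bandwidth grid are assumptions.

```python
# Exact SURE for a 1D sigma (Yaroslavsky) filter, scanning the bandwidth h.
import numpy as np

rng = np.random.default_rng(2)
N, sigma, half_win = 400, 0.4, 10
x0 = np.where(np.arange(N) < N // 2, 0.0, 1.0)           # step signal
y = x0 + sigma * rng.standard_normal(N)

def sigma_filter_sure(y, h, sigma, half_win):
    N = y.size
    fy = np.empty(N)
    div = 0.0
    for i in range(N):
        j = np.arange(max(0, i - half_win), min(N, i + half_win + 1))
        w = np.exp(-(y[i] - y[j]) ** 2 / (2 * h ** 2))    # w_{i,i} = 1
        S = w.sum()
        fy[i] = np.dot(w, y[j]) / S
        # d w_{i,j} / d y_i = -(y_i - y_j) / h^2 * w_{i,j}  (zero for j = i)
        dw = -(y[i] - y[j]) / h ** 2 * w
        dw[j == i] = 0.0
        # diagonal Jacobian entry from the closed form above
        div += (w[j == i].sum() + np.dot(dw, y[j]) - fy[i] * dw.sum()) / S
    sure = np.sum((fy - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * div
    return fy, sure

for h in [0.1, 0.3, 0.6, 1.2]:
    fy, s = sigma_filter_sure(y, h, sigma, half_win)
    print(f"h = {h:4.2f}  SURE = {s:8.2f}  true SE = {np.sum((fy - x0) ** 2):8.2f}")
```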

17 SURE Simple case: soft-thresholding

Example (Soft-thresholding (ST) [Donoho and Johnstone, 1995, Zou et al., 2007]). The ST function is defined as
$$\mathrm{ST}(y, \lambda)_i = \begin{cases} y_i + \lambda & \text{if } y_i \leq -\lambda \\ 0 & \text{if } -\lambda < y_i < \lambda \\ y_i - \lambda & \text{otherwise.} \end{cases}$$
For almost all y, its Jacobian is
$$\Big[\frac{\partial\, \mathrm{ST}(y, \lambda)}{\partial y}\Big]_{i,j} = \begin{cases} 1 & \text{if } i = j \text{ and } |y_i| > \lambda \\ 0 & \text{otherwise,} \end{cases}$$
so $\#\{i : |Y_i| > \lambda\}$ is an unbiased estimator of the degree of freedom.

[Figure: a sparse vector x_0 with the ±σ interval and the threshold λ (values x_i against the position index i), and the quadratic cost (risk and SURE) as a function of the threshold λ.]
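A sketch of SURE-driven selection of the soft-thresholding parameter, using the DoF estimate #{i : |y_i| > λ}; the sparse vector x_0 and the threshold grid are assumptions.

```python
# SURE-based choice of lambda for soft-thresholding, compared to the oracle.
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 1000, 1.0
x0 = np.zeros(N); x0[:50] = rng.uniform(3, 8, 50)        # sparse vector
y = x0 + sigma * rng.standard_normal(N)

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_st(y, lam, sigma):
    # DoF of soft-thresholding = #{i : |y_i| > lambda}
    fy = soft_threshold(y, lam)
    dof = np.count_nonzero(np.abs(y) > lam)
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * dof

lams = np.linspace(0.1, 5.0, 50)
sures = np.array([sure_st(y, lam, sigma) for lam in lams])
errors = np.array([np.sum((soft_threshold(y, lam) - x0) ** 2) for lam in lams])
print("lambda minimizing SURE      :", lams[np.argmin(sures)])
print("lambda minimizing true error:", lams[np.argmin(errors)])
```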

18-20 SURE Non-differentiable case: hard-thresholding

Example (Hard-thresholding (HT)). The HT function is defined as
$$\mathrm{HT}(y, \lambda)_i = \begin{cases} y_i & \text{if } y_i \leq -\lambda \\ 0 & \text{if } -\lambda < y_i < \lambda \\ y_i & \text{otherwise.} \end{cases}$$
The HT is not continuous, so Stein's lemma does not apply and we cannot estimate R(θ) with SURE. Can we build a biased estimator? If yes, how does its bias evolve with the data dimension?

Theorem (Stein COnsistent Risk Estimator (SCORE)). Take $h_N$ such that $\lim_{N\to\infty} h_N = 0$ and $\lim_{N\to\infty} (N h_N)^{-1} = 0$. Then the quantity
$$\#\{i : |Y_i| > \lambda\} + \frac{\lambda \sqrt{\sigma^2 + h_N^2}}{\sqrt{2\pi}\, \sigma\, h_N} \sum_{i=1}^N \Big[ \exp\Big(-\frac{(Y_i + \lambda)^2}{2 h_N^2}\Big) + \exp\Big(-\frac{(Y_i - \lambda)^2}{2 h_N^2}\Big) \Big]$$
is a consistent estimator of the degree of freedom (i.e., convergence in probability).

[Figure: a sparse vector x_0 with the ±σ interval and the threshold λ, and the quadratic cost (risk, SCORE and Jansen's estimator [1]) as a function of the threshold λ.]

[Deledalle et al., 2013] Deledalle, C., Peyré, G., Fadili, J. (2013). Stein COnsistent Risk Estimator (SCORE) for hard thresholding. Signal Processing with Adaptive Sparse Structured Representations (SPARS), Lausanne, July 8-11, 2013.
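The snippet below plugs the SCORE degree-of-freedom expression, as reconstructed above, into the risk estimate for hard thresholding; the bandwidth rule h_N = N^(-1/4) (which satisfies h_N → 0 and (N h_N)^(-1) → 0) and the test signal are assumptions made for illustration.

```python
# SCORE-based choice of lambda for hard thresholding (sketch).
import numpy as np

rng = np.random.default_rng(4)
N, sigma = 4000, 1.0
x0 = np.zeros(N); x0[:200] = 6.0
y = x0 + sigma * rng.standard_normal(N)

def hard_threshold(y, lam):
    return y * (np.abs(y) > lam)

def score_risk(y, lam, sigma, h):
    N = y.size
    fy = hard_threshold(y, lam)
    # kernel terms at +/- lambda, as in the SCORE DoF expression above
    kernel = np.exp(-(y + lam) ** 2 / (2 * h ** 2)) + np.exp(-(y - lam) ** 2 / (2 * h ** 2))
    dof = (np.count_nonzero(np.abs(y) > lam)
           + lam * np.sqrt(sigma ** 2 + h ** 2) / (np.sqrt(2 * np.pi) * sigma * h) * kernel.sum())
    return np.sum((fy - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * dof

h = N ** (-0.25)                    # assumed bandwidth rule: h_N -> 0, (N h_N)^-1 -> 0
lams = np.linspace(0.5, 5.0, 40)
est = [score_risk(y, lam, sigma, h) for lam in lams]
true = [np.sum((hard_threshold(y, lam) - x0) ** 2) for lam in lams]
print("lambda minimizing SCORE     :", lams[np.argmin(est)])
print("lambda minimizing true error:", lams[np.argmin(true)])
```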

21 SURE Beyond simple cases

Example (Favorable cases).
- For ST and HT (separable functions): the Jacobian matrices are diagonal, and their diagonal elements have a simple closed form.
- For some non-separable functions (e.g., the non-local means): the Jacobian matrices are not diagonal, but their diagonal elements (and hence the trace) can still be computed in closed form.

Example (Unfavorable cases). In general (e.g., iterative proximal algorithms used in convex regularization such as total-variation): the Jacobian is not diagonal, and computing its trace would require computing the N × N entries of the Jacobian matrix (not thinkable). Use trace estimators instead.

22 SURE Monte-Carlo SURE

Theorem (Monte-Carlo trace estimator). Let $\Delta \sim \mathcal{N}(0, \mathrm{Id}_N)$ and $A \in \mathbb{R}^{N \times N}$. Then $\mathrm{tr}\, A = \mathbb{E}_\Delta \langle \Delta, A\Delta \rangle$.

Corollary. For N big enough, generate a realization $\delta \in \mathbb{R}^N$ of $\Delta$ and approximate the sample DoF by [Vonesch et al., 2008]
$$\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y} \approx \Big\langle \delta, \frac{\partial f_\theta(y)}{\partial y}\, \delta \Big\rangle,$$
or by its finite-difference approximation (known as Monte-Carlo SURE [Ramani et al., 2008])
$$\mathrm{tr}\, \frac{\partial f_\theta(y)}{\partial y} \approx \Big\langle \delta, \frac{f_\theta(y + \varepsilon \delta) - f_\theta(y)}{\varepsilon} \Big\rangle, \quad \varepsilon \ll 1.$$

1. The first form requires evaluating only N quantities (compared to N × N), but it is not always easy to compute in closed form.
2. The second form only requires evaluating $f_\theta(\cdot)$ twice, on y and on $y + \varepsilon\delta$, but requires choosing ε.
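A sketch of the finite-difference Monte-Carlo SURE for a black-box denoiser whose Jacobian is not available in closed form; the median filter used as the black box, the test image and the probe scale ε are assumptions.

```python
# Finite-difference Monte-Carlo SURE for a black-box denoiser.
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(5)
N, sigma = 64, 0.2
x0 = np.zeros((N, N)); x0[16:48, 16:48] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

def mc_sure(y, f, sigma, eps=1e-3, rng=rng):
    fy = f(y)
    delta = rng.standard_normal(y.shape)                    # delta ~ N(0, Id)
    dof = np.sum(delta * (f(y + eps * delta) - fy)) / eps   # <delta, (df/dy) delta>
    return np.sum((fy - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * dof

for size in [3, 5, 7]:
    f = lambda z, s=size: median_filter(z, size=s)
    print(f"median {size}x{size}: MC-SURE = {mc_sure(y, f, sigma):9.2f}, "
          f"true SE = {np.sum((f(y) - x0) ** 2):9.2f}")
```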

23-26 SURE Case of total-variation

Example (Total-variation [Rudin et al., 1992]). The total-variation solution is given by
$$f_\theta(y) = \operatorname*{argmin}_x\; \underbrace{\tfrac{1}{2}\|x - y\|^2}_{F(x, y)} + \underbrace{\lambda \|\nabla x\|_1}_{G(x, \lambda)} \quad\text{where}\quad \|\nabla x\|_1 = \sum_k \|(\nabla x)_k\|.$$
Reformulation in terms of simple functions, using $z = (x, u)$:
$$(x(y), u(y)) = \operatorname*{argmin}_z\; \underbrace{\tfrac{1}{2}\|x - y\|^2}_{F(z, y)} + \underbrace{\lambda \|u\|_1}_{G_1(z)} + \underbrace{\iota_C(z)}_{G_2(z)} \quad\text{where}\quad C = \{ z = (x, u) \,:\, u = \nabla x \}.$$

Generalized Forward-Backward scheme and derivatives [Raguet et al., 2011]. The following sequence converges to $f_\theta(y)$:
$$x^{(\ell+1)} = \frac{1}{Q} \sum_{i=1}^Q z_i^{(\ell+1)}, \qquad
z_i^{(\ell+1)} = z_i^{(\ell)} - x^{(\ell)} + \mathrm{Prox}_{Q\gamma G_i}\big(Z_i^{(\ell)}, \theta\big), \qquad
Z_i^{(\ell)} = 2 x^{(\ell)} - z_i^{(\ell)} - \gamma \nabla_1 F(x^{(\ell)}, y).$$

The computation of the SURE associated to $x^{(\ell)}(y)$ depends on the directional derivative $D_x^{(\ell)} = \frac{\partial x^{(\ell)}}{\partial y}[\delta]$. Applying the chain rule to the scheme gives
$$D_x^{(\ell+1)} = \frac{1}{Q} \sum_{i=1}^Q D_{z_i}^{(\ell+1)}, \qquad
D_{z_i}^{(\ell+1)} = D_{z_i}^{(\ell)} - D_x^{(\ell)} + \partial G_{i,x}^{(\ell)}\big(D_{Z_i}^{(\ell)}\big), \qquad
D_{Z_i}^{(\ell)} = 2 D_x^{(\ell)} - D_{z_i}^{(\ell)} - \gamma\big(\partial F_x^{(\ell)}(D_x^{(\ell)}) + \partial F_y^{(\ell)}(\delta)\big),$$
where $\partial F_x^{(\ell)}(\cdot) = \partial_1 \nabla_1 F(x^{(\ell)}, y)[\cdot]$, $\partial F_y^{(\ell)}(\cdot) = \partial_2 \nabla_1 F(x^{(\ell)}, y)[\cdot]$ and $\partial G_{i,x}^{(\ell)}(\cdot) = \partial_1 \mathrm{Prox}_{Q\gamma G_i}(Z_i^{(\ell)})[\cdot]$.

27 SURE Case of total-variation

Generalized Forward-Backward scheme and derivatives. The gradient and proximal operators are
$$\nabla_1 F(z, y) = (x - y, 0), \qquad
\mathrm{Prox}_{\tau G_1}(z) = (x, \mathrm{ST}(u, \lambda\tau)), \qquad
\mathrm{Prox}_{\tau G_2}(z) = \Pi_C(z) = (\tilde{x}, \nabla \tilde{x}) \ \text{with}\ \tilde{x} = (\mathrm{Id} + \nabla^* \nabla)^{-1}(x + \nabla^* u).$$
Their derivatives are
$$\partial_1 \nabla_1 F(z, y)[\delta_x, \delta_u] = (\delta_x, 0), \qquad
\partial_2 \nabla_1 F(z, y)[\delta_y] = (-\delta_y, 0),$$
$$\partial_1 \mathrm{Prox}_{\tau G_1}(z)[\delta_x, \delta_u] = \big(\delta_x, \partial_1 \mathrm{ST}(u, \lambda\tau)[\delta_u]\big), \qquad
\partial_1 \mathrm{Prox}_{\tau G_2}(z)[\delta_x, \delta_u] = (\tilde{\delta}_x, \nabla \tilde{\delta}_x) \ \text{with}\ \tilde{\delta}_x = (\mathrm{Id} + \nabla^* \nabla)^{-1}(\delta_x + \nabla^* \delta_u).$$
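To illustrate the idea of differentiating the iterations of an algorithm with respect to y, the sketch below does it for a simpler scheme than the GFB above: plain gradient descent on a smoothed 1D total variation (so every step is differentiable), propagating the directional derivative J^(l) δ alongside x^(l) and feeding it to the Monte-Carlo trace estimator. All parameters (smoothing ε, step size, iteration count) are assumptions, and this is not the authors' GFB derivative scheme.

```python
# Differentiating the iterations of a smoothed-TV gradient descent w.r.t. y.
import numpy as np

rng = np.random.default_rng(6)
N, sigma, lam, eps, tau, n_iter = 100, 0.3, 0.4, 0.1, 0.05, 800
x0 = np.where(np.arange(N) < N // 2, 0.0, 1.0)           # step signal
y = x0 + sigma * rng.standard_normal(N)

D = np.diff(np.eye(N), axis=0)                            # finite differences, shape (N-1, N)

def grad_E(x):
    # gradient of E(x) = 0.5 ||x - y||^2 + lam * sum_k sqrt((Dx)_k^2 + eps^2)
    t = D @ x
    return x - y + lam * D.T @ (t / np.sqrt(t ** 2 + eps ** 2))

def tv_hessian(x):
    # Hessian of the smoothed TV term (without the lam factor)
    t = D @ x
    return D.T @ (np.diag(eps ** 2 / (t ** 2 + eps ** 2) ** 1.5) @ D)

delta = rng.standard_normal(N)        # Monte-Carlo probe
x = y.copy()                          # x^(0) = y  =>  dx^(0)/dy = Id
J_delta = delta.copy()                # J^(0) delta
for _ in range(n_iter):
    # differentiate x^(l+1) = x^(l) - tau * grad_E(x^(l)) along delta
    J_delta = J_delta - tau * (J_delta - delta + lam * tv_hessian(x) @ J_delta)
    x = x - tau * grad_E(x)

dof = np.dot(delta, J_delta)          # Monte-Carlo estimate of tr(dx/dy)
sure = np.sum((x - y) ** 2) - N * sigma ** 2 + 2 * sigma ** 2 * dof
print("MC-SURE:", sure, "   true squared error:", np.sum((x - x0) ** 2))
```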

28-30 SURE Case of total-variation

[Figure: noisy image y and f_θ(y) for a small, a large and the optimal regularization parameter λ, with the quadratic cost (risk and SURE) plotted as a function of λ.]

[Deledalle et al., 2012] Deledalle, C.-A., Vaiter, S., Peyré, G., Fadili, J. and Dossal, C. (2012). Proximal Splitting Derivatives for Risk Estimation. Int. Workshop on New Computational Methods for Inverse Problems (NCMIP), Cachan, France, May 2012.

31 Outline (reminder). Part 2: Local selection with risk estimation and bias reduction.

32-33 Risk estimation: SURE filtering

Motivation: do the same thing, but for each pixel i independently. Assume $y \mapsto f_\theta(y)$ is as in Stein's lemma. Then
$$\mathrm{SURE}(y, \theta)_i = \underbrace{(f_\theta(y)_i - y_i)^2 - \sigma^2}_{\text{Sample fidelity}} + 2\sigma^2 \underbrace{\frac{\partial f_\theta(y)_i}{\partial y_i}}_{\text{Sample DoF}}$$
satisfies $\mathbb{E}_Y[\mathrm{SURE}(Y, \theta)_i] = R(\theta)_i = \mathbb{E}_Y \big(f_\theta(Y)_i - x_{0,i}\big)^2$.

But the law of large numbers does not apply anymore!

[Figure: maps of the risk, the SURE, the sample fidelity and the sample DoF.]

First idea: can we regularize the SURE map?
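A sketch of the per-pixel SURE map for the simplest case, a Gaussian convolution, where ∂f_θ(y)_i/∂y_i is just the central kernel weight; the image, noise level and kernel width are assumptions.

```python
# Per-pixel SURE map for a Gaussian convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(7)
N, sigma, s = 128, 0.2, 2.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

fy = gaussian_filter(y, sigma=s)
# for a convolution, d f(y)_i / d y_i is the kernel's central weight,
# read off the impulse response
impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
w_center = gaussian_filter(impulse, sigma=s)[N // 2, N // 2]

sample_fidelity = (fy - y) ** 2 - sigma ** 2          # per-pixel fidelity term
sample_dof = np.full((N, N), w_center)                # per-pixel d f_i / d y_i
sure_map = sample_fidelity + 2 * sigma ** 2 * sample_dof
true_se_map = (fy - x0) ** 2
print("mean SURE:", sure_map.mean(), "  mean true SE:", true_se_map.mean())
```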

34 Risk estimation: SURE filtering

SURE filtering: the local sample DoF has lower variance than the local sample fidelity, so perform a filtering of the SURE map guided by the sample DoF.

[Figure: maps of the risk, SURE, sample fidelity and sample DoF, and the corresponding filtered SURE, filtered fidelity and filtered DoF maps.]

35 Risk estimation: SURE filtering

Example (Non-local means with oriented patches). The orientation of the patches should be spatially adapted to the image content.

36-41 Risk estimation: SURE filtering

Example (Non-local means with oriented patches).
1. Estimate the map of SURE associated to all estimators. [Figure: noisy SURE maps.]
2. Regularize the estimation, guided by the sample DoF. [Figure: regularized SURE maps.]
3. Combine the estimates using a convex aggregation, e.g. the Exponential Weighted Aggregation [Leung and Barron, 2006].

[Deledalle et al., 2011b] Deledalle, C., Duval, V., and Salmon, J. (2011b). Non-local methods with shape-adaptive patches (NLM-SAP). Journal of Mathematical Imaging and Vision.

42 Risk estimation: SURE filtering

Example (Non-local means with oriented patches) [Deledalle et al., 2011b].
[Figure: (a) the 15 pie-slice patch sizes/shapes, (b) the regularized SURE maps, (c) the selected patch orientations, compared with the Yaroslavsky filter and anisotropic diffusion.]

43 Risk estimation: SURE filtering

Example (Non-local means with oriented patches) [Deledalle et al., 2011b].
[Figure: visual comparison of (a) NL means, (b)-(c) BM3D [Dabov et al., 2007], (d) [Goossens et al., 2008], and (e)-(f) our approach.]

44-46 Risk estimation: SURE filtering

Limits of the SURE-based approach. Local selection relies on a risk filter:
- difficulty to build such a filter,
- difficulty to provide guarantees,
- difficulty to extend to other noise models.

Beyond risk estimation. Recall that
$$R(\theta)_i = \mathbb{E}_Y\big[(f_\theta(Y)_i - Y_i)^2\big] - \sigma^2 + 2\sigma^2\, \underbrace{\mathbb{E}_Y\Big[ \frac{Y_i - x_{0,i}}{\sigma} \cdot \frac{f_\theta(Y)_i - x_{0,i}}{\sigma} \Big]}_{\text{Degree of freedom}}
= \underbrace{\big(\mathbb{E}_Y[f_\theta(Y)_i] - x_{0,i}\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y\big(f_\theta(Y)_i - \mathbb{E}_Y[f_\theta(Y)_i]\big)^2}_{\text{Variance}}.$$
The degree of freedom can be relatively well estimated locally. The variance too, typically using the propagation-of-uncertainty formula:
$$\underbrace{\mathbb{E}_Y\big(f_\theta(Y)_i - \mathbb{E}_Y[f_\theta(Y)_i]\big)^2}_{\text{Variance}} \approx \sigma^2\, \mathbb{E}_Y\Big[ \frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y} \Big(\frac{\partial f_\theta(y)}{\partial y}\Big|_{y=Y}\Big)^{\!\top} \Big]_{i,i}.$$
But the fidelity and the bias are difficult to quantify locally (see also [Kervrann and Boulanger, 2008]). Since the bias cannot be estimated properly, can we cancel or reduce it before selection? Modify all estimators $f_\theta$ into $\tilde{f}_\theta$ in order to:
- debias their solutions and then select the one with the smallest variance,
$$R(\theta)_i = \underbrace{\big(\mathbb{E}_Y[\tilde{f}_\theta(Y)_i] - x_{0,i}\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_Y\big(\tilde{f}_\theta(Y)_i - \mathbb{E}_Y[\tilde{f}_\theta(Y)_i]\big)^2}_{\text{Variance}},$$
- or improve their bias-variance trade-off so that $\text{Bias}^2 \leq \text{Variance}$, hence $R(\theta)_i \leq 2 \times \text{Variance}$, and then select the one with the smallest variance.

Remark: the bias-variance decomposition still holds true for non-Gaussian noises.

[Figure: cost as a function of the parameter θ.]

47-57 Bias reduction-variance estimation approach

Bias detection. Denoisers essentially perform averages: convolutions, anisotropic diffusion, non-local means, TV, BM3D... For y fixed, their solution can be written as
$$f_\theta(y) = W y + b, \qquad f_\theta(y)_i = \sum_j w_{i,j}\, y_j + b_i \quad\text{with}\quad \sum_j w_{i,j} = 1.$$
The weight $w_{i,j}$ attempts to select pixel j if $(x_0)_j = (x_0)_i$ (i.e., $Y_j$ i.i.d. with $Y_i$), hence
$$f_\theta(y)_i = \sum_j w_{i,j}\, y_j \approx \mathbb{E}[Y_i] = (x_0)_i.$$
Hence the weighted variance of these samples should be equal to $\sigma^2$ whatever $(x_0)_i$:
$$\sigma_\theta^2(y)_i = \sum_j w_{i,j}\, y_j^2 - \Big( \sum_j w_{i,j}\, y_j \Big)^2 \approx \mathrm{Var}[Y_i] = \sigma^2.$$
Otherwise, $\sigma_\theta^2(y)_i \gg \sigma^2$ is an indication of bias: some selected pixels satisfy $(x_0)_j \neq (x_0)_i$.

[Figure: noisy image y, the estimate f_θ(y) and the bias indicator σ_θ²(y), shown for a convolution, for the non-local means and for total-variation.]
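A sketch of the bias indicator σ_θ²(y)_i for a Gaussian-convolution denoiser, computed with the same weights as the filter itself; the image, noise level and kernel width are assumptions.

```python
# Bias indicator sigma_theta^2(y)_i = sum_j w_ij y_j^2 - (sum_j w_ij y_j)^2.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(8)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0            # step edge
y = x0 + sigma * rng.standard_normal((N, N))

fy  = gaussian_filter(y, sigma=s)                      # sum_j w_ij y_j
m2  = gaussian_filter(y ** 2, sigma=s)                 # sum_j w_ij y_j^2
var_theta = m2 - fy ** 2                               # sigma_theta^2(y)_i

# far from the edge the indicator stays close to sigma^2; on the edge the
# average mixes pixels with different x0, and the indicator blows up
print("flat area :", var_theta[:, :N // 4].mean(), " vs sigma^2 =", sigma ** 2)
print("edge area :", var_theta[:, N // 2 - 2:N // 2 + 2].mean())
```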

58-64 Bias reduction-variance estimation approach

Bias reduction [Lee, 1980, Kuan et al., 1985]. Assume the selected samples are realizations of $Y = X + N$ with $\mathbb{E}_N[N] = 0$. The optimal linear estimator minimizing $\mathbb{E}_{X,Y}[(aY + b - X)^2]$ (LMMSE) is
$$\mathbb{E}_Y[Y] + \frac{\sigma_Y^2 - \sigma_N^2}{\sigma_Y^2}\, \big(Y - \mathbb{E}_Y[Y]\big).$$
The plug-in LMMSE estimator is
$$\tilde{f}_\theta(y)_i = f_\theta(y)_i + \alpha_i \big(y_i - f_\theta(y)_i\big) \quad\text{where}\quad \alpha_i = \frac{\sigma_\theta^2(y)_i - \sigma^2}{\sigma_\theta^2(y)_i}.$$

[Figure: noisy image y, the estimate f_θ(y), the bias indicator σ_θ²(y) and the bias-reduced estimate f̃_θ(y), shown for a convolution, for the non-local means and for total-variation.]
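A sketch of the plug-in LMMSE correction applied to the Gaussian convolution of the previous snippet; the positive-part clamp on α_i is an extra assumption (the slides state α_i = (σ_θ²(y)_i − σ²)/σ_θ²(y)_i), added here only to keep α_i in [0, 1].

```python
# Plug-in LMMSE bias reduction of a Gaussian convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(9)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

fy = gaussian_filter(y, sigma=s)
var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2
alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta     # assumed clamp
f_debiased = fy + alpha * (y - fy)

print("SE of f_theta(y)        :", np.sum((fy - x0) ** 2))
print("SE of debiased estimate :", np.sum((f_debiased - x0) ** 2))
```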

65-72 Bias reduction-variance estimation approach

Variance estimation [Kervrann and Boulanger, 2008, Salmon and Strozecki, 2010]. The LMMSE aims at reaching a bias-variance trade-off such that $\text{Bias}^2 \leq \text{Variance}$, hence $R(\theta)_i \leq 2 \times \text{Variance}$. What is the residual variance of $\tilde{f}_\theta(y)$?

Recall $f_\theta(y) = W y + b$ and $\tilde{f}_\theta(y)_i = f_\theta(y)_i + \alpha_i (y_i - f_\theta(y)_i)$. Then
$$\tilde{f}_\theta(y)_i = (1 - \alpha_i) \sum_{j \neq i} w_{i,j}\, y_j + \big(w_{i,i}(1 - \alpha_i) + \alpha_i\big)\, y_i + (1 - \alpha_i)\, b_i,$$
hence (using $\mathrm{Var}[\gamma Y] = \gamma^2 \sigma^2$)
$$\mathrm{Var}[\tilde{f}_\theta(Y)_i] = \sigma^2 \Big[ (1 - \alpha_i)^2 \sum_j w_{i,j}^2 + \alpha_i^2 + 2\alpha_i(1 - \alpha_i)\, w_{i,i} \Big].$$
Define the smoothing strength at pixel i as $\sigma^2 / \mathrm{Var}[\tilde{f}_\theta(Y)_i]$ (the greater the better).

[Figure: noisy image y, the estimate f_θ(y), the bias-reduced estimate f̃_θ(y) and the smoothing strength map, shown for a convolution, for the non-local means and for total-variation.]
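A sketch of the residual variance Var[f̃_θ(Y)_i] and the resulting smoothing strength σ²/Var for the same debiased Gaussian convolution; for a convolution, Σ_j w_{i,j}² and w_{i,i} can be read off the discretised kernel. All parameters are assumptions.

```python
# Residual variance and smoothing strength of the debiased convolution.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(10)
N, sigma, s = 128, 0.2, 3.0
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

# discretised kernel, via the impulse response
impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
kernel = gaussian_filter(impulse, sigma=s)
w_ii, sum_w2 = kernel[N // 2, N // 2], np.sum(kernel ** 2)

fy = gaussian_filter(y, sigma=s)
var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2
alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta

var_debiased = sigma ** 2 * ((1 - alpha) ** 2 * sum_w2
                             + alpha ** 2 + 2 * alpha * (1 - alpha) * w_ii)
smoothing_strength = sigma ** 2 / var_debiased
print("smoothing strength, flat area:", smoothing_strength[:, :N // 4].mean())
print("smoothing strength, edge area:", smoothing_strength[:, N // 2 - 2:N // 2 + 2].mean())
```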

73-79 Our approach: case of anisotropic Gaussian convolutions

Summary:
- Choose different sets of parameters θ^(1), θ^(2), ...
- For each set θ: compute $f_\theta(y)$; compute $\sum_j w_{i,j} y_j$, $\sum_j w_{i,j} y_j^2$ and $\sum_j w_{i,j}^2$; deduce $\tilde{f}_\theta(y)_i$ and $\mathrm{Var}[\tilde{f}_\theta(Y)_i]$.
- Perform local selection: choose, at each pixel i, the set of parameters such that
$$\tilde{f}(y)_i = \tilde{f}_{\theta_i}(y)_i \quad\text{where}\quad \theta_i = \operatorname*{argmin}_\theta\, \mathrm{Var}[\tilde{f}_\theta(Y)_i].$$
A code sketch of this pipeline is given after this slide.

Example (Gaussian convolutions). Perform several Gaussian convolutions with different anisotropies, sizes and orientations. The weights $w_{i,j}$ are given by the associated Gaussian kernel function.

[Figure: noisy image y, one estimator f_θ(y), the final result f̃(y), and the maps of smoothing strength, scale, orientation and anisotropy selected at each pixel.]
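Here is the sketch of the whole pipeline announced above, for a small bank of isotropic Gaussian convolutions (a simplification of the anisotropic, oriented kernels of the slides); all widths and parameters are assumptions.

```python
# Local selection over a bank of Gaussian convolutions via debiasing + variance.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(11)
N, sigma = 128, 0.2
x0 = np.zeros((N, N)); x0[:, N // 2:] = 1.0
y = x0 + sigma * rng.standard_normal((N, N))

widths = [0.5, 1.0, 2.0, 4.0, 8.0]
best_var = np.full((N, N), np.inf)
final = np.zeros((N, N))
chosen = np.zeros((N, N))

impulse = np.zeros((N, N)); impulse[N // 2, N // 2] = 1.0
for s in widths:
    kernel = gaussian_filter(impulse, sigma=s)
    w_ii, sum_w2 = kernel[N // 2, N // 2], np.sum(kernel ** 2)

    fy = gaussian_filter(y, sigma=s)                            # f_theta(y)
    var_theta = gaussian_filter(y ** 2, sigma=s) - fy ** 2      # bias indicator
    alpha = np.maximum(var_theta - sigma ** 2, 0.0) / var_theta
    f_deb = fy + alpha * (y - fy)                               # LMMSE correction
    var_deb = sigma ** 2 * ((1 - alpha) ** 2 * sum_w2
                            + alpha ** 2 + 2 * alpha * (1 - alpha) * w_ii)

    take = var_deb < best_var                    # pixelwise argmin of the variance
    best_var[take], final[take], chosen[take] = var_deb[take], f_deb[take], s

print("SE of adaptive result:", np.sum((final - x0) ** 2))
print("average selected width, flat vs edge:",
      chosen[:, :N // 4].mean(), chosen[:, N // 2 - 2:N // 2 + 2].mean())
```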

80 Our approach: case of non-local means

Example (Non-local means). Perform several non-local means with different patch sizes, window sizes and prefiltering. The weights $w_{i,j}$ are given as the result of the patch comparison.

[Figure: a small sample of estimates obtained with different parameters, their LMMSE corrections and smoothing strengths, and the locally selected result.]

Remarks:
- None of the parameter settings can preserve all kinds of structures on its own.
- Unlike SURE, this approach adapts straightforwardly to gamma or Poisson noises.

81-82 Our approach: case of non-local means

Example (Non-local means on simulated data, at a high and at a low noise level).
[Figure: (a) noisy image; (b) result of the adaptive approach; (c) from left to right, top to bottom: smoothing strength (range [0, 20×20]), search window size (range [0, 20×20]), patch size (range [3×3, 11×11]) and prefiltering strength (range [1, 3]) selected at each pixel.]

83-84 Our approach: case of non-local means

Example (Non-local means on polarimetric SAR data).
[Figure: a high-resolution S-band SAR image (© DLR), the adaptive estimation and the corresponding smoothing strength map.]

[Deledalle et al., 2013] Deledalle, C., Denis, L., Tupin, F., Reigber, A., Jäger, M. (2013). NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising. Technical report HAL.

85-90 Our approach: case of anisotropic total-variation

Example (Anisotropic total-variation). Recall that the solution of anisotropic total-variation is given by
$$f_\theta(y) = \operatorname*{argmin}_x\; \tfrac{1}{2}\|x - y\|^2 + \lambda \|\nabla x\|_1 \quad\text{where}\quad \|\nabla x\|_1 = \sum_k |(\nabla_h x)_k| + |(\nabla_v x)_k|,$$
and can be computed iteratively by a proximal algorithm. Perform several total-variation denoisings with different regularization parameters λ.

[Figure: target image x_0, noisy image y, final result f̃(y), and the maps of the selected parameter λ and of the smoothing strength, shown for several test images.]


More information

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing Emilie Chouzenoux emilie.chouzenoux@univ-mlv.fr Université Paris-Est Lab. d Informatique Gaspard

More information

Inverse Problems meets Statistical Learning

Inverse Problems meets Statistical Learning Inverse Problems meets Statistical Learning Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Adaptive one-bit matrix completion

Adaptive one-bit matrix completion Adaptive one-bit matrix completion Joseph Salmon Télécom Paristech, Institut Mines-Télécom Joint work with Jean Lafond (Télécom Paristech) Olga Klopp (Crest / MODAL X, Université Paris Ouest) Éric Moulines

More information

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Graph Signal Processing for Image Compression & Restoration (Part II)

Graph Signal Processing for Image Compression & Restoration (Part II) ene Cheung, Xianming Liu National Institute of Informatics 11 th July, 016 raph Signal Processing for Image Compression & Restoration (Part II). ICME'16 utorial 07/11/016 1 Outline (Part II) Image Restoration

More information

Inverse problem and optimization

Inverse problem and optimization Inverse problem and optimization Laurent Condat, Nelly Pustelnik CNRS, Gipsa-lab CNRS, Laboratoire de Physique de l ENS de Lyon Decembre, 15th 2016 Inverse problem and optimization 2/36 Plan 1. Examples

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines - October 2014 Big data revolution? A new

More information

Least Squares and Linear Systems

Least Squares and Linear Systems Least Squares and Linear Systems Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized by: Mérouane

More information

NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising

NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising NL-SAR: a unified Non-Local framework for resolution-preserving (Pol)(In)SAR denoising Charles-Alban Deledalle, Loïc Denis, Florence Tupin, Andreas Reigber, Marc Jäger To cite this version: Charles-Alban

More information

Medical Image Analysis

Medical Image Analysis Medical Image Analysis CS 593 / 791 Computer Science and Electrical Engineering Dept. West Virginia University 23rd January 2006 Outline 1 Recap 2 Edge Enhancement 3 Experimental Results 4 The rest of

More information

Image processing and nonparametric regression

Image processing and nonparametric regression Image processing and nonparametric regression Rencontres R BoRdeaux 2012 B. Thieurmel Collaborators : P.A. Cornillon, N. Hengartner, E. Matzner-Løber, B. Wolhberg 2 Juillet 2012 Rencontres R BoRdeaux 2012

More information

Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising

Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising Journal of Machine Learning Research 8 (207) -50 Submitted 5/6; Revised 0/7; Published /7 Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising Jérémie Bigot Institut de

More information

ITK Filters. Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms

ITK Filters. Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms ITK Filters Thresholding Edge Detection Gradients Second Order Derivatives Neighborhood Filters Smoothing Filters Distance Map Image Transforms ITCS 6010:Biomedical Imaging and Visualization 1 ITK Filters:

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

LPA-ICI Applications in Image Processing

LPA-ICI Applications in Image Processing LPA-ICI Applications in Image Processing Denoising Deblurring Derivative estimation Edge detection Inverse halftoning Denoising Consider z (x) =y (x)+η (x), wherey is noise-free image and η is noise. assume

More information

BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising

BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising Chris Metzler, Richard Baraniuk Rice University Arian Maleki Columbia University Phase Retrieval Applications: Crystallography Microscopy

More information

A Dual Sparse Decomposition Method for Image Denoising

A Dual Sparse Decomposition Method for Image Denoising A Dual Sparse Decomposition Method for Image Denoising arxiv:1704.07063v1 [cs.cv] 24 Apr 2017 Hong Sun 1 School of Electronic Information Wuhan University 430072 Wuhan, China 2 Dept. Signal and Image Processing

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Image processing and Computer Vision

Image processing and Computer Vision 1 / 1 Image processing and Computer Vision Continuous Optimization and applications to image processing Martin de La Gorce martin.de-la-gorce@enpc.fr February 2015 Optimization 2 / 1 We have a function

More information

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Frank Werner 1 Statistical Inverse Problems in Biophysics Group Max Planck Institute for Biophysical Chemistry,

More information

Generalized greedy algorithms.

Generalized greedy algorithms. Generalized greedy algorithms. François-Xavier Dupé & Sandrine Anthoine LIF & I2M Aix-Marseille Université - CNRS - Ecole Centrale Marseille, Marseille ANR Greta Séminaire Parisien des Mathématiques Appliquées

More information

Advanced Statistics I : Gaussian Linear Model (and beyond)

Advanced Statistics I : Gaussian Linear Model (and beyond) Advanced Statistics I : Gaussian Linear Model (and beyond) Aurélien Garivier CNRS / Telecom ParisTech Centrale Outline One and Two-Sample Statistics Linear Gaussian Model Model Reduction and model Selection

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

arxiv: v2 [math.st] 9 Feb 2017

arxiv: v2 [math.st] 9 Feb 2017 Submitted to the Annals of Statistics PREDICTION ERROR AFTER MODEL SEARCH By Xiaoying Tian Harris, Department of Statistics, Stanford University arxiv:1610.06107v math.st 9 Feb 017 Estimation of the prediction

More information

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Model Selection with Partly Smooth Functions

Model Selection with Partly Smooth Functions Model Selection with Partly Smooth Functions Samuel Vaiter, Gabriel Peyré and Jalal Fadili vaiter@ceremade.dauphine.fr August 27, 2014 ITWIST 14 Model Consistency of Partly Smooth Regularizers, arxiv:1405.1004,

More information

Nonlinear Diffusion. 1 Introduction: Motivation for non-standard diffusion

Nonlinear Diffusion. 1 Introduction: Motivation for non-standard diffusion Nonlinear Diffusion These notes summarize the way I present this material, for my benefit. But everything in here is said in more detail, and better, in Weickert s paper. 1 Introduction: Motivation for

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Degrees of Freedom in Regression Ensembles

Degrees of Freedom in Regression Ensembles Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.

More information

Stochastic gradient descent and robustness to ill-conditioning

Stochastic gradient descent and robustness to ill-conditioning Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,

More information

Lecture 7: Edge Detection

Lecture 7: Edge Detection #1 Lecture 7: Edge Detection Saad J Bedros sbedros@umn.edu Review From Last Lecture Definition of an Edge First Order Derivative Approximation as Edge Detector #2 This Lecture Examples of Edge Detection

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Statistically-Based Regularization Parameter Estimation for Large Scale Problems

Statistically-Based Regularization Parameter Estimation for Large Scale Problems Statistically-Based Regularization Parameter Estimation for Large Scale Problems Rosemary Renaut Joint work with Jodi Mead and Iveta Hnetynkova March 1, 2010 National Science Foundation: Division of Computational

More information

Erkut Erdem. Hacettepe University February 24 th, Linear Diffusion 1. 2 Appendix - The Calculus of Variations 5.

Erkut Erdem. Hacettepe University February 24 th, Linear Diffusion 1. 2 Appendix - The Calculus of Variations 5. LINEAR DIFFUSION Erkut Erdem Hacettepe University February 24 th, 2012 CONTENTS 1 Linear Diffusion 1 2 Appendix - The Calculus of Variations 5 References 6 1 LINEAR DIFFUSION The linear diffusion (heat)

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

A NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar

A NO-REFERENCE SHARPNESS METRIC SENSITIVE TO BLUR AND NOISE. Xiang Zhu and Peyman Milanfar A NO-REFERENCE SARPNESS METRIC SENSITIVE TO BLUR AND NOISE Xiang Zhu and Peyman Milanfar Electrical Engineering Department University of California at Santa Cruz, CA, 9564 xzhu@soeucscedu ABSTRACT A no-reference

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Bits of Machine Learning Part 1: Supervised Learning

Bits of Machine Learning Part 1: Supervised Learning Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification

More information

Design of Image Adaptive Wavelets for Denoising Applications

Design of Image Adaptive Wavelets for Denoising Applications Design of Image Adaptive Wavelets for Denoising Applications Sanjeev Pragada and Jayanthi Sivaswamy Center for Visual Information Technology International Institute of Information Technology - Hyderabad,

More information

13. Parameter Estimation. ECE 830, Spring 2014

13. Parameter Estimation. ECE 830, Spring 2014 13. Parameter Estimation ECE 830, Spring 2014 1 / 18 Primary Goal General problem statement: We observe X p(x θ), θ Θ and the goal is to determine the θ that produced X. Given a collection of observations

More information

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material A Generative Perspective on MRFs in Low-Level Vision Supplemental Material Uwe Schmidt Qi Gao Stefan Roth Department of Computer Science, TU Darmstadt 1. Derivations 1.1. Sampling the Prior We first rewrite

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Least squares problems

Least squares problems Least squares problems How to state and solve them, then evaluate their solutions Stéphane Mottelet Université de Technologie de Compiègne 30 septembre 2016 Stéphane Mottelet (UTC) Least squares 1 / 55

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 10)

EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 10) EE 367 / CS 448I Computational Imaging and Display Notes: Noise, Denoising, and Image Reconstruction with Noise (lecture 0) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard

More information

Batch, Stochastic and Mirror Gradient Descents

Batch, Stochastic and Mirror Gradient Descents Batch, Stochastic and Mirror Gradient Descents Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE s=3 s=6 0.5 0.5 0 0 0.5 0.5 https://mathematical-coffees.github.io 1 10 20 30 40 50 Organized

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Fundamentals of Non-local Total Variation Spectral Theory

Fundamentals of Non-local Total Variation Spectral Theory Fundamentals of Non-local Total Variation Spectral Theory Jean-François Aujol 1,2, Guy Gilboa 3, Nicolas Papadakis 1,2 1 Univ. Bordeaux, IMB, UMR 5251, F-33400 Talence, France 2 CNRS, IMB, UMR 5251, F-33400

More information

Estimating network degree distributions from sampled networks: An inverse problem

Estimating network degree distributions from sampled networks: An inverse problem Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree

More information

Covariance Matrix Simplification For Efficient Uncertainty Management

Covariance Matrix Simplification For Efficient Uncertainty Management PASEO MaxEnt 2007 Covariance Matrix Simplification For Efficient Uncertainty Management André Jalobeanu, Jorge A. Gutiérrez PASEO Research Group LSIIT (CNRS/ Univ. Strasbourg) - Illkirch, France *part

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Statistical Measures of Uncertainty in Inverse Problems

Statistical Measures of Uncertainty in Inverse Problems Statistical Measures of Uncertainty in Inverse Problems Workshop on Uncertainty in Inverse Problems Institute for Mathematics and Its Applications Minneapolis, MN 19-26 April 2002 P.B. Stark Department

More information

Physics-based Prior modeling in Inverse Problems

Physics-based Prior modeling in Inverse Problems Physics-based Prior modeling in Inverse Problems MURI Meeting 2013 M Usman Sadiq, Purdue University Charles A. Bouman, Purdue University In collaboration with: Jeff Simmons, AFRL Venkat Venkatakrishnan,

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

1-bit Matrix Completion. PAC-Bayes and Variational Approximation

1-bit Matrix Completion. PAC-Bayes and Variational Approximation : PAC-Bayes and Variational Approximation (with P. Alquier) PhD Supervisor: N. Chopin Bayes In Paris, 5 January 2017 (Happy New Year!) Various Topics covered Matrix Completion PAC-Bayesian Estimation Variational

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

arxiv: v3 [stat.me] 12 Jul 2015

arxiv: v3 [stat.me] 12 Jul 2015 Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models Arnaud Doucet 1, Pierre E. Jacob and Sylvain Rubenthaler 3 1 Department of Statistics,

More information

NONLINEAR DIFFUSION PDES

NONLINEAR DIFFUSION PDES NONLINEAR DIFFUSION PDES Erkut Erdem Hacettepe University March 5 th, 0 CONTENTS Perona-Malik Type Nonlinear Diffusion Edge Enhancing Diffusion 5 References 7 PERONA-MALIK TYPE NONLINEAR DIFFUSION The

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Lecture 24 May 30, 2018

Lecture 24 May 30, 2018 Stats 3C: Theory of Statistics Spring 28 Lecture 24 May 3, 28 Prof. Emmanuel Candes Scribe: Martin J. Zhang, Jun Yan, Can Wang, and E. Candes Outline Agenda: High-dimensional Statistical Estimation. Lasso

More information

Sparse Regularization via Convex Analysis

Sparse Regularization via Convex Analysis Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which

More information