Image processing and Computer Vision

1 Image processing and Computer Vision
Continuous Optimization and applications to image processing
Martin de La Gorce (martin.de-la-gorce@enpc.fr), February 2015

2 Optimization

We have a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}$ and we look for its global minimum, i.e. a point $x^* \in \mathbb{R}^n$ such that

$$\forall x \in \mathbb{R}^n : f(x) \geq f(x^*)$$

When $n = 2$, we can visualize the function as a surface. We look for the lowest point of that surface.

3 First order approximations in 1D

A function $f(x)$ that is differentiable around a point $a$ can be approximated by a linear function around that point:

$$f(x) \approx f(a) + f'(a)(x - a) \quad (1)$$

More formally,

$$f(x) = f(a) + f'(a)(x - a) + h_1(x)(x - a) \quad (2)$$

with

$$\lim_{x \to a} h_1(x) = 0 \quad (3)$$

Figure: $e^x \approx 1 + x$ around $x = 0$.

4 Second order approximations in 1D

A function $f(x)$ that is twice differentiable around a point $a$ can be approximated by a quadratic function around that point:

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + h_2(x)(x - a)^2 \quad (4)$$

with

$$\lim_{x \to a} h_2(x) = 0 \quad (5)$$

Figure: $e^x \approx 1 + x + x^2/2$ around $x = 0$.

5 Taylor's theorem

A function $f(x)$ that is $n$ times differentiable at a point $a$ can be approximated by a polynomial around that point using:

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + h_n(x)(x - a)^n \quad (6)$$

with

$$\lim_{x \to a} h_n(x) = 0 \quad (7)$$
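As a quick numerical illustration of the theorem, the sketch below evaluates the Taylor polynomials of $e^x$ around $a = 0$ and checks that the error shrinks as the degree grows. The helper name `taylor_exp` and the evaluation point $x = 0.1$ are choices made for this example only.

```python
import math

def taylor_exp(x, n, a=0.0):
    """Degree-n Taylor polynomial of exp around a, evaluated at x."""
    # Every derivative of exp at a equals exp(a).
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k)
               for k in range(n + 1))

x = 0.1
# Approximation error for the polynomials of degree 1, 2 and 3.
errors = [abs(math.exp(x) - taylor_exp(x, n)) for n in (1, 2, 3)]
print(errors)
```

Each extra term reduces the error by roughly a factor proportional to $x - a$, which is why low-order approximations are only trusted close to $a$.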

6 Gradient

If $f$ is differentiable at $x$, then the gradient of $f$ at $x$ is defined as the vector

$$\nabla f(x) = \left[\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\right]$$

with

$$\frac{\partial f}{\partial x_i}(x) = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h}$$

and $e_1, \ldots, e_n$ the canonical basis of $\mathbb{R}^n$ (all the entries of $e_k$ are zero except the $k$-th, which is equal to 1).

$\nabla f(x)$ points in the direction of steepest ascent of $f$ at location $x$.

7 First order approximation in ND

If $f$ is differentiable at $a$, $f$ can be approximated by an affine function around $a$:

$$f(x) \approx f(a) + \langle \nabla f(a), x - a \rangle$$

Figure: $(\cos(x) + \cos(y) + x)/2 \approx 1 + x/2$ around 0.

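8 Hessian matrix

If a function $f$ mapping from $\mathbb{R}^n$ into $\mathbb{R}$ is twice differentiable at $x$, then the Hessian matrix of $f$ at $x$ is defined as the matrix

$$H_f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix} \quad (8)$$

with

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right) \quad (9)$$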

9 Second order approximation in ND

If $f$ is twice differentiable at $a$, $f$ can be approximated by a quadratic function around $a$ using the gradient and the Hessian matrix:

$$f(x) \approx f(a) + \langle \nabla f(a), x - a \rangle + \frac{1}{2}(x - a)^T H_f(a)(x - a)$$

Figure: $(\cos(x) + \cos(y) + x)/2 \approx 1 + x/2 - x^2/4 - y^2/4$ around 0.

10 Minimum

We say that $x^*$ is a local minimum if there exists a radius $r > 0$ such that for all $x$ with $\|x - x^*\| \leq r$ we have $f(x) \geq f(x^*)$.

When we do not have any constraint and the function is differentiable everywhere, then for every $x^*$ that is a local minimum of the function we have

$$\nabla f(x^*) = 0_n$$

with $0_n$ the null vector of size $n$.

Warning: $\nabla f(x) = 0_n$ does not imply that $x$ is a local minimum. It can also be a saddle point or a local maximum.

Figure: a minimum and a saddle point.

11 Gradient descent

Suppose $f$ is continuously differentiable everywhere. For a small displacement $d$ around a point $x_t$ we have

$$f(x_t + d) \approx f(x_t) + \langle \nabla f(x_t), d \rangle$$

The function $f$ decreases the fastest in the direction $-\nabla f(x_t)$. Indeed:

$$\operatorname*{argmin}_{d, \|d\| \leq 1} \langle \nabla f(x_t), d \rangle = -\nabla f(x_t) / \|\nabla f(x_t)\|$$

A minimization strategy called gradient descent consists in iteratively following this direction:

$$x_{t+1} = x_t - \tau \nabla f(x_t)$$

with $\tau$ a fixed parameter small enough to guarantee the decrease of $f$ at each step.
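The fixed-step iteration can be sketched in a few lines of NumPy. The test function below is a toy quadratic chosen for this example (it is not from the slides), with $\tau$ picked below $2 / \lambda_{max}(Q)$ so that each step decreases $f$.

```python
import numpy as np

def gradient_descent(grad, x0, tau, n_iter):
    """Fixed-step gradient descent: x_{t+1} = x_t - tau * grad(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - tau * grad(x)
    return x

# Toy problem (an assumption of this sketch): f(x) = 0.5 x^T Q x - c^T x,
# whose gradient is Q x - c and whose minimizer solves Q x = c.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])

def grad_f(x):
    return Q @ x - c

x_gd = gradient_descent(grad_f, [0.0, 0.0], tau=0.2, n_iter=200)
x_exact = np.linalg.solve(Q, c)
print(x_gd, x_exact)
```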

12 Gradient descent

Rather than keeping $\tau$ fixed, it is possible to do a one-dimensional search in the direction opposite to the gradient at each iteration:

$$\tau_t = \operatorname*{argmin}_{\lambda > 0} f(x_t - \lambda \nabla f(x_t))$$

Rather than spending time looking for the optimal step length $\tau_t$ at each iteration, we can look for a $\tau_t$ that gives a sufficient decrease of $f$.

Gradient descent can be very slow to converge if the function has the shape of a deep narrow valley.
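One common way to obtain a "sufficient decrease" is a backtracking search with the Armijo condition. The slide does not name a specific rule, so the condition, the constants `c1` and `shrink`, and the narrow-valley test function below are all assumptions of this sketch.

```python
import numpy as np

def backtracking_gd(f, grad, x0, tau0=1.0, shrink=0.5, c1=1e-4, n_iter=500):
    """Gradient descent where each step length is found by backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        gg = g @ g
        if gg < 1e-20:          # gradient is numerically zero: stop
            break
        tau = tau0
        # Shrink tau until f decreases by at least c1 * tau * ||g||^2.
        while f(x - tau * g) > f(x) - c1 * tau * gg and tau > 1e-12:
            tau *= shrink
        x = x - tau * g
    return x

# Narrow valley: the curvature is 100 times larger along the second axis.
def f(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def grad(x):
    return np.array([x[0], 100.0 * x[1]])

x_bt = backtracking_gd(f, grad, [1.0, 1.0])
print(x_bt, f(x_bt))
```

Even with the line search, the accepted steps stay small because of the steep direction, which illustrates the slow progress along the valley floor.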

13 Majorization-Minimization

Instead of minimizing $f(x)$ directly, the Majorization-Minimization (MM) approach consists in solving a sequence of easier minimization problems

$$x_{k+1} = \operatorname*{argmin}_x g_k(x)$$

The MM method requires that:

Each function $g_k$ majorizes $f$, i.e. $\forall x : g_k(x) \geq f(x)$.

$g_k$ and $f$ touch each other at $x_k$, i.e. $g_k(x_k) = f(x_k)$.
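These two requirements are what make the method a descent method. The short chain of inequalities below is the standard argument for MM schemes; it is not spelled out on the slide.

```latex
f(x_{k+1}) \;\le\; g_k(x_{k+1}) \;\le\; g_k(x_k) \;=\; f(x_k)
```

The first inequality holds because $g_k$ majorizes $f$, the second because $x_{k+1}$ minimizes $g_k$, and the final equality because $g_k$ touches $f$ at $x_k$.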

14 Linear least squares

Linear least squares:

$$f(x) = \|Ax - b\|^2 = \sum_{i=1}^{N} (A[i,:]x - b[i])^2$$

$\|Ax - b\|^2$ is quadratic and we have

$$\|A(x + h) - b\|^2 = \|Ax - b\|^2 + \nabla f(x) h + \frac{1}{2} h^T H(x) h$$

with $H$ and $\nabla f(x)$ respectively the Hessian and the gradient of $f$ at $x$ (the gradient being written here as a row vector).

$$\|A(x + h) - b\|^2 = \|Ah + (Ax - b)\|^2 = (Ah + (Ax - b))^T (Ah + (Ax - b)) = \ldots = \|Ax - b\|^2 + 2(Ax - b)^T A h + h^T A^T A h \quad (10)$$

By identification we get $\nabla f(x) = 2(Ax - b)^T A$ and $H = 2 A^T A$.

15 Linear least squares

The gradient writes

$$\nabla f(x) = 2(Ax - b)^T A$$

The gradient is null at the minimum of $f$:

$$\nabla f(x) = 0_n \Leftrightarrow A^T A x = A^T b$$

The matrix $A^T A$ is a symmetric square matrix. If it is invertible, then the equation has a unique solution and the problem has a single minimum at location

$$x = (A^T A)^{-1} A^T b$$
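A small NumPy check of the normal equations, on random data generated for this example only. In practice `np.linalg.lstsq` is usually preferred for conditioning reasons; it is used here as a reference.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # N = 20 equations, n = 3 unknowns
b = rng.standard_normal(20)

# Solve the normal equations A^T A x = A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]

# The gradient (as a column vector) vanishes at the solution.
grad_at_solution = 2 * A.T @ (A @ x_ne - b)
print(x_ne, grad_at_solution)
```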

16 Regularized least squares

If $A^T A$ is not invertible, the linear system has infinitely many solutions and $f(x)$ has infinitely many minima. One remedy consists in adding a small regularization term, referred to as Tikhonov regularization:

$$f(x) = \|Ax - b\|^2 + \lambda \|x\|^2$$

with $\lambda > 0$. The gradient writes

$$\nabla f(x) = 2(A^T(Ax - b) + \lambda x)$$

Setting the gradient to zero:

$$\nabla f(x) = 0 \Leftrightarrow (A^T A + \lambda I_d) x = A^T b$$

with $I_d$ the identity matrix. For $\lambda > 0$ the matrix $(A^T A + \lambda I_d)$ is invertible and the solution writes

$$x = (A^T A + \lambda I_d)^{-1} A^T b$$
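The effect of the regularization can be seen on a tiny rank-deficient system. The matrix below is made up for this sketch: its two columns are identical, so $A^T A$ is singular, yet the regularized system has a unique solution.

```python
import numpy as np

def ridge_solve(A, b, lam):
    """Tikhonov-regularized least squares: (A^T A + lam I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Rank-1 matrix: A^T A = [[5, 5], [5, 5]] is not invertible.
A = np.array([[1.0, 1.0], [2.0, 2.0]])
b = np.array([1.0, 2.0])
lam = 1e-3

x_reg = ridge_solve(A, b, lam)
print(np.linalg.matrix_rank(A.T @ A), x_reg)
```

Among all the minimizers of $\|Ax - b\|^2$ (here every $x$ with $x_1 + x_2 = 1$), the regularization selects one close to the minimum-norm solution $(0.5, 0.5)$.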

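17 Least squares

We can look for the minimum of a sum of least squares

$$f(x) = \sum_{i=1}^{N} \|A_i x - b_i\|^2$$

The gradient writes as the sum of the gradients:

$$\nabla f(x) = 2 \sum_{i=1}^{N} A_i^T (A_i x - b_i) = 2 \left( \sum_{i=1}^{N} A_i^T A_i x - \sum_{i=1}^{N} A_i^T b_i \right)$$

The solution of $\nabla f(x) = 0$ writes

$$x = \left( \sum_{i=1}^{N} A_i^T A_i \right)^{-1} \sum_{i=1}^{N} A_i^T b_i$$

Note: we can also rewrite $f(x)$ as $\|\bar{A} x - \bar{b}\|^2$ with $\bar{A}$ the matrix obtained by stacking the $A_i$ vertically,

$$\bar{A} = \begin{bmatrix} A_1 \\ \vdots \\ A_N \end{bmatrix}$$

and $\bar{b}$ the concatenation of the vectors $b_i$.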

18 Denoising

Suppose that we have an image $I_b$ corresponding to an image $I$ to which Gaussian noise has been added on each pixel. A simple method to estimate $I$ from $I_b$ consists in minimizing

$$f(u) = \int_x \int_y |u(x, y) - I_b(x, y)|^2 \, dx \, dy + \lambda \int_x \int_y \|\nabla u(x, y)\|^2 \, dx \, dy$$

The first term penalizes the difference between the denoised image and the noisy image. It is called the data term.

The second term is called the regularization term and favors smooth reconstructed images.

The parameter $\lambda$ controls the strength of the smoothing. The noisier the image, the larger $\lambda$ should be.

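19 2D denoising

Examples of denoised images with various $\lambda$.

Figure: the noisy image $I_b$ and the denoised images for $\lambda = 0.2$, $\lambda = 0.8$ and $\lambda = 5$.

We observe that the boundaries are blurred.

If we synthesize $I_b$ from a known image $I$, it is possible to compute the SNR (signal-to-noise ratio) for various $\lambda$:

$$\text{SNR} = \sum_{ij} I(i, j)^2 \Big/ \sum_{ij} (I(i, j) - I_{denoised}(i, j))^2$$

Figure: SNR as a function of $\lambda$.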

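20 1D denoising

In the discrete setting and in 1D we can rewrite the first term as $\|U - I_b\|^2$ and the second term as

$$\sum_{x=0}^{n-2} (u(x + 1) - u(x))^2 = \|DU\|^2$$

with $D$ the following Toeplitz matrix of size $(n - 1) \times n$:

$$D = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix}$$

We minimize $\|U - I_b\|^2 + \lambda \|DU\|^2$ using the solution for a sum of least squares seen before and we get:

$$\tilde{U} = (\lambda D^T D + I_d)^{-1} I_b$$

with $I_d$ the identity matrix.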
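A minimal dense NumPy sketch of this 1D denoiser. The step-shaped test signal, the noise level and the value of $\lambda$ are assumptions made for this example; a real implementation would use sparse matrices.

```python
import numpy as np

def forward_diff_matrix(n):
    """(n-1) x n matrix D such that (D u)[x] = u[x+1] - u[x]."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def denoise_1d(I_b, lam):
    """Minimize ||U - I_b||^2 + lam * ||D U||^2 in closed form."""
    n = len(I_b)
    D = forward_diff_matrix(n)
    return np.linalg.solve(lam * D.T @ D + np.eye(n), I_b)

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])   # a step edge
noisy = clean + 0.1 * rng.standard_normal(100)
smooth = denoise_1d(noisy, lam=5.0)

D = forward_diff_matrix(100)
print(np.sum((D @ noisy) ** 2), np.sum((D @ smooth) ** 2))
```

The result is smoother than the input, but the step edge is also spread over several samples, which is the blurring the quadratic regularizer is known for.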

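21 1D denoising

We can do the same thing while wrapping the signal periodically using the modulo operator %:

$$\sum_{x=0}^{n-1} (u((x + 1) \% n) - u(x))^2 = \|D_c U\|^2$$

with $D_c$ the following circulant matrix of size $n \times n$ (the product by this matrix can be interpreted as a 1D convolution by $[1, -1, 0]$):

$$D_c = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \\ 1 & 0 & \cdots & 0 & -1 \end{bmatrix}$$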

22 1D denoising

Using the solution for a sum of least squares we get:

$$\tilde{U} = (\lambda D_c^T D_c + I_d)^{-1} I_b$$

with $I_d$ the identity matrix.

The matrix $M = (\lambda D_c^T D_c + I_d)$ is also a circulant matrix, and its inverse $M^{-1}$ is circulant as well (a property of circulant matrices). We can therefore interpret the multiplication by $M^{-1}$ as a 1D filter, i.e. a convolution. We can visualize the impulse response of this filter by plotting one column of the matrix $M^{-1}$.

Figure: the matrix $M^{-1}$ and its column $M^{-1}[:, 25]$, for $\lambda = 10$.
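The circulant structure can be checked numerically. The sketch below uses $\lambda = 10$ as on the slide; the size $n = 50$ is an assumption of this example, as is the FFT-based comparison, which relies on the standard fact that circulant matrices are diagonalized by the discrete Fourier transform.

```python
import numpy as np

n, lam = 50, 10.0
I_n = np.eye(n)
# Row x of D_c has -1 at column x and +1 at column (x + 1) % n.
D_c = np.roll(I_n, 1, axis=1) - I_n

M = lam * D_c.T @ D_c + I_n
M_inv = np.linalg.inv(M)

# Impulse response of the denoising filter: one column of M^{-1}.
impulse_response = M_inv[:, 25]

# Applying M^{-1} is a periodic convolution, so it can also be done by
# dividing by the DFT of the first column of M.
rng = np.random.default_rng(0)
signal = rng.standard_normal(n)
via_matrix = M_inv @ signal
via_fft = np.fft.ifft(np.fft.fft(signal) / np.fft.fft(M[:, 0])).real
print(impulse_response[20:31])
```

The impulse response is a symmetric bump centered on sample 25 that sums to 1, i.e. a low-pass smoothing kernel.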

23 2D denoising

In the 2D case we look for a matrix $U$ with the same size as $I$, i.e. $H \times W$. Denoting by $I_v$ and $U_v$ the vectors obtained by flattening $I_b$ and $U$ (using the row-major order), we can rewrite the data term as $\|U_v - I_v\|^2$.

24 2D denoising

We can approximate the regularization term as follows:

$$\int_x \int_y \|\nabla u(x, y)\|^2 \, dx \, dy \approx \sum_{i=0}^{W-2} \sum_{j=0}^{H-2} d_x(i, j)^2 + d_y(i, j)^2$$

with $d_x$ and $d_y$ two arrays of size $(W - 1) \times (H - 1)$:

$$d_x(i, j) = u(i + 1, j) - u(i, j)$$
$$d_y(i, j) = u(i, j + 1) - u(i, j)$$

This term can be rewritten with two sparse matrices $D_x$ and $D_y$ (defined on the following slides) in the form $\|D_x U_v\|^2 + \|D_y U_v\|^2$.

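25 2D denoising

For $H = 3$ and $W = 4$, $d_x$ and $d_y$ are of size 2 by 3, and their 6 values in row-major order can be obtained from the 12 coefficients of $U_v$ by multiplication by these two matrices:

$$D_x = \begin{bmatrix} -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$

$$D_y = \begin{bmatrix} -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$$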

26 2D denoising

If we want the vectors $D_x U_v$ and $D_y U_v$ to correspond respectively to the differences $u(i + 1, j) - u(i, j)$ and $u(i, j + 1) - u(i, j)$ in the row-major order, then:

$$D_x = I_{H-1,H} \otimes D_W$$
$$D_y = D_H \otimes I_{W-1,W}$$

with $\otimes$ the Kronecker product of two matrices, $I_{k,l}$ the identity matrix of size $k \times l$, and $D_k$ the bidiagonal matrix of size $(k - 1) \times k$ with $D_{i,i} = -1$ and $D_{i,i+1} = 1$.

The Kronecker product is defined by:

$$A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{bmatrix}$$
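The Kronecker construction can be verified with `np.kron` on the $H = 3$, $W = 4$ case. This sketch assumes that $u(i, j)$ on the slides has $i$ as the horizontal index, so that $u(i, j)$ corresponds to `U[j, i]` in NumPy's `[row, column]` indexing; the helper names and test image are hypothetical, and dense matrices are used for readability.

```python
import numpy as np

def diff_matrix(k):
    """Bidiagonal (k-1) x k matrix with -1 on the diagonal, +1 just above."""
    D = np.zeros((k - 1, k))
    idx = np.arange(k - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

H, W = 3, 4
# np.eye(k, l) is the rectangular identity I_{k,l}.
D_x = np.kron(np.eye(H - 1, H), diff_matrix(W))   # horizontal differences
D_y = np.kron(diff_matrix(H), np.eye(W - 1, W))   # vertical differences

U = np.arange(H * W, dtype=float).reshape(H, W) ** 2   # arbitrary test image
U_v = U.ravel()                                        # row-major flattening

d_x = (D_x @ U_v).reshape(H - 1, W - 1)
d_y = (D_y @ U_v).reshape(H - 1, W - 1)
print(D_x.shape, D_y.shape)
```

Both products agree with direct array slicing, `U[:-1, 1:] - U[:-1, :-1]` and `U[1:, :-1] - U[:-1, :-1]`, which is a convenient sanity check when porting this to sparse matrices.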

27 2D denoising

We want to minimize

$$\|U_v - I_v\|^2 + \lambda \left( \|D_x U_v\|^2 + \|D_y U_v\|^2 \right)$$

Using the solution for the sum of least squares we have:

$$\tilde{U}_v = (\lambda (D_x^T D_x + D_y^T D_y) + I_d)^{-1} I_v$$

with $I_d$ the identity matrix.

28 Weighted least squares

We can weight the least squares terms:

$$f(x) = \sum_{i=1}^{N} w_i (A[i,:]x - b[i])^2$$

$$\nabla f(x) = 2(A^T W A x - A^T W b)$$

with $W$ the diagonal matrix with $W_{ii} = w_i$. We have:

$$\nabla f(x) = 0 \Leftrightarrow A^T W A x = A^T W b$$

If $A^T W A$ is invertible, then the solution is unique and we have

$$x = (A^T W A)^{-1} A^T W b$$
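A minimal sketch of the weighted solution, on a toy problem made up for this example: fitting a single constant to three measurements, one of which is an outlier that receives a zero weight.

```python
import numpy as np

def weighted_lstsq(A, b, w):
    """Solve A^T W A x = A^T W b with W = diag(w)."""
    AtW = A.T * w            # scales column i of A^T by w_i, i.e. A^T W
    return np.linalg.solve(AtW @ A, AtW @ b)

A = np.ones((3, 1))                  # model: every measurement equals x
b = np.array([1.0, 2.0, 10.0])       # the last measurement is an outlier

x_uniform = weighted_lstsq(A, b, np.array([1.0, 1.0, 1.0]))
x_weighted = weighted_lstsq(A, b, np.array([1.0, 1.0, 0.0]))
print(x_uniform, x_weighted)
```

With uniform weights the estimate is the plain mean of the three values; with the outlier weighted out it is the mean of the first two.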

29 Inpainting

Objective: reconstructing missing parts of an image.

Applications: photography and movies.

30 Inpainting

We denote by $\Omega$ the region where we know the image. A simple inpainting method consists in minimizing

$$f(u) = \int_x \int_y \|\nabla u(x, y)\|^2 \, dx \, dy$$

with the constraint $u(x, y) = I(x, y)$ for the observed points $(x, y) \in \Omega$.

To make it easier, we reuse the unconstrained denoising formulation with a weight $\alpha(x, y)$ for each point of the image. $\alpha(x, y)$ is large relative to $\lambda$ for $(x, y) \in \Omega$ and zero for $(x, y) \notin \Omega$:

$$f(u) = \int_x \int_y \alpha(x, y) |u(x, y) - I(x, y)|^2 \, dx \, dy + \lambda \int_x \int_y \|\nabla u(x, y)\|^2 \, dx \, dy$$

31 Inpainting

In the discrete setting we have:

$$f(U) = \lambda \left( \|D_x U_v\|^2 + \|D_y U_v\|^2 \right) + \sum_{ij} \alpha_{ij} |U(i, j) - I(i, j)|^2$$

Let $A$ be the diagonal matrix of size $WH \times WH$ whose elements correspond to the $\alpha_{ij}$ in the row-major order. Using the weighted least squares solution we get:

$$\tilde{U}_v = (\lambda (D_x^T D_x + D_y^T D_y) + A)^{-1} A I_v$$
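The discrete inpainting solution can be sketched with dense matrices on a tiny image. The helper names, the ramp test image, the hole location and the values of `lam` and `alpha` are all assumptions of this example. Note that with these truncated difference matrices the bottom-right pixel appears in no difference, so it must belong to the known region for the system to be invertible.

```python
import numpy as np

def diff_matrix(k):
    """Bidiagonal (k-1) x k matrix with -1 on the diagonal, +1 just above."""
    D = np.zeros((k - 1, k))
    idx = np.arange(k - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def inpaint(I, known, lam=1.0, alpha=1e6):
    """Solve (lam (Dx^T Dx + Dy^T Dy) + A) U_v = A I_v on an H x W image."""
    H, W = I.shape
    D_x = np.kron(np.eye(H - 1, H), diff_matrix(W))
    D_y = np.kron(diff_matrix(H), np.eye(W - 1, W))
    # alpha on known pixels, 0 on missing ones, in row-major order.
    A = np.diag(np.where(known, alpha, 0.0).ravel())
    M = lam * (D_x.T @ D_x + D_y.T @ D_y) + A
    return np.linalg.solve(M, A @ I.ravel()).reshape(H, W)

# Test image: a horizontal ramp with a 2 x 2 hole in the middle.
I_true = np.tile(np.arange(6.0), (6, 1))
known = np.ones((6, 6), dtype=bool)
known[2:4, 2:4] = False
I_obs = I_true.copy()
I_obs[~known] = 0.0              # the values inside the hole are unknown

U = inpaint(I_obs, known)
print(np.round(U[2:4, 2:4], 3))
```

A linear ramp is recovered almost exactly because it is harmonic, which is precisely what the quadratic regularizer favors inside the hole.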

32 Inpainting

Figure: inpainting domains and the corresponding results.

33 Reweighted least squares

We sometimes want to use a function other than $x^2$:

$$f(x) = \sum_i h(A[i,:]x - b[i])$$

with $h(u)$ increasing more slowly than $u^2$, so that points with large errors are penalized less than they would be with $u^2$.

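34 Reweighted least squares

We consider functions $h$ that are symmetric, increasing on $u \geq 0$, and such that $h(\sqrt{x})$ is concave. Examples of such functions $h$:

Huber:

$$h(u) = \begin{cases} u^2/2 & \text{if } |u| \leq \tau \\ \tau(|u| - \tau/2) & \text{otherwise} \end{cases}$$

Smoothed absolute value:

$$h(u) = \sqrt{\epsilon^2 + u^2}$$

Figure: plots of $h(u)$ for the Huber function and for the smoothed absolute value.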

35 Reweighted least squares

For a function $h$ in the set of previously considered functions, there exists an associated function $g$ (called the conjugate function, defined on the next slide) such that it is possible to rewrite $h$ in this form:

$$h(u) = \min_{\gamma > 0} \left( \frac{\gamma u^2}{2} + g(\gamma) \right)$$

Figure: example with the Huber function.

36 Reweighted least squares

The conjugate function $g$ is defined by

$$g(\gamma) = \max_u \left( h(u) - \gamma u^2 / 2 \right)$$

The maximum with respect to $u$ is obtained by looking for $u > 0$ such that $h'(u) - \gamma u = 0$.

Figure: example with the Huber function ($\tau = 1/2$), showing $h(u) - \gamma u^2/2$, $\gamma u^2/2 + g(\gamma)$ and $g(\gamma)$.

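37 Reweighted least squares

We define the influence function $\psi(u)$ by

$$\psi(u) = \operatorname*{argmin}_{\gamma} \left( \gamma u^2 / 2 + g(\gamma) \right)$$

$\psi(u)$ is the curvature of the quadratic function that touches $h$ at location $u$. Writing that $\gamma x^2/2 + g(\gamma)$ is tangent to $h(x)$ at $u$:

$$\frac{d(\gamma x^2/2 + g(\gamma) - h(x))}{dx}(u) = 0 \Rightarrow \gamma u = h'(u) \Rightarrow \gamma = h'(u)/u$$

We have $\psi(u) = h'(u)/u$.

From $h(\sqrt{u})$ being increasing and concave, we can show that $\psi(u)$ is a decreasing function for $u > 0$.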

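38 Reweighted least squares

We have:

Huber:

$$h(u) = \begin{cases} u^2/2 & \text{if } |u| \leq \tau \\ \tau(|u| - \tau/2) & \text{otherwise} \end{cases} \qquad \psi(u) = \begin{cases} 1 & \text{if } |u| \leq \tau \\ \tau/|u| & \text{otherwise} \end{cases}$$

Smoothed absolute value:

$$h(u) = \sqrt{\epsilon^2 + u^2} \qquad \psi(u) = 1/\sqrt{\epsilon^2 + u^2}$$

Figure: plots of $h(u)$ and $\psi(u)$ for both functions.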

39 Reweighted least squares

Using $h(u) = \min_{\gamma > 0} \left( \gamma u^2/2 + g(\gamma) \right)$, we can rewrite the minimization as follows:

$$\min_x f(x) = \min_x \sum_i h(A[i,:]x - b[i]) = \min_{x, \gamma_1, \ldots, \gamma_N} \left( \sum_{i=1}^{N} \frac{\gamma_i}{2} (A[i,:]x - b[i])^2 + g(\gamma_i) \right)$$

40 Reweighted least squares

$$\min_x f(x) = \min_{x, \gamma_1, \ldots, \gamma_N} \left( \sum_{i=1}^{N} \frac{\gamma_i}{2} (A[i,:]x - b[i])^2 + \sum_{i=1}^{N} g(\gamma_i) \right)$$

We can minimize this function iteratively by minimizing alternately with respect to $\Gamma = (\gamma_1, \ldots, \gamma_N)$ and $x$.

The minimization with respect to $\Gamma$ is done by solving $N$ independent problems and we get $\gamma_i = \psi(u_i)$ with $u_i = A[i,:]x - b[i]$.

The second term does not depend on $x$, and the minimization with respect to $x$ can be done using a weighted least squares with $w_i = \gamma_i / 2$, i.e.

$$x = (A^T W A)^{-1} A^T W b$$
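The alternating scheme can be sketched as follows for the Huber function. The line-fitting data, the outlier, the threshold $\tau = 0.1$, the fixed iteration count and the ordinary least squares initialization are assumptions of this example.

```python
import numpy as np

def huber_weight(u, tau):
    """psi(u) = h'(u)/u for Huber: 1 if |u| <= tau, tau/|u| otherwise."""
    a = np.abs(u)
    return np.where(a <= tau, 1.0, tau / np.maximum(a, tau))

def irls(A, b, tau, n_iter=50):
    """Iteratively reweighted least squares for sum_i h(A[i,:] x - b[i])."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # plain least squares start
    for _ in range(n_iter):
        u = A @ x - b                             # residuals
        w = huber_weight(u, tau) / 2.0            # w_i = gamma_i / 2
        AtW = A.T * w
        x = np.linalg.solve(AtW @ A, AtW @ b)     # weighted least squares
    return x

# Fit the line b = 2 t + 1 with one grossly wrong measurement.
t = np.linspace(0.0, 1.0, 20)
A = np.column_stack([t, np.ones_like(t)])
x_true = np.array([2.0, 1.0])
b = A @ x_true
b[5] += 10.0                                      # outlier

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
x_irls = irls(A, b, tau=0.1)
print(x_ls, x_irls)
```

The ordinary least squares fit is pulled far from the true line by the single outlier, whereas the reweighted fit stays close to it.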

41 Majorization-Minimization interpretation

The minimization with respect to $\Gamma$ amounts to finding, for each residual $u_i$, the 1D quadratic function that touches $h$ at $u_i$ and lies above $h$ everywhere else. Summing these quadratics gives a quadratic majorant of $f$ in $n$ dimensions (with $n$ the size of $x$).

The minimization with respect to $x$ with $\Gamma$ fixed amounts to minimizing this majorant.

42 Robustness

Since $\psi(u)$ is decreasing, the weight of the points with the largest residuals is reduced when computing the least squares solution. This makes the estimate more robust by limiting the impact of points with large residuals on the least squares solution.

43 Denoising: the ROF model

A denoising method proposed by Rudin, Osher and Fatemi (ROF) consists in minimizing

$$f(u) = \int_x \int_y |u(x, y) - I(x, y)|^2 \, dx \, dy + \lambda \int_x \int_y \|\nabla u(x, y)\| \, dx \, dy$$

The use of $\|\nabla u(x, y)\|$ instead of $\|\nabla u(x, y)\|^2$ in the second term favors a smooth image while keeping the edges sharp. The second term is called the total variation of $u$:

$$TV(u) = \int_x \int_y \|\nabla u(x, y)\| \, dx \, dy$$

44 Total variation in 1D

In 1D the total variation writes:

$$TV(u) = \int_x |u'(x)| \, dx$$

It can be computed as the sum of the absolute differences between successive extrema.

Figure: example signal with $TV(u) = 1300$.

45 Total variation in 1D

An increasing function going from 0 to 1 has a total variation of 1, whatever its shape.

Unlike the TV, a regularization of the form $\int_x u'(x)^2 \, dx$ favors a smooth curve and thus blurs the edges.
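Both statements can be checked on sampled signals. In this sketch the discrete total variation is taken as the sum of absolute differences between consecutive samples, and the three increasing profiles are chosen for illustration only.

```python
import numpy as np

def total_variation_1d(u):
    """Discrete 1D total variation: sum of |u[x+1] - u[x]|."""
    return np.abs(np.diff(u)).sum()

def quadratic_penalty(u):
    """Discrete counterpart of the integral of u'(x)^2."""
    return np.sum(np.diff(u) ** 2)

x = np.linspace(0.0, 1.0, 101)
ramp = x                                   # linear increase from 0 to 1
step = (x >= 0.5).astype(float)            # sharp edge from 0 to 1
smooth = 3 * x ** 2 - 2 * x ** 3           # smooth increase from 0 to 1

tv_values = [total_variation_1d(u) for u in (ramp, step, smooth)]
quad_values = [quadratic_penalty(u) for u in (ramp, step, smooth)]
print(tv_values, quad_values)
```

The three profiles have the same total variation, so TV does not prefer the ramp to the sharp edge, while the quadratic penalty is about 100 times larger for the step than for the ramp.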

46 Total variation in 2D and level sets

For $u$ a smooth 2D function and $\lambda$ a real number, we define the level set $L_\lambda(u)$ by

$$L_\lambda(u) = \{(x, y) \mid u(x, y) = \lambda\}$$

$L_\lambda(u)$ is either empty or a set of closed curves.

In 2D, the total variation can be written as the sum of the level set lengths:

$$TV(u) = \int_{\lambda = -\infty}^{+\infty} \text{length}(L_\lambda(u)) \, d\lambda$$

47 Discrete ROF

We can approximate $TV(u)$ as follows:

$$TV(u) \approx \sum_{i=0}^{W-2} \sum_{j=0}^{H-2} \sqrt{d_x(i, j)^2 + d_y(i, j)^2}$$

with

$$d_x(i, j) = u(i + 1, j) - u(i, j)$$
$$d_y(i, j) = u(i, j + 1) - u(i, j)$$

48 ROF: gradient descent

$TV(u)$ is not differentiable where $d_x(i, j)^2 + d_y(i, j)^2 = 0$.

We approximate $\sqrt{x^2 + y^2}$ by $\sqrt{\epsilon^2 + x^2 + y^2}$, with $\epsilon$ small.

We get a function $f(u)$ that is differentiable everywhere and we can use gradient descent.

Problem: using a small $\epsilon$ forces us to use a small step in the gradient descent to get convergence.

49 ROF: gradient descent

We try to minimize the discrete smoothed ROF cost:

$$\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} (I(i, j) - u(i, j))^2 + \lambda \sum_{i=0}^{W-2} \sum_{j=0}^{H-2} \sqrt{\epsilon^2 + d_x(i, j)^2 + d_y(i, j)^2}$$

with

$$d_x(i, j) = u(i + 1, j) - u(i, j)$$
$$d_y(i, j) = u(i, j + 1) - u(i, j)$$

50 ROF: reweighted least squares

We can use a reweighted least squares approach that converges faster than gradient descent.

There is a function $g(\gamma)$ such that $\sqrt{\epsilon^2 + x^2 + y^2}$ can be rewritten as the lower envelope of a set of quadratic functions:

$$\sqrt{\epsilon^2 + x^2 + y^2} = \min_{\gamma} \left( \frac{\gamma}{2}(x^2 + y^2) + g(\gamma) \right)$$

We define

$$\psi(x, y) = \operatorname*{argmin}_{\gamma} \left( \frac{\gamma}{2}(x^2 + y^2) + g(\gamma) \right)$$

and we can show that

$$\psi(x, y) = 1/\sqrt{\epsilon^2 + x^2 + y^2}$$

51 ROF: reweighted least squares

We can rewrite $\min_u f(u)$ as $\min_{U, \Gamma} f(u, \Gamma)$ with

$$f(u, \Gamma) = \|U_v - I_v\|^2 + \lambda \sum_{i=0}^{W-2} \sum_{j=0}^{H-2} \left( \frac{\gamma_{ij}}{2} (d_x(i, j)^2 + d_y(i, j)^2) + g(\gamma_{ij}) \right)$$

with $\Gamma$ the matrix of size $(W - 1) \times (H - 1)$ containing the $\gamma_{ij}$.

We can minimize $f(u, \Gamma)$ using an alternate minimization:

We minimize with respect to $\Gamma$ using $\gamma_{ij} = \psi(d_x(i, j), d_y(i, j))$.

We minimize with respect to $u$ using

$$U_v = (\lambda (D_x^T \Gamma_d D_x + D_y^T \Gamma_d D_y) + I_d)^{-1} I_v$$

with $\Gamma_d$ the diagonal matrix of size $(W - 1)(H - 1) \times (W - 1)(H - 1)$ whose coefficients are the $\gamma_{ij}/2$ in the row-major order, and $D_x$ and $D_y$ the two sparse matrices defined previously.
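To keep the example short, the alternate minimization is sketched here in 1D, where there is a single difference operator $D$ and the update becomes $U = (\lambda D^T \Gamma_d D + I_d)^{-1} I$. The 1D reduction, the test signal and the parameter values are assumptions of this sketch; the 2D version has the same structure with $D_x$ and $D_y$.

```python
import numpy as np

def rof_cost(u, I, lam, eps):
    """Smoothed 1D ROF cost: ||u - I||^2 + lam * sum sqrt(eps^2 + d^2)."""
    d = np.diff(u)
    return np.sum((u - I) ** 2) + lam * np.sum(np.sqrt(eps ** 2 + d ** 2))

def rof_1d_irls(I, lam, eps=1e-3, n_iter=100):
    """Alternate between gamma = psi(d) and a weighted least squares solve."""
    I = np.asarray(I, dtype=float)
    n = len(I)
    D = np.diff(np.eye(n), axis=0)         # (n-1) x n forward differences
    u = I.copy()
    for _ in range(n_iter):
        d = D @ u
        gamma = 1.0 / np.sqrt(eps ** 2 + d ** 2)          # gamma = psi(d)
        # Gamma_d = diag(gamma / 2), applied as a row scaling of D.
        M = lam * D.T @ (gamma[:, None] / 2.0 * D) + np.eye(n)
        u = np.linalg.solve(M, I)
    return u

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])       # a step edge
noisy = clean + 0.1 * rng.standard_normal(100)

lam, eps = 1.0, 1e-3
u_rof = rof_1d_irls(noisy, lam, eps)
print(rof_cost(noisy, noisy, lam, eps), rof_cost(u_rof, noisy, lam, eps))
```

Because each iteration minimizes a quadratic majorant of the cost, the cost can only decrease from one iteration to the next.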

52 Discrete ROF

Examples of denoised images for various $\lambda$.

Figure: the noisy image $I_b$ and the denoised images for $\lambda = 10$, $\lambda = 20$ and $\lambda = 50$.

We can see that the edges are preserved.

Figure: SNR as a function of $\lambda$.

53 Inpainting with TV

We can use the total variation to do inpainting by minimizing

$$f(u) = \int_x \int_y \|\nabla u(x, y)\| \, dx \, dy$$

with the constraint $u(x, y) = I(x, y)$ for the observed points $(x, y) \in \Omega$.

To make the derivation easier, we reuse the ROF denoising formulation with a weight $\alpha(x, y)$ for each point in the image. $\alpha(x, y)$ is large relative to $\lambda$ for $(x, y) \in \Omega$ and zero for $(x, y) \notin \Omega$:

$$f(u) = \int_x \int_y \alpha(x, y) |u(x, y) - I(x, y)|^2 \, dx \, dy + \lambda \int_x \int_y \|\nabla u(x, y)\| \, dx \, dy$$

54 Inpainting: reweighted least squares

We can rewrite $\min_u f(u)$ as $\min_{U, \Gamma} f(u, \Gamma)$ with

$$f(u, \Gamma) = \sum_{ij} \alpha_{ij} |U(i, j) - I(i, j)|^2 + \lambda \sum_{i=0}^{W-2} \sum_{j=0}^{H-2} \left( \frac{\gamma_{ij}}{2} (d_x(i, j)^2 + d_y(i, j)^2) + g(\gamma_{ij}) \right)$$

We can minimize $f(u, \Gamma)$ using an alternate minimization:

We minimize with respect to $\Gamma$ using $\gamma_{ij} = \psi(d_x(i, j), d_y(i, j))$.

We minimize with respect to $u$ by calculating

$$U_v = (\lambda (D_x^T \Gamma_d D_x + D_y^T \Gamma_d D_y) + A)^{-1} A I_v$$

with $A$ the diagonal matrix of size $WH \times WH$ whose elements correspond to the $\alpha_{ij}$ in the row-major order, $\Gamma_d$ the diagonal matrix with coefficients $\gamma_{ij}/2$, and $D_x$ and $D_y$ the two sparse matrices defined previously.

55 Inpainting

Figure: inpainting domains, least squares results, and results with TV.

56 Inpainting: zoom

Figure: zoom on the inpainting results.

57 Some links

TVDmm/TVDmm.pdf
postgrad/cca/files/ipol.pdf

58 Nikolova, Mila and Ng, Michael K., Analysis of Half-Quadratic Minimization Methods for Signal and Image Recovery. SIAM J. Scientific Computing.
