CN780 Final Lecture. Low-Level Vision, Scale-Space, and Polyakov Action. Neil I. Weisenfeld

Neil I. Weisenfeld, Department of Cognitive and Neural Systems, Boston University. Chapter 14.2-14.3. May 9, 2005.

Introduction

The term "early vision" refers to the processing that happens, or is thought to happen, at the beginning of the visual analysis chain, close to the retinal image. At least since Hubel and Wiesel's discovery of orientation-sensitive cells in V1, scientists have been obsessed with models of early vision that center on the extraction of features, especially edges, from a scene. Computational strategies focus on context-free, and therefore local, image-processing operations. As Sochen points out, this locality leads to differential operators.

Laplacian Edge Detection

[Figure: three panels. Left: the original, blurred image. Middle: the log-magnitude of the Laplacian applied to the image. Right: the zero-crossings of the Laplacian image. Source: http://www.owlnet.rice.edu/~elec539/projects97/morphj]
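The three panels described on this slide can be approximated with a few lines of NumPy. This is a minimal sketch, not the code behind the original figure: the 5-point stencil, the synthetic test image, and the helper names `laplacian`, `zero_crossings`, and `smooth_step_image` are all assumptions made for illustration.

```python
import numpy as np

def laplacian(img):
    """5-point discrete Laplacian with replicated (edge-padded) borders."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def zero_crossings(lap):
    """Mark pixels whose Laplacian changes sign against the right or lower neighbour."""
    zc = np.zeros(lap.shape, dtype=bool)
    zc[:, :-1] |= (lap[:, :-1] * lap[:, 1:]) < 0
    zc[:-1, :] |= (lap[:-1, :] * lap[1:, :]) < 0
    return zc

def smooth_step_image(n=64, width=3.0):
    """Hypothetical test image: a blurred vertical edge in the middle of the frame."""
    x = np.arange(n) - n / 2 + 0.5
    profile = 0.5 * (1.0 + np.tanh(x / width))
    return np.tile(profile, (n, 1))

img = smooth_step_image()            # "left panel": blurred image
lap = laplacian(img)
log_mag = np.log1p(np.abs(lap))      # "middle panel": log-magnitude of the Laplacian
edges = zero_crossings(lap)          # "right panel": zero-crossings
```

On this synthetic image the Laplacian is positive on the dark side of the edge and negative on the bright side, so the only sign change falls on the column where the blurred step passes through its midpoint.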

Scale-Space: Isotropic Diffusion

Some authors propose a pyramid in which the image is smoothed at various resolutions. This can be generalized to a continuum of scales $t$ by convolving the original image $I_0$ with a Gaussian kernel $G$:

$$I(x, y, t) = I_0(x, y) * G(x, y; t)$$

Perona and Malik note that this is also the solution to the heat equation and therefore has a physical interpretation as isotropic diffusion:

$$I_t = \Delta I = I_{xx} + I_{yy}$$
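The equivalence between Gaussian smoothing and the heat equation can be checked numerically. The sketch below is an illustration under stated assumptions (explicit Euler time stepping, unit grid spacing, periodic borders, hypothetical helper names `heat_step` and `diffuse`). It diffuses a unit impulse and measures the spread of the result, which for $I_t = \Delta I$ should have variance $2t$ along each axis.

```python
import numpy as np

def heat_step(img, dt=0.2):
    """One explicit Euler step of I_t = I_xx + I_yy (periodic borders, unit spacing)."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return img + dt * lap

def diffuse(img, t, dt=0.2):
    """Run the heat equation up to scale t; dt <= 0.25 keeps the scheme stable."""
    out = img.astype(float)
    for _ in range(int(round(t / dt))):
        out = heat_step(out, dt)
    return out

n = 65
impulse = np.zeros((n, n))
impulse[n // 2, n // 2] = 1.0

t = 10.0
blurred = diffuse(impulse, t)

# Second moment of the diffused impulse along x, measured from the centre pixel.
coords = np.arange(n) - n // 2
var_x = float((blurred.sum(axis=0) * coords**2).sum())
total_mass = float(blurred.sum())
```

The total intensity is conserved while the impulse spreads into an approximately Gaussian blob, which is the sense in which the scale parameter $t$ behaves like diffusion time.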

Scale-Space: Isotropic Diffusion

But isotropic diffusion has two problems: edges are not preserved, and their locations shift from one scale to the next.

Scale-Space: Anisotropic Diffusion

Perona and Malik introduced the idea of anisotropic diffusion, in which the diffusion coefficient, a constant in the heat equation, is allowed to vary over space:

$$I_t = \operatorname{div}\big(c(x, y, t)\, \nabla I\big) = c(x, y, t)\, \Delta I + \nabla c \cdot \nabla I$$

If $c(x, y, t) = c$ is constant, this reduces to $I_t = c\, \Delta I$. Now choose $c$ in a way that takes the gradient into account:

$$c(x, y, t) = g\big(\|\nabla I(x, y, t)\|\big)$$
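A rough sketch of this flow on a pixel grid follows. It uses the common four-neighbour discretization with zero-flux borders. The slide does not specify $g$, so the edge-stopping function $g(s) = 1 / (1 + (s/\kappa)^2)$ is an assumption, as are the parameter values, the synthetic test image, and the function names.

```python
import numpy as np

def g(s, kappa):
    """Edge-stopping function (an assumed choice; the slide leaves g unspecified)."""
    return 1.0 / (1.0 + (s / kappa) ** 2)

def perona_malik(img, steps=20, kappa=0.1, lam=0.2):
    """Explicit scheme for I_t = div(g(|grad I|) grad I) on a 4-neighbour grid."""
    out = img.astype(float).copy()
    for _ in range(steps):
        p = np.pad(out, 1, mode="edge")
        # Differences to the four neighbours; they vanish at the border (no flux).
        dn = p[:-2, 1:-1] - out
        ds = p[2:, 1:-1] - out
        dw = p[1:-1, :-2] - out
        de = p[1:-1, 2:] - out
        out = out + lam * (g(np.abs(dn), kappa) * dn + g(np.abs(ds), kappa) * ds
                           + g(np.abs(dw), kappa) * dw + g(np.abs(de), kappa) * de)
    return out

# Hypothetical test image: a unit step edge plus a little Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + rng.normal(0.0, 0.02, clean.shape)

aniso = perona_malik(noisy)                 # gradient-dependent coefficient
iso = perona_malik(noisy, kappa=np.inf)     # g == 1 everywhere: plain heat equation

def edge_jump(a):
    """Mean intensity jump across the step between columns 15 and 16."""
    return float(np.mean(a[:, 16] - a[:, 15]))
```

With `kappa=np.inf` the coefficient is constant and the scheme falls back to isotropic diffusion, so the same function gives both sides of the comparison: the step survives in `aniso` and is smeared out in `iso`, while the noise in the flat regions is reduced in both.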

Scale-Space: Anisotropic vs. Isotropic Diffusion

Sochen's Unifying Theory

Sochen, 1996, "From High Energy Physics to Low Level Vision" (Lawrence Berkeley National Labs Tech Report LBNL-39243)

- Images are modeled as a surface embedded in a higher-dimensional space (e.g. a 2D image as a height field in $\mathbb{R}^3$).
- Image processing operations that generate a scale space are modeled as a metric in the embedded space and a flow.
- The Polyakov action, from high-energy physics, is used to generate the flow.
- A number of classic operations fall into this framework, and the mathematics is in place for extension to other problems. Sochen, for example, shows how to use the theory to extend some classical operations on grayscale images to RGB color images.

Image as a Surface

A 2D, single-channel image $I(x, y)$ is mapped to $\mathbb{R}^3$ as the surface $(x, y, I)$.

[Figure: two panels, the image $I(x, y)$ and its surface embedding $(x, y, I)$.]

Where are we going with this?

We're interested in differential operations on, say, image intensity values with respect to image coordinates. The surface embedding allows us to visualize this in two ways: either in $(\sigma^1, \sigma^2)$ space (with a local metric) or in $(x, y, I)$ space.

Differential Geometry Intro

How do we measure distances between points within our coordinate systems? Consider the map $X : \Sigma \to \mathbb{R}^3$, which is explicitly

$$X = \big[X^1(\sigma^1, \sigma^2),\; X^2(\sigma^1, \sigma^2),\; X^3(\sigma^1, \sigma^2)\big]$$

The squared distance $ds^2$ on $\Sigma$ is the squared distance in local coordinates, adjusted by $g_{\mu\nu}(\sigma^1, \sigma^2)$ (Einstein summation over repeated indices):

$$ds^2 = g_{\mu\nu}\, d\sigma^\mu\, d\sigma^\nu = g_{11}\,(d\sigma^1)^2 + 2 g_{12}\, d\sigma^1\, d\sigma^2 + g_{22}\,(d\sigma^2)^2$$

Connect Metrics

Consider a more general map $X : \Sigma \to M$, with metric $g$ on $\Sigma$ and metric $h$ on $M$. If we know $h$ and $X$, we can construct $g$:

$$g_{\mu\nu}(\sigma^1, \sigma^2) = h_{ij}(X)\, \partial_\mu X^i\, \partial_\nu X^j$$

Note that we're summing over $i, j$, and that $\partial_\mu X^i$ means $\partial X^i / \partial \sigma^\mu$.

For our image map

$$X : (\sigma^1, \sigma^2) \to \big[x = \sigma^1,\; y = \sigma^2,\; z = I(\sigma^1, \sigma^2)\big]$$

we have:

$$(g_{\mu\nu}) = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}$$
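The pullback formula is a single tensor contraction, so it is easy to check against the closed-form matrix for the image map. The sketch below assumes a Euclidean embedding space ($h_{ij} = \delta_{ij}$) and arbitrary sample values for $I_x$ and $I_y$; the helper names `pullback_metric` and `image_jacobian` are hypothetical.

```python
import numpy as np

def pullback_metric(dX, h):
    """g_{mu nu} = h_{ij} d_mu X^i d_nu X^j, where dX[mu, i] = dX^i / d sigma^mu."""
    return np.einsum("mi,nj,ij->mn", dX, dX, h)

def image_jacobian(Ix, Iy):
    """Derivatives of the image map X = (x, y, I(x, y)) with respect to (x, y)."""
    return np.array([[1.0, 0.0, Ix],
                     [0.0, 1.0, Iy]])

Ix, Iy = 0.5, -2.0                       # sample image gradient at one pixel
g = pullback_metric(image_jacobian(Ix, Iy), np.eye(3))

# Closed form for the image map.
expected = np.array([[1 + Ix**2, Ix * Iy],
                     [Ix * Iy, 1 + Iy**2]])

# ds^2 for a small step (dx, dy) on the image plane, computed two ways.
step = np.array([0.1, 0.2])
ds2_metric = float(step @ g @ step)
dI = Ix * step[0] + Iy * step[1]
ds2_embedded = float(step[0]**2 + step[1]**2 + dI**2)
```

The last lines show what the metric buys us: measuring a step with $g_{\mu\nu}$ in $(\sigma^1, \sigma^2)$ coordinates gives the same squared length as measuring it directly on the surface in $(x, y, I)$ space.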

Where are we?

In the grand tradition of, and hopefully more successfully than, Sun Microsystems' campaign "The Network is the Computer", note that "The Map is the Image." Or, at least, it contains the image. Remember that we want to define a scale-space and flows (transitions) from one image to another. The framework calls for a measure on the maps; the flow equations are then derived by minimizing that measure with respect to something.

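Polyakov Action: The Measure

Again, we have $(\Sigma, g)$ and $(M, h)$ and our mapping $X : \Sigma \to M$. A measure of the weight of the map is:

$$S[X^i, g_{\mu\nu}, h_{ij}] = \int d^m\sigma\, \sqrt{g}\, g^{\mu\nu}\, \partial_\mu X^i\, \partial_\nu X^j\, h_{ij}(X)$$

where $m = \dim(\Sigma)$, $g = \det(g_{\mu\nu})$, and $(g^{\mu\nu}) = (g_{\mu\nu})^{-1}$.

Remember that $g_{\mu\nu} = \partial_\mu X^i\, \partial_\nu X^j\, h_{ij}(X)$ and $g^{\mu\nu} g_{\nu\gamma} = \delta^\mu_\gamma$. With this induced metric the integrand reduces to $m \sqrt{g}$, so the weight is proportional to the surface area.

This functional is thought to have been introduced by Polyakov for $m = 2$.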
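The relation between the action and the surface area can be verified numerically for the image map. This is a sketch under assumptions: a synthetic smooth image, a Euclidean embedding space, derivatives from `np.gradient`, and a plain Riemann sum for the integral. With the induced metric and $m = 2$, the action should come out as twice the surface area.

```python
import numpy as np

# Hypothetical smooth test image on the unit square.
y, x = np.mgrid[0:1:64j, 0:1:64j]
I = 0.3 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)
spacing = x[0, 1] - x[0, 0]
Iy, Ix = np.gradient(I, spacing)          # derivatives along axis 0 (y) and axis 1 (x)

# dX[..., mu, i] = dX^i / d sigma^mu for X = (x, y, I) at every pixel.
dX = np.zeros(I.shape + (2, 3))
dX[..., 0, 0] = 1.0
dX[..., 1, 1] = 1.0
dX[..., 0, 2] = Ix
dX[..., 1, 2] = Iy

h = np.eye(3)                              # Euclidean embedding space (assumption)
g = np.einsum("...mi,...nj,ij->...mn", dX, dX, h)   # induced metric per pixel
sqrt_g = np.sqrt(np.linalg.det(g))
g_inv = np.linalg.inv(g)

# Integrand of the Polyakov action: sqrt(g) g^{mu nu} d_mu X^i d_nu X^j h_ij.
integrand = sqrt_g * np.einsum("...mn,...mi,...nj,ij->...", g_inv, dX, dX, h)

action = float(integrand.sum() * spacing**2)
area = float(sqrt_g.sum() * spacing**2)
```

Because the contraction $g^{\mu\nu} g_{\mu\nu}$ equals the dimension of $\Sigma$, `integrand` matches `2 * sqrt_g` pixel by pixel, whatever the image is.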

Simple Example

Embed a surface in $\mathbb{R}^3$ and use Cartesian coordinates, so:

$$(g_{\mu\nu}) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad x = \sigma^1, \quad y = \sigma^2, \quad h_{ij} = \delta_{ij}$$

The functional is now the Euclidean $L^2$ norm:

$$S[I, g_{\mu\nu} = \delta_{\mu\nu}, h_{ij} = \delta_{ij}] = \int d^2\sigma\, \big(|\nabla x|^2 + |\nabla y|^2 + |\nabla I|^2\big)$$

Minimizing with respect to $I$ gives the heat operator (cf. isotropic diffusion), but we don't have a time factor yet. Part of the flexibility here is deciding what to minimize with respect to, and how to scale intensity via $h_{ij}$.

Minimizing a Functional

The Euler-Lagrange equations can be used to find an extremum. Given:

$$I = \int_{x_1}^{x_2} f[y(x), \dot{y}(x), x]\, dx$$

Euler-Lagrange gets you:

$$\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial \dot{y}}\right) = 0$$
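As a quick worked illustration (not taken from the slides), here is the Euler-Lagrange equation applied first to the simplest one-dimensional integrand and then to its two-dimensional analogue, the $|\nabla I|^2$ term of the Euclidean functional. The two-variable form of the equation, with one derivative term per coordinate, is the standard generalization and is assumed here.

```latex
% One dimension, f = \tfrac{1}{2}\dot{y}^2:
\frac{\partial f}{\partial y} = 0, \qquad
\frac{\partial f}{\partial \dot{y}} = \dot{y}
\quad\Longrightarrow\quad
-\frac{d}{dx}\,\dot{y} = -\ddot{y} = 0 .

% Two dimensions, f = |\nabla I|^2 = I_x^2 + I_y^2:
\frac{\partial f}{\partial I}
- \frac{\partial}{\partial x}\frac{\partial f}{\partial I_x}
- \frac{\partial}{\partial y}\frac{\partial f}{\partial I_y}
= -2\,(I_{xx} + I_{yy}) = 0 .
```

The first case gives straight lines as extremals; the second gives Laplace's equation, which is why the heat operator appears when the Euclidean functional is minimized with respect to $I$.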

Minimizing and Adding Time

The details of how to do this are worked out in Appendix A of the difficult-to-find LBNL technical report and are omitted from all of the published versions. Minimizing the Polyakov functional with respect to the embedding yields:

$$-\frac{1}{2\sqrt{g}}\, h^{il}\, \frac{\delta S}{\delta X^l} = \ldots \quad \text{(see paper)}$$

Add time (scale-space) by viewing this as a gradient descent problem:

$$X^i_t = -\frac{1}{2\sqrt{g}}\, h^{il}\, \frac{\delta S}{\delta X^l}$$

Perona-Malik Flow

Perona-Malik diffusion can be derived within this framework, which has the advantage of extending to $n$ dimensions. In a Euclidean space, the Euler-Lagrange equation simplifies to:

$$I_t = \partial_\mu\big(\sqrt{g}\, g^{\mu\nu}\, \partial_\nu I\big)$$

with the $n$-dimensional image embedded in $\mathbb{R}^{n+1}$. If we choose $(g_{\mu\nu}) = f\, I_d$, where $I_d$ is the identity, we have:

$$I_t = \sum_{\mu=1}^{n} \partial_\mu\big(f^{n/2 - 1}\, \partial_\mu I\big)$$
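The exponent $n/2 - 1$ comes from combining $\sqrt{g} = f^{n/2}$ with $g^{\mu\nu} = f^{-1} \delta^{\mu\nu}$. A small numerical check, with an arbitrary sample value of $f$ and a hypothetical helper name, might look like this:

```python
import numpy as np

def sqrt_g_times_inverse(metric):
    """The combination sqrt(det g) * g^{-1} that appears inside the flow."""
    return np.sqrt(np.linalg.det(metric)) * np.linalg.inv(metric)

f = 2.5                                   # arbitrary positive conformal factor
factors = {}
for n in (1, 2, 3, 4):
    lhs = sqrt_g_times_inverse(f * np.eye(n))
    rhs = f ** (n / 2 - 1) * np.eye(n)
    assert np.allclose(lhs, rhs)          # sqrt(g) g^{mu nu} = f^(n/2 - 1) delta^{mu nu}
    factors[n] = float(lhs[0, 0])
```

Note that for $n = 2$ the factor is 1 whatever $f$ is, so in two dimensions a conformal rescaling of the metric leaves this flow unchanged.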

Perona-Malik Flow (cont'd)

For $n \neq 2$, we can choose $f^{n/2 - 1} = C(I)$, and we have Perona-Malik flow:

$$I_t = \operatorname{div}\big(C(I)\, \nabla I\big)$$

For $C(I) = f(I_0) / |\nabla I|$, we have:

$$I_t = \operatorname{div}\left(f(I_0)\, \frac{\nabla I}{|\nabla I|}\right)$$

which is the fundamental equation for geodesic active contours (cf. level set methods).

Beltrami Flow

The authors propose a new flow, operating in multiple dimensions, which they dub the Beltrami flow:

$$I^i_t = \Delta_g I^i \equiv \frac{1}{\sqrt{g}}\, \partial_\mu\big(\sqrt{g}\, g^{\mu\nu}\, \partial_\nu I^i\big) = H^i$$

where $H^i$ is the mean curvature in the $i$th direction. Geometrically, each point moves with a velocity proportional to the mean curvature in each direction. This looks nasty, but it is the induced flow when using the trivial embedding, Euclidean metrics, and minimizing with respect to $I$. See the examples on the next slide.
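For a single-channel image with the induced metric of the image map, $g = 1 + I_x^2 + I_y^2$ and $\sqrt{g}\, g^{\mu\nu} \partial_\nu I$ works out to $\nabla I / \sqrt{g}$, so the flow reduces to $I_t = \frac{1}{\sqrt{g}} \operatorname{div}\big(\nabla I / \sqrt{g}\big)$. The sketch below steps that reduced equation with a simple first-order scheme (forward differences for the gradient, backward differences for the divergence, zero-flux borders). The discretization, the intensity range, the parameters, and the function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def beltrami_step(img, dt=0.2):
    """One explicit step of I_t = (1/sqrt(g)) div(grad I / sqrt(g)), g = 1 + |grad I|^2."""
    # Forward differences, set to zero on the far border (no flux).
    Ix = np.zeros_like(img)
    Iy = np.zeros_like(img)
    Ix[:, :-1] = img[:, 1:] - img[:, :-1]
    Iy[:-1, :] = img[1:, :] - img[:-1, :]
    inv_sqrt_g = 1.0 / np.sqrt(1.0 + Ix**2 + Iy**2)
    fx, fy = Ix * inv_sqrt_g, Iy * inv_sqrt_g
    # Backward-difference divergence of the flux (fx, fy).
    div = fx + fy
    div[:, 1:] -= fx[:, :-1]
    div[1:, :] -= fy[:-1, :]
    return img + dt * inv_sqrt_g * div

def beltrami_flow(img, steps=30, dt=0.2):
    out = img.astype(float).copy()
    for _ in range(steps):
        out = beltrami_step(out, dt)
    return out

# Hypothetical test image: a step of height 100 with unit-variance noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 100.0
noisy = clean + rng.normal(0.0, 1.0, clean.shape)

smoothed = beltrami_flow(noisy)
jump = float(np.mean(smoothed[:, 16] - smoothed[:, 15]))
```

Where the gradient is small, $g \approx 1$ and the step behaves like ordinary diffusion; across the large step, both factors of $1/\sqrt{g}$ become small and the edge is left nearly untouched. How strongly edges are protected depends on the intensity scale relative to the pixel spacing, which is one place the choice of $h_{ij}$ enters.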

Beltrami Flow Examples

Multiscale Active Contours

Bresson et al. used the framework to extend active contours to multiple scales.

Final Words on the Framework

The framework provides:

- an embedding of images in a higher-dimensional space via a mapping
- a connection between the metric in the image manifold and the $(n + 1)$D space
- an action functional on the maps themselves (which encode the images)

The connection to established methods seems somewhat forced; I need to understand the geometry better.

The technique opens up avenues for analysis and derivation: metrics can be chosen in either space and derived for the other, and the action can be minimized with respect to the metrics or the mapping.

Concluding Ideas

- The authors cite 1994 work of Florack et al., which presents linear heat-flow scale-space in the log-polar coordinate system. They point out that the Beltrami operator is a parameterization-invariant differential operator which can work in this capacity and is edge preserving.
- The generalization of Perona-Malik flow to $n$ dimensions may be of interest in multi-channel MRI.
- I have never seen precisely the same text recycled into so many different publications. Many of the details are worked out only in the LBNL tech report.
- The generalization of various techniques to $n$ dimensions may be of great interest in and of itself.

References

- Primary reference: Sochen et al., "A General Framework for Low Level Vision," IEEE Trans. Image Processing 7(3):310-318, 1998.
- The LBNL tech report is not available online from LBNL, and the version on Sochen's web site is stripped down for one of the many re-publications of the text. The full version was on Kimmel's web site at http://www.cs.technion.ac.il/~ron/pub.html
- Sochen has a talk at http://www.math.tau.ac.il/~sochen/beltrami2.pdf
- "Multiscale Active Contours" is from Bresson et al. at EPFL, Lausanne, Switzerland. It does not appear to have been published yet.
- Perona and Malik is in IEEE PAMI 12(7):629-639, 1990.