Surface Reconstruction: GNCs and MFA

Mads Nielsen
DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark
malte@diku.dk

September 1994

Abstract

Noise-corrupted signals and images can be reconstructed by minimization of a Hamiltonian. Often this Hamiltonian is a non-convex functional. The solution of minimum energy can then be approximated by the Graduated Non-Convexity (GNC) algorithm developed for the weak membrane by Blake and Zisserman. The GNC approximates the non-convex functional by a convex functional and varies the solution space slowly towards the non-convex functional. In this work, we propose a way of monitoring a functional-relaxation algorithm; in particular, the dependency on initial estimates and the manner of convergence are easily captured. Earlier work used Mean Field Annealing (MFA) to relax the Hamiltonian in the general case. It is often claimed that MFA leads to a GNC algorithm. It is shown that this is not necessarily the case, and especially not the case for earlier MF approximations of the weak membrane. In the case of the weak membrane, MFA might lead to predictable and inexpedient results. Two automatic and proven GNC-generating methods are presented. One uses a Gaussian filtering of the smoothness term and is called Smoothness Focusing (SF). The other uses a Gaussian filtering of the a priori distribution of the derivative in a Maximum A Posteriori estimation scheme, and is called Probability Focusing (PF). The algorithms are experimentally compared to the Blake-Zisserman GNC and shown to be competitive.

Index Terms: Surface Reconstruction, Relaxation Algorithms, Regularization, Mean Field Annealing, Graduated Non-Convexity, Discontinuities.

(Published as Technical Report, INRIA 2353, France. Part of this work was carried out at INRIA, Sophia Antipolis.)

1 Introduction

Regularization is a method of reformulating ill-posed inverse problems as well-posed problems, as done by Tikhonov and Arsenin [1]. This reformulation implies the addition of a stabilizing term,

followed by a global minimization of an energy functional, yielding a unique solution. Applied to the problem of reconstruction of a noise-corrupted surface, the energy functional or Hamiltonian (the terms are used interchangeably) can be expressed as follows:

E[\tilde{s}] = \int_\Omega (\tilde{s} - c)^2 + f[\tilde{s}] \, da

where \tilde{s} is the reconstruction, \Omega is the domain of the reconstruction, c is the measurements, da is an area element of \Omega, and f is the stabilizing functional. Tikhonov uses a stabilizing functional which is a sum of the squares of derivatives of \tilde{s}, and obtains a convex energy functional, making the minimization simple.

Geman and Geman [2] introduced a discontinuity field in the reconstruction and thereby proposed the model of the weak membrane, which is formulated in the continuous case by Mumford and Shah [3]. The discontinuity field is incorporated directly in the stabilizing functional by Blake and Zisserman [4], using what Rangarajan and Chellappa [5] call the adiabatic approximation. The resulting stabilizing term will no longer lead to a convex solution space; therefore we will not use the term regularization, but surface reconstruction, because several solutions of minimum energy might theoretically exist. In general the stabilizing term reflects the a priori knowledge of the surface, and other stabilizing functionals might be appropriate for other reconstruction problems. The stabilizing functional can be interpreted in terms of information theory, as an entropy measure [6], or in terms of Bayesian estimation, using Maximum A Posteriori (MAP) estimation as an M-estimator [7].

Given a noise model P(c|s), where s is the ideal surface which is to be reconstructed, and an a priori distribution of surfaces P(s), we can calculate the a posteriori probability of a given surface using Bayes' formula:

P(s|c) = \frac{P(c|s) P(s)}{P(c)}

where P(c) is a normalizing constant, or, in terms of statistical physics, the partition function. Typically, the a priori distribution is given in terms of the derivative of the surface, which is why we use the notation P(\nabla s). Using a noise model of additive, Gaussian, uncorrelated noise, and no spatial correlation of P(\nabla s), we can find the minus-log-probability function of a surface as

E[\tilde{s}] = \sum (\tilde{s} - c)^2 + f(\nabla \tilde{s})    (1)

where f(\nabla \tilde{s}) = -\log P(\nabla \tilde{s}), and \nabla is a difference operator in the discrete approximation. In terms of statistical physics E is called the Gibbs energy. Because the minus-log function is monotonically decreasing, a minimization of the Gibbs energy corresponds to a maximization of the a posteriori probability. In the following we call the function f in Equation 1 the smoothness function.

The properties of the reconstruction and the convexity of the solution space depend strongly on the smoothness function. Tikhonov [1] used a parabolic smoothness function, Blake and Zisserman [4] used a thresholded parabolic smoothness function, and Nielsen [8] used a Lorentzian estimator. In the following, we will not in general refer to any specific smoothness function. Geman and Geman [2] used Simulated Annealing (SA) to find the minimum of Equation 1, while Blake and Zisserman [4] introduced the deterministic and approximative Graduated Non-Convexity (GNC) algorithm in the case of the weak membrane and showed that it is up to 50 times faster than SA [9]. Geiger and Girosi [10] and Bilbro et al. [11] used the Mean Field Annealing (MFA) formalism to create a deterministic version of SA, and claimed it creates GNC-like algorithms.
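As a concrete illustration of Equation 1, the following minimal Python sketch (not part of the original report; the function names are ours, and the weak-string choice of f anticipates Section 2) evaluates the discrete Gibbs energy of a 1D reconstruction under the additive Gaussian noise model.

    import numpy as np

    def gibbs_energy(s, c, f):
        """Discrete Gibbs energy of Equation 1 for a 1D signal:
        quadratic data term plus the smoothness term f applied to
        first-order differences of the reconstruction."""
        data_term = np.sum((s - c) ** 2)
        smooth_term = np.sum(f(np.diff(s)))   # nabla taken as a difference operator
        return data_term + smooth_term

    # Weak-string smoothness function f(x) = lam * min(x^2, T^2) (cf. Section 2).
    def weak_string(x, lam=1.0, T=1.0):
        return lam * np.minimum(x ** 2, T ** 2)

    c = np.array([0.0, 0.1, -0.1, 1.2, 1.0, 1.1])      # noisy measurements
    print(gibbs_energy(c, c, weak_string))              # energy of the trivial choice s = c
    print(gibbs_energy(np.full_like(c, c.mean()), c, weak_string))  # energy of a flat reconstruction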
This paper concerns how the concept of GNC can be generalized to analytical and non-analytical smoothness functions, which criteria an algorithm has to fulfill to be a GNC, and how previous work

such as MFA is explained in terms of GNC. In Section 2 the Blake and Zisserman GNC is sketched and the criteria for an algorithm to be a GNC algorithm are emphasized. In Section 3 the MFA is sketched, and it is shown that it does not lead to a GNC algorithm in the case of the weak membrane, but to an algorithm which in some cases yields predictable and inexpedient results. In Section 4 two alternatives to the MFA are given in the general case. They are proven to yield GNC algorithms and are fully automatic. All the proofs have been left to appendices in order not to distract the reader who is only interested in the results.

2 Graduated Non-Convexity

The deterministic and approximative approach of GNC implies an approximation of the non-convex Hamiltonian by a convex Hamiltonian. This approximation is slowly varied towards the non-convex Hamiltonian, in the hope that the local minimum which is tracked will converge to the global minimum of the non-convex Hamiltonian. Three snapshots illustrating a GNC algorithm are shown in Figure 1.

[Figure 1: The Graduated Non-Convexity algorithm of the weak membrane in three snapshots (curves: convex approximation, intermediate, final; axes: solution vs. energy).]

In general, a crucial point of the GNC algorithm is to be sure to have an initially convex Hamiltonian [4]. If it is not convex, the final solution will depend upon an initial estimate. Let us assume the minimization algorithm to be of gradient-descent type. We can define the basin of attraction of a minimum as the set of points on characteristic paths ending in the minimum. In general the solution space will in this way be divided into a number of basins of attraction corresponding to the number of minima. If the initial solution space contains more than one basin of attraction, the final result of the GNC will depend on the starting point of the algorithm. In other words, it is necessary to have an a priori idea of the solution in order to find the best solution. If the initial energy functional is non-convex, we cannot talk of a GNC, but of a relaxation algorithm.

The second crucial point of a GNC is the development of non-convexities (when and where they appear and how they move) when the Hamiltonian is varied. It is difficult to gain any intuition or to formalize the criterion of having a good GNC. However, we can say that a non-convexity should not move much in the solution space after it is introduced in the Hamiltonian. If it moves far, it can push the solution far away from the earlier found equilibrium, after detection.

Finally, we want the series of Hamiltonians not to be uniformly convergent towards the true energy functional but to be Γ-convergent towards the true energy functional [12]. A Γ-convergent series can informally be described as a series of functionals where the minimum of the functionals converges towards the minimum of the limiting functional. This is not the same as uniform convergence because of phenomena such as the Gibbs phenomenon, but for discrete approximations uniform convergence will imply Γ-convergence [12], which is why we will be satisfied by uniform convergence.

In this paper we focus primarily on the first crucial point (initial convexity), while the second (concavity motion) will only be used when we explain the behaviour of an algorithm.

In Appendix A it is shown that a reconstruction problem on the form of Equation 1 implies a convex energy functional if and only if for all values of x the second derivative of f(x) is larger than -1/2 in the one-dimensional case. This implies that a GNC which changes only the smoothness function should guarantee that the second derivative is defined for all x and is larger than -1/2 in the initial approximation. In Appendix A it is also shown that in the D-dimensional case the criterion is somewhat more complex, but a convex energy functional is guaranteed if all the eigenvalues of the Hessian matrix of f(x) are larger than -1/(2D). Furthermore, non-convexities are guaranteed if one of the eigenvalues is smaller than -1/2. In the interval [-1/2, -1/(2D)], the exact criterion is given in Appendix A. We see that a lack of convexity arising in one dimension cannot be balanced by the other dimensions. The convexity must exist in all dimensions, and the dimensions must mix in an appropriate manner.

The weak membrane is defined by minimization of the energy in Equation 1, where

f(\nabla\tilde{s}) = \begin{cases} \lambda |\nabla\tilde{s}|^2 & \text{if } |\nabla\tilde{s}| < T \\ \lambda T^2 & \text{otherwise} \end{cases}

and λ is a weighing constant and T can be perceived as a discontinuity threshold control parameter. The energy functional is not convex according to Theorem 1, Appendix A, because f(x) has second derivatives which are minus infinity for |x| = T. Therefore the approximation of a GNC is constructed. The idea of the GNC algorithm is to approximate f by another functional whose second derivative is bounded. This can be done by approximating f by f_1 in the critical region (around |x| = T) by a second-order polynomial with a second derivative larger than -1/(2D). This formulation was proposed by Blake and Zisserman [4] and is illustrated for the 1D case in Figure 2.

[Figure 2: Smoothness term f as a function of the derivative s_x of the solution in the weak membrane. The solid line is the weak membrane, while the dotted line is the starting level of GNC as formulated by Blake and Zisserman.]

If we denote the original (weak-membrane) smoothness function by f_0, we can construct a series of functions f_c which vary continuously as a function of c. When c = 1 the solution space is convex, and when c = 0 the reconstruction corresponds to the weak membrane. The intermediate functions f_c, c ∈ ]0, 1[, can be constructed by letting the interval of approximation shrink to a factor c of the original interval. The claim of Blake and Zisserman is that if we track the local minimum of f_c when slowly varying c from 1 to 0, we obtain a good approximation of the solution of the weak membrane [4]. It is shown [4] that the global minimum cannot always be tracked as the local minimum. A discussion of convergence is given by March [12] and Nielsen [13]. Other approximations of the initial smoothness function which yield a convex energy functional can be constructed. In Section 4 two methods which are generally applicable and totally automatic are presented.
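The convexity criterion just quoted (f'' > -1/2 in 1D, proven in Appendix A) is easy to test numerically. The sketch below is our own illustration, not the Blake-Zisserman construction itself: it estimates the infimum of f'' on a grid and reports whether the corresponding energy of Equation 1 is guaranteed to be convex.

    import numpy as np

    def min_second_derivative(f, x_range=(-5.0, 5.0), n=200001):
        """Grid estimate of inf_x f''(x) by central differences."""
        x = np.linspace(*x_range, n)
        h = x[1] - x[0]
        return ((f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2).min()

    def energy_is_convex_1d(f):
        """1D criterion of Theorem 1 (Appendix A): convex energy if inf f'' > -1/2."""
        return min_second_derivative(f) > -0.5

    lam, T = 1.0, 1.0
    tikhonov = lambda x: lam * x ** 2                         # parabolic stabilizer
    weak_string = lambda x: lam * np.minimum(x ** 2, T ** 2)  # thresholded parabola

    print(energy_is_convex_1d(tikhonov))      # True: quadratic smoothness keeps the energy convex
    print(energy_is_convex_1d(weak_string))   # False: the corners at |x| = T drive f'' towards minus infinity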

3 Mean Field Annealing and GNC

Mean Field Annealing (MFA) is a technique which is a deterministic version of Simulated Annealing. An amount of free energy is added to the molecules, which lets them have many different possible states. Instead of simulating the stochastic behaviour of a molecule when the free energy is removed (i.e. the temperature is decreased), the mean state of the molecule is simulated. Approximation of the distribution by its mean value is called the MF approximation. To calculate the mean state a Gibbs distribution of the states is used:

P(\tilde{s}) = \frac{1}{Z} e^{-E(\tilde{s})/\sigma}

where E is the potential energy of a state, here given by Equation 1, while σ is used for the temperature, which has often been denoted 1/β or kT, but is denoted σ here to gain consistency with later Gaussian scale-space notation. Z is a normalizing constant called the partition function. The mean of the Gibbs distribution can be evaluated when σ tends towards zero, where only the ground state has non-zero probability and also becomes the mean state. For σ approaching infinity, all states become equally probable, which is why the addition of free energy and the MF approximation can be regarded as a relaxation algorithm.

A convex energy functional is not necessarily obtained for high temperatures. This depends on the structure of the state space. In general, the solution space will become totally flat and all states will be equally probable for infinitely high temperatures. If E : D → IR^+ is defined on a domain D with finite measure, then the mean of \tilde{s} will be at the centre of gravity of D for σ infinite. If only the smoothness term of Equation 1 is MF approximated, the data term (which is quadratic and thereby convex) will ensure convexity of the total energy functional, and we obtain a GNC algorithm. The evaluation of the mean of the Gibbs distribution is, however, often non-trivial, as a summation over all possible states (i.e. all possible combinations of reconstruction values in all points) has to be carried out.

In many cases MF approximations can more easily be made, not in the total state space, but in some subspace; e.g. the value of neighbouring points can be exchanged with their MF approximation. This is not the same as the global MFA, and will in general yield another result and might thus violate the condition of initial convexity of a GNC. When only a subspace is approximated, the approximation might depend on the rest of the state space. E.g. in the case of approximating a neighbouring pixel value it is necessary, for each temperature, to minimize the energy of each pixel, which changes the MF approximation of the neighbouring pixels. Whether this optimization is simple (i.e. the energy having a single stationary point) is not in general easily determined, but will depend on the actual properties of the field and the approximation.

In recent years MFA algorithms of the weak membrane have been published [11], [10], [14]. In these works, the discontinuity field introduced by Geman and Geman [2] is substituted by its MF approximation, yielding an expression of the effective energy in a pixel in terms of a functional of the gradient. This MF approximation of the weak membrane yields [11], [10] a smoothness function which has the form

f_\sigma(x) = \lambda T^2 - \sigma \log\left(1 + e^{\lambda(T^2 - |x|^2)/\sigma}\right)

where σ can be interpreted as the temperature. This has the characteristics of being a smoothed version of the weak string. It is claimed [11], [10] that the MFA of the weak membrane is a GNC-like algorithm.
This is not the case, as the solution space might be non-convex no matter how much we "heat". When σ tends to zero, the smoothness function tends to the weak string. When σ tends to infinity, the lower bound on the second derivative tends to a negative limit, which depends on λ.

The limiting bound is approximately -0.6λ, which shows that the MFA is not a GNC for λ > 0.9, no matter the dimensionality of the surface. This analysis is carried out in Appendix C. An illustration of the MFA smoothness function as a function of gradient and temperature is shown in Figure 3, while minus the second derivative with respect to the gradient is shown in Figure 4.

[Figure 3: The MF approximation of the smoothness function of the weak membrane as a function of the gradient x and the temperature σ for the case T = 1, λ = 1.]

In the 1D case, we will analyse the behaviour of the MFA of the weak membrane by investigating the motion of the non-convexities. The positions on the smoothness function where the second derivative reaches its minimum value are approximately a linear function of the square root of the temperature for high temperatures (see Appendix C). This implies that the concavities in the solution space are located at ±k√σ in the gradient space of the solution, where k is a positive constant. If σ is initialised to an arbitrarily high value, such that every derivative in the signal is in the interval [-k√σ, k√σ], no discontinuities will be detected using this σ. As σ is decreased slowly, the concavities traverse towards zero in the gradient space. Because we track the solution as a local minimum, the solution stays in the interval when σ is lowered. This implies that all gradients of the solution are pulled towards zero. They will end in the interval between the two concavities in the function. When σ is decreased to zero, this interval is [-T/√2, T/√2]. This way, all gradients of the resulting signal will be in this interval and no discontinuities are detected.

If the temperature was initialised at a lower value, some discontinuities might have been detected. These discontinuities will not be able to pass the concavities of the energy functional, and will still be outside the interval of the concavities no matter how the temperature is increased or decreased. The phenomenon can be visualized by looking at the graph of minus the second derivative of the MFA smoothness function in Figure 4. While the temperature is changed, the leaps cannot be overcome, and the solution is pushed in front of the concavities.

The consequence of the MFA is that the start temperature defines which discontinuities are detected. The annealing might let these discontinuities move, but the detection is performed by the arbitrary choice of initial temperature and the initial estimate. In Figure 5 the result of MFAs using different start temperatures is shown. The discontinuity threshold is low (T = 1), resulting in a ground state in which all points are detected as discontinuities. By raising the initial temperature the discontinuity detection threshold is raised, and fewer discontinuities are detected, yielding a higher final energy. In the limit we can force the MFA to detect no discontinuities if we start at a sufficiently high temperature.

The above analysis was based upon the positions of the concavities in the energy functional. A concavity might not lead to multiple minima.
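The claim that the lower bound on the second derivative of the MF-approximated smoothness function does not vanish with increasing temperature can be checked numerically. The sketch below is our own and uses the form of f_σ reconstructed in this section with λ = T = 1; it estimates min_x f_σ''(x) for growing σ.

    import numpy as np

    def f_mfa(x, sigma, lam=1.0, T=1.0):
        """MF approximation of the weak-membrane smoothness term (Section 3)."""
        return lam * T ** 2 - sigma * np.log1p(np.exp(lam * (T ** 2 - x ** 2) / sigma))

    def min_second_derivative(sigma, lam=1.0, T=1.0):
        x = np.linspace(-25.0, 25.0, 500001)
        f = f_mfa(x, sigma, lam, T)
        d2 = np.diff(f, 2) / (x[1] - x[0]) ** 2
        return d2.min()

    # The minimum of f_sigma'' stays strictly negative for all temperatures, so for
    # lam = 1 the energy never becomes convex, no matter how much we "heat".
    for sigma in [0.5, 1.0, 5.0, 20.0, 100.0]:
        print(f"sigma = {sigma:6.1f}   min f'' = {min_second_derivative(sigma):.3f}")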

One necessary condition to be fulfilled is that the concavity in a smooth energy functional contains a maximum. This is not the case when the forces from the data and the smoothness term point in the same direction. This means that a "heating" to infinity will not let discontinuities grow towards infinity. The enlargement of the gradient will stop when the reconstruction reaches the initial data.

[Figure 4: Minus the second derivative of the MF approximation of the smoothness function of the weak membrane as a function of the gradient x and the temperature σ for the case T = 1, λ = 1.]

[Figure 5: An MFA reconstruction. For each start temperature T_start the MFA algorithm is run, using the same final temperature close to zero in all executions. The parameters were T = 1 and λ = 10. The initial relaxation parameter was varied according to σ_start = 5 · 10^{T_start/5}.]

The argumentation for using MFA is solely based upon statistical physics. In statistical physics MF theory is not regarded as a good approximation inside the critical regions. The weak membrane is normally situated in a critical region. A critical region is a region where phase transitions are present. If both discontinuity points and non-discontinuity points are present in the optimal state, the weak membrane is in the critical region. If not, another model than the weak string could have been used. It should be mentioned that MF theory is regarded as a better approximation when the dimension of the field is high. This means that the MFA should be a better approximation in the 2D or 3D reconstruction problem. Nevertheless, the MFA on the weak membrane will suffer from the same problems outlined above, independently of the number of dimensions. This can be seen from the fact that the MF approximation in higher dimensions just exchanges the derivative for the gradient magnitude, and that the non-convexities in this way (using Theorem 2) are present on

a sphere in gradient space. This sphere is shrunk, without the possibility of letting the gradients escape, when the temperature is decreased.

4 Focusing as GNC

In this section we present two algorithms based upon Gaussian defocusing as sources of convexity of the energy functional. In Appendix B it is proven that the weak membrane yields a convex Hamiltonian if the smoothness function is convolved with a Gaussian of sufficiently large standard deviation. In fact, it is proven that any smoothness function which differs from one yielding a convex Hamiltonian by only a Lebesgue-integrable function will cause a convex solution space if the smoothness function is filtered with a Gaussian of sufficiently large standard deviation. In this way a convex Hamiltonian can be constructed, and the solution can be found using a simple gradient descent algorithm. When slowly decreasing the standard deviation of the Gaussian towards zero, we can track the solution of the optimization problem by tracking the minimum as a local minimum. When the standard deviation approaches zero, the tracked solution is an approximation of the solution of the original problem. This method is denoted Smoothness Focusing (SF), using the nomenclature of coarse-to-fine strategies in scale space from Bergholm's Edge Focusing [15].

In Appendix B it is also proven that a convex energy functional can be obtained by Gaussian convolution of the a priori distribution of the gradient in a MAP estimation scheme. This corresponds to a Gaussian convolution of the Gibbs distribution of the gradient:

P(\nabla\tilde{s}) = e^{-f(\nabla\tilde{s})}

The corresponding minus-log-probability function yields a convex solution space if the initial probability function has a finite standard deviation. This is not the case for the weak membrane, but it is for many other robust estimators, such as the Lorentzian. The solution can be tracked as a local minimum while the standard deviation of the Gaussian is decreased towards zero. This method is called Probability Focusing (PF).

In both GNC-generating schemes a conservative measure of the needed amount of smoothing is given in Appendix B. This means that we can guarantee that a finite amount of smoothing is needed, and we can initiate the amount of smoothing. The general scheme of constructing a GNC algorithm, which only requires that the smoothness function fulfills the demands of Theorem 3 or Lemma 2 in Appendix B, is presented as follows:

    σ = σ_0
    while σ > σ_1
        Minimize E_σ(\tilde{s}) = Σ (c - \tilde{s})^2 + f_σ(\tilde{s}_x)
        σ = σ / k
    endwhile

where σ is the standard deviation of the Gaussian used for convolution of the smoothness function f or the probability function e^{-f}. This means

f_σ(t) = G_σ(t) * f(t)    or    f_σ(t) = -\log(G_σ(t) * e^{-f(t)})

where G_σ is the Gaussian of standard deviation σ. The choice of k is a trade-off between speed (large k) and precision (small k). A discussion of how to choose this "cooling rate" is given by Blake and Zisserman [4].
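The loop above is straightforward to implement. The following Python sketch is ours, with hypothetical parameter choices; a fixed-step gradient descent stands in for the unspecified inner minimizer. It realizes the SF variant on a 1D signal by smoothing a sampled smoothness function numerically with a Gaussian and halving σ until it is small.

    import numpy as np

    def gaussian_smooth_samples(values, dx, sigma):
        """Numerically convolve sampled f with a Gaussian of std sigma (Smoothness Focusing)."""
        k = np.arange(-int(5 * sigma / dx), int(5 * sigma / dx) + 1) * dx
        g = np.exp(-k ** 2 / (2 * sigma ** 2))
        g /= g.sum()
        return np.convolve(values, g, mode="same")

    def smoothness_focusing(c, f, sigma0=4.0, sigma_min=0.05, step=0.05, iters=300,
                            x_max=20.0, dx=0.01):
        """GNC by Smoothness Focusing: track a local minimum of the smoothed energy
        while the standard deviation of the Gaussian is halved towards zero.
        sigma0 is chosen above the conservative bound of Appendix B."""
        xs = np.arange(-x_max, x_max + dx, dx)
        s = c.copy()                              # start from the data
        sigma = sigma0
        while sigma > sigma_min:
            f_sig = gaussian_smooth_samples(f(xs), dx, sigma)
            df_sig = np.gradient(f_sig, dx)       # derivative of the smoothed smoothness function
            for _ in range(iters):                # plain gradient descent on E_sigma
                grad_s = np.interp(np.diff(s), xs, df_sig)
                g = 2.0 * (s - c)
                g[:-1] -= grad_s                  # d/ds_i     of f(s_{i+1} - s_i)
                g[1:] += grad_s                   # d/ds_{i+1} of f(s_{i+1} - s_i)
                s -= step * g
            sigma /= 2.0
        return s

    lam, T = 1.0, 1.0
    weak_string = lambda x: lam * np.minimum(x ** 2, T ** 2)
    rng = np.random.default_rng(0)
    ideal = np.concatenate([np.zeros(30), 3.0 * np.ones(30)])   # a step edge
    c = ideal + 0.25 * rng.standard_normal(ideal.size)
    print(np.round(smoothness_focusing(c, weak_string), 2))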

4.1 Bayesian interpretation

Reconstruction and the scale-space extensions of the smoothness function can be interpreted in terms of probabilistic theory. The connection is: assume an a priori probability distribution of the solutions is given and the smoothness function f is properly derived from this; then the minimization of the reconstruction energy corresponds to a maximization of the a posteriori probability (see the derivation of Equation 1). Let us assume that we are not able to measure the derivative s_x of the solution precisely, and that the noise model on the derivatives is assumed stationary and Gaussian. The a priori distribution of the observed derivatives will in this case be the convolution of the a priori distribution of the real derivatives and the Gaussian noise model distribution. This gives an interpretation of the PF scheme as an expectation value of the probability (or an MF approximation, but using a Gaussian noise model of the gradient instead of the Gibbs distribution). The SF corresponds to having an uncertain determination of the derivatives and then estimating the expected energy. The expected energy is found by the Gaussian convolution if the noise model of the derivative is Gaussian.

The GNCs by focusing can be perceived as an iterative way to find increasingly certain values of the first derivative. At the first level, the derivatives are inaccurately determined, and we convolve the smoothness function with a Gaussian of large standard deviation. As the minimization of this yields an increasingly certain determination of the derivative of the solution, we can now use a Gaussian with a smaller standard deviation, and so forth.

5 Experiments

We compare empirically the SF and the Blake-Zisserman GNC on the weak membrane. The latter has the advantage that the smoothness function is only changed around the critical points where the second derivative is smaller than -1/2. The Gaussian convolution yields the theoretically satisfying property of being derivable from probability theory. In the following experiments we show that the SF can compete with the Blake-Zisserman formulation. The outcome of the GNC algorithm implemented as done by Blake and Zisserman [4] is compared to the SF. The test problem is the weak membrane, which in scale-space extension (after Gaussian convolution, with σ as scale parameter) yields:

f_\sigma(x) = \lambda T^2 + \frac{\lambda}{2}(\sigma^2 + x^2 - T^2)(\mathrm{erf}(x_-) + \mathrm{erf}(x_+)) - \frac{\lambda\sigma x}{\sqrt{2\pi}}(e^{-x_-^2} - e^{-x_+^2}) - \frac{\lambda\sigma T}{\sqrt{2\pi}}(e^{-x_-^2} + e^{-x_+^2})

where

x_- = \frac{T - x}{\sqrt{2}\,\sigma},  \quad  x_+ = \frac{T + x}{\sqrt{2}\,\sigma},  \quad  \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{x} e^{-t^2} dt - 1

The qualitative difference between the Blake-Zisserman approximation and the scale-space approximation is that the scale space not only rounds off the corners, it also increases the value at zero (see Figure 6). Furthermore, the critical value of the second derivative is only reached in single points in the SF, but in intervals of finite size in the Blake-Zisserman formulation (see Figure 7). In these intervals, the energy functional is nearly non-convex, and might not have any gradient at all. This means that a gradient descent algorithm will probably end up at one of the bounds of the interval at random. Whether a gradient in the input is perceived as a discontinuity or not might in this way be random.

[Figure 6: Smoothness function of the derivative of the solution at the starting level of GNC as formulated by Blake and Zisserman and in scale-space extension.]

[Figure 7: Second derivative of the smoothness function of the derivative of the solution at the starting level of GNC as formulated by Blake and Zisserman and in scale-space extension.]

Two types of experiments have been performed: one on a noise-corrupted signal, and one on an ideal and precisely adjusted signal. The latter tests the precise behaviour on certain features, while the former makes an overall judgment. An ideal signal (consisting of an interval of negative gradient, an interval of zero gradient, a step edge, and an interval of zero gradient) has been noise corrupted with stationary Gaussian noise with standard deviation 2.5. The two GNC algorithms have been run on the signal and the result can be seen in Figure 8 and Figure 9. The Blake-Zisserman GNC detects more discontinuities than the

SF GNC. This tendency is general and is present for many other parameter settings. The SF yields a final energy which is 92 percent of the total energy found by the Blake-Zisserman algorithm. This is a general tendency which is emphasized in the next experiment.

[Figure 8: Regularized signal using the weak string approximation by SF. The final energy is E = 38.6 and only 1 discontinuity is detected.]

[Figure 9: Regularized signal using the weak string approximation by the Blake-Zisserman GNC. The final energy is E = 35.6 and 2 discontinuities are detected.]

The two GNC algorithms do not always detect the same discontinuities. It is known [4] that the weak string will detect discontinuities arising from a gradient if the gradient g > T/√2 [17]. In this experiment, the algorithms have been tested on a constant gradient. From one experiment to the

next, the gradient has been increased. In Figure 11, the energy of the solution found by the two algorithms is plotted as a function of the gradient. In regions where the solution does not change the detection of discontinuities, the plot should be a parabola, as illustrated in Figure 10. For each combination of discontinuities a parabola exists. The perfect GNC algorithm would choose the parabola of lowest energy for every gradient value. This is not the case for either of the two algorithms evaluated in this paper.

[Figure 10: Energy of the weak string as a function of the gradient in an interval. Each of the curves corresponds to one combination of discontinuities. The optimal solution is the one of minimum energy.]

The experiment shows that, initially, where the gradient is small, neither of the algorithms detects any discontinuities, and the solutions are thereby identical. From a certain point (around g = 0.6) the Blake-Zisserman GNC detects discontinuities, and thereby leaves the initial parabola. The energy increases relative to the initial parabola, and it can be concluded that the discontinuities have been detected too early. In Figure 12 a zoom-in on the region of differences can be seen. The SF GNC follows the initial parabola until the gradient g is close to 0.8. After this it follows a new parabola. The energy drops from the first to the second parabola, and it can be concluded that discontinuities have been detected too late. The ideal detection gradient is g = 1/√2. In the region of larger gradients (approximately 1.6), the SF GNC results in the worst solution, as too many discontinuities have been detected. This region is, though, of less interest, because it is the region where nearly all points have been detected as discontinuities. This situation is unlikely to appear in a realistic environment. In the interval of gradients 0.6 < g < 0.95 the Blake-Zisserman GNC yields a higher energy than the SF, except in the region 0.75-0.78.

In order to show that the Focusing GNC is a general scheme which can be applied to a broad class of regularization problems, it has also been tested on 2D data using the isotropic regularization introduced by Nielsen [8]. In the isotropic regularization, the smoothness function is the Lorentzian robust estimator, which is

f(x) = \lambda \log(1 + |x|^2)

It should be noticed that this is not finite for infinite |x| and furthermore is not simple to approximate in the Blake-Zisserman way by substituting the function in the critical interval by a second-order polynomial. The Lorentzian estimator implies a convex solution space if λ < 2. In Figures 13 and 14 the reconstruction using λ = 50 and the GNC by PF can be seen.
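A quick numerical check of the Lorentzian convexity bound can be made with the following sketch (ours, not part of the original experiments). The closed-form minimum of f'' is -λ/4, attained at x² = 3, so the 1D criterion of Appendix A gives convexity exactly for λ < 2.

    import numpy as np

    lam = 50.0                                   # value used for the reconstruction in Figures 13-14
    x = np.linspace(-10.0, 10.0, 400001)
    f = lam * np.log1p(x ** 2)                   # Lorentzian robust estimator
    d2 = np.diff(f, 2) / (x[1] - x[0]) ** 2
    print(d2.min())                              # approximately -lam/4, attained near |x| = sqrt(3)
    print("convex" if d2.min() > -0.5 else "non-convex: a GNC such as SF or PF is required")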

[Figure 11: The energy of the weak string as a function of the gradient. For each gradient a signal of length 20 and constant gradient has been constructed. The energy is plotted for the Blake-Zisserman GNC and for the Gauss GNC. λ = T = 1.0 in all computations.]

[Figure 12: The energy of the weak string as a function of the gradient. Computations performed as in Figure 11. Focus is on the region of different energy.]

6 Conclusion

We have given criteria to guarantee convexity of an energy functional containing a quadratic data term and an arbitrary smoothness term. Necessary conditions for the smoothness term to be approximated by a scale-space extension of energy or probability, yielding a convex energy functional, are given. These results have been used for automatic construction of GNC algorithms. In the case of the weak string, the application is straightforward and yields results which are competitive with those of Blake and Zisserman. The Blake-Zisserman GNC has a tendency to over-estimate the number of discontinuities, while the SF GNC has a tendency to under-estimate the number of discontinuities.

[Figure 13: Regularization using the Lorentzian estimator and Smoothness Focusing. (a) is the original data. (b) is the noise-corrupted signal, with SNR = √2 on the step edges. (c) is the reconstruction, and (d) is the normalized residual.]

Earlier, Mean Field Annealing has been used to make deterministic approximations of the process of simulated annealing of the weak string [10], [11]. It is proven that these MFAs do not yield a GNC algorithm of the weak string, as the energy functional might be non-convex even for infinitely high temperatures. The start temperature of the MFA defines the positions of the discontinuities. The higher the temperature, the fewer discontinuities will be detected. No matter how low the discontinuity threshold is in the weak string, it can be matched by a sufficiently high temperature, resulting in detection of no discontinuities by the MFA.

The SF and PF GNCs imply the possibility of automatically applying GNC to any reconstruction which can be formulated as the minimization of the energy of Equation 1. In general, an analytic expression of the energy functional is not needed. This implies the possibility of using an energy functional which is measured as a histogram, and thus only numerically known. In this way a new category of GNC applications is made possible. One application [18] in computer vision is to measure the statistics of the gradient in a scene. At a later time instance, we can, instead of using a priori information directly about the gradient, use the information that the statistics of the scene change slowly. Both SF and PF have been applied with success in this configuration.

[Figure 14: Regularization using the Lorentzian estimator and Smoothness Focusing. (a) is the original data. (b) is the noise-corrupted signal, with SNR = √2 on the step edges. (c) is the reconstruction, and (d) is the normalized residual.]

Appendix A

In this appendix, the necessary and sufficient conditions for convexity of the energy functional are given. The proof is divided into three parts. The first concerns the conditions on the smoothness function to create a convex solution space in the 1D case, the second deals with the D-dimensional case, and the last gives an interpretation of the results of the second. We look into the case where the smoothness term depends only on the first derivative of the solution. The energy can in the discrete formulation be expressed as:

E(\tilde{s}) = \sum_i (\tilde{s}_i - c_i)^2 + f(\tilde{s}_i - \tilde{s}_{i-1})    (2)

where f : IR → IR is dependent on the sampling distance h, and subscript i denotes the function value taken in sample number i.

Theorem 1. If the second derivative of the smoothness function f of the first derivative of the solution can be bounded downwards by -1/2 and the data term is the square distance, the solution space will be convex.

Proof. The solution space is convex if and only if the Hessian matrix H is positive definite.

By definition

H_{ij} = \frac{\partial^2 E}{\partial \tilde{s}_i \partial \tilde{s}_j}

In this case it yields H = 2I + F, where

F = \begin{pmatrix}
f''_2 & -f''_2 & & & \\
-f''_2 & f''_2 + f''_3 & -f''_3 & & \\
 & -f''_3 & \ddots & \ddots & \\
 & & \ddots & f''_{N-1} + f''_N & -f''_N \\
 & & & -f''_N & f''_N
\end{pmatrix}

and f''_i = f''(\tilde{s}_i - \tilde{s}_{i-1}). The criterion for a matrix H to be positive definite is:

\forall \vec{x} : \vec{x}^t H \vec{x} > 0    (3)

By definition the solution space is convex if:

\forall \vec{x} : 2\sum_{i=1}^{N} x_i^2 + \sum_{i=2}^{N} (x_i - x_{i-1})^2 f''_i > 0

As 4\sum (x_i)^2 \geq \sum (x_i - x_{i-1})^2 for all \vec{x}, H is positive definite if

\forall i : f''_i > -1/2

From Theorem 1 we see that we only have to prove the existence of a lower bound on the second derivative of the smoothness term higher than -1/2 to gain convexity of the solution space.

The above results are correct for an integer-sampled signal. If a differently sampled signal were used instead, or a function of higher dimensionality were reconstructed, the limit would change according to the sampling distance h and the dimensionality D. If the smoothness function were a function of a higher-order derivative of the solution, the lower bound would still exist, but take a different value. In the following table the limits corresponding to smoothness functions of different derivatives of the solution are listed. No matter which derivative the smoothness function is a function of, it is still a limit on the second derivative of the smoothness function. The smoothness function is f(s_x^{(n)}), where n denotes the order of differentiation.

    n    Limit
    1    -1/(2 h^2 D)
    2    -1/(9 h^4 D)
    3    -1/(40 h^6 D)
    4    -1/(175 h^8 D)

If this pattern repeats in general, the limit can be expressed as

L(n) = -\frac{2 (n!)^2}{(n+1)\,(2n)!\; h^{2n} D}
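The structure of the Hessian used in the proof of Theorem 1 can be verified directly. The sketch below is ours; the clamped second derivative is an arbitrary example satisfying f'' > -1/2. It assembles H = 2I + F for a test configuration and checks positive definiteness numerically.

    import numpy as np

    def energy_hessian_1d(s, d2f):
        """Hessian H = 2I + F of the energy in Equation 2, where each neighbour pair
        (i-1, i) contributes f''(s_i - s_{i-1}) times [[1, -1], [-1, 1]] to F."""
        n = len(s)
        H = 2.0 * np.eye(n)
        for i in range(1, n):
            fpp = d2f(s[i] - s[i - 1])
            H[i - 1, i - 1] += fpp
            H[i, i] += fpp
            H[i - 1, i] -= fpp
            H[i, i - 1] -= fpp
        return H

    # Example smoothness curvature bounded below by -0.4 > -1/2 (Theorem 1 then guarantees convexity).
    d2f_bounded = lambda x: max(-0.4, 2.0 * (1.0 - x ** 2) / (1.0 + x ** 2) ** 2)

    s = np.concatenate([np.linspace(0.0, 2.0, 20), np.linspace(5.0, 4.0, 20)])  # arbitrary test point
    eigs = np.linalg.eigvalsh(energy_hessian_1d(s, d2f_bounded))
    print(eigs.min() > 0.0)    # True: H is positive definite, as Theorem 1 predicts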

The above results are valid for the one-dimensional problem. In the following we generalize to the D-dimensional case. Let an energy function be given on the following form:

E(\tilde{s}) = \sum_i (\tilde{s}_i - c_i)^2 + f(\tilde{s}_{i+1} - \tilde{s}_{i-1},\; \tilde{s}_{i+n_1} - \tilde{s}_{i-n_1},\; \tilde{s}_{i+n_1 n_2} - \tilde{s}_{i-n_1 n_2},\; \ldots)    (4)

where n_1 is the length of a row, n_2 is the length of a column of the image, etc., and central approximations are used to gain symmetry. In this case we search for the criterion for convexity.

Lemma 1. If the Hessian H of f is in the class \mathcal{H}, then the energy function given in Equation 4 is convex, where

\mathcal{H} = \{ H \in IR^{D \times D} \mid \forall y \in \{ y \in IR^D \mid \|y\|_\infty \leq 1 \} : y^T H y > -1/2 \}

Proof. The solution space is convex if and only if the Hessian matrix \mathbf{H} of the energy function E(\tilde{s}) is positive definite. The Hessian of the energy given in Equation 4 can be written as \mathbf{H} = 2I + F, where F is a band matrix with extra bands at distance n_1, n_1 n_2, ... etc. from the diagonal. The matrix F is composed by addition of submatrices F_i originating from each point of the image. If we remove the rows and columns containing only zeroes, we find a 2D × 2D matrix of the form:

F_i = \begin{pmatrix} H_i & -H_i^{|} \\ -H_i^{-} & H_i^{|-} \end{pmatrix}

where the superscript | means reverse order of columns, the superscript - means reverse order of rows, and H_i is the Hessian matrix of f in the i-th point. The 2D eigenvalues of this matrix are D zeroes and D eigenvalues corresponding to twice the eigenvalues of H_i. We search a class \mathcal{H} of Hessian matrices H_i such that

\forall i : H_i \in \mathcal{H} \;\Rightarrow\; \forall x : x^T \mathbf{H} x > 0    (5)

Let x_i denote the sub-vector of x corresponding to the positions of F_i in F, such that

\forall i : H_i \in \mathcal{H} \;\Rightarrow\; \forall x : \sum_i x_i^T F_i x_i = x^T F x

If we split x_i into halves so that x_i = (x_{1i}, x_{2i}), we can write the criterion from (5) as

\sum_i (x_{1i} - x_{2i})^T H_i (x_{1i} - x_{2i}) > -2|x|^2

Using the formula \sum_i (x_i - x_{i+k})^2 \leq 4|x|^2, and noticing that the worst cases in the different dimensions are not mutually exclusive, we find that this is fulfilled if

\forall i \; \forall y_i \in \{ y \in IR^D \mid \forall j : |y_j| \leq 1 \} : y_i^T H_i y_i > -1/2

This leads to the following class \mathcal{H} of allowed Hessian matrices:

\mathcal{H} = \{ H \in IR^{D \times D} \mid \forall y \in \{ y \in IR^D \mid \|y\|_\infty \leq 1 \} : y^T H y > -1/2 \}

where H is a symmetric matrix.

In general H is a symmetric matrix, and can thereby be diagonalized into H = T^T L T, where L is a diagonal matrix with real eigenvalues \lambda_i. It should be noticed that this class \mathcal{H} is not rotationally symmetric, which is due to the rotationally asymmetrical sampling grid. This asymmetry makes it difficult, in general, to reformulate the criterion directly in terms of H and eliminate y. We can, though, in general find a sufficient and a necessary criterion of convexity.

Theorem 2. The energy function of Equation 4 is convex if all the eigenvalues of the Hessian of f in Equation 4 are larger than -1/(2D), where D is the dimensionality of \tilde{s}, and is non-convex if one of the eigenvalues is smaller than -1/2.

Proof. The criterion of convexity in Lemma 1 is weakened by letting the domain of y, where the constraint on the Hessian shall be fulfilled, be shrunk to {y : |y| ≤ 1}. Because the Hessian is diagonalizable with real eigenvalues and with eigenvectors forming an orthonormal basis, H = T^T L T, we have

\mathcal{H} \subseteq \{ H \in IR^{D \times D} \mid \forall y, |y| \leq 1 : y^T T^T L T y > -1/2 \}
            = \{ H \in IR^{D \times D} \mid \forall x, |x| \leq 1 : x^T L x > -1/2 \}

where the substitution x = T y is used. This relaxed criterion is obviously violated if one of the eigenvalues of the Hessian is less than or equal to -1/2.

The criterion of convexity in Lemma 1 is tightened by letting the domain of y, where the constraint on the Hessian must be fulfilled, be enlarged to {y : |y| ≤ √D}, where D is the dimensionality of \tilde{s}. Because the Hessian is diagonalizable with real eigenvalues and with eigenvectors forming an orthonormal basis, H = T^T L T, we have that

\mathcal{H} \supseteq \{ H \in IR^{D \times D} \mid \forall y, |y| \leq \sqrt{D} : y^T T^T L T y > -1/2 \}
            = \{ H \in IR^{D \times D} \mid \forall x, |x| \leq \sqrt{D} : x^T L x > -1/2 \}
            = \{ H \in IR^{D \times D} \mid \forall x, |x| \leq 1 : x^T L x > -1/(2D) \}

where the substitution x = T y is used. This sharpened criterion is fulfilled if all linear combinations of the eigenvalues with total weight 1 are larger than -1/(2D). This is obviously only fulfilled if all the eigenvalues of the Hessian are larger than -1/(2D).

The above lemmas show that moving into higher dimensionality cannot balance a non-convexity from a lower dimensionality to obtain total convexity. In one dimension the only eigenvalue of the 1×1 Hessian shall be larger than -1/2 to ensure convexity. In 2D both eigenvalues shall be larger than -1/2 to have the possibility of convexity, while additional constraints on the interaction of the dimensions must also be fulfilled to ensure convexity. The result is that the dimensions cannot balance each other in gaining convexity, but might help each other in constructing non-convexities.

Appendix B

In this appendix two related methods to approximate a Hamiltonian to gain convexity are proposed. The first operates on the smoothness term directly, while the second operates on the underlying probability distribution of the derivative(s).

Smoothness focusing

The lower bound on the second derivative can be reached by convolution of the smoothness function f with a Gaussian of adequate standard deviation, if the smoothness function differs from one yielding a convex Hamiltonian by only a Lebesgue-integrable function.

Theorem 3. Let b be any constant, * denote convolution, and G(x; σ) be the Gaussian in x of standard deviation σ. For any function f which, for some ε > 0, can be written

\forall x \in IR : f(x) = g(x) + h(x),  \quad  g''(x) \geq -b + \epsilon,  \quad  \int_{IR} |h(x)| dx = A < \infty

we have

\sigma > \left( \frac{2 A e^{-3/2}}{\epsilon \sqrt{2\pi}} \right)^{1/3}  \;\Rightarrow\;  \forall x : \frac{\partial^2}{\partial x^2}\big(f(x) * G(x; \sigma)\big) > -b

Proof. We have

\frac{\partial^2}{\partial x^2}(G_\sigma * f) = \frac{\partial^2}{\partial x^2}(G_\sigma * g) + \int_{IR} \frac{\partial^2 G}{\partial x^2}(k; \sigma)\, h(x - k)\, dk
  \geq -b + \epsilon - \int_{IR} |h(k)| \sup_{i \in IR} \left| \frac{\partial^2 G}{\partial x^2}(x - i; \sigma) \right| dk
  = -b + \epsilon - \frac{2 e^{-3/2}}{\sqrt{2\pi}\, \sigma^3} A

This is a lower bound on the second derivative of the convolution. This bound should be greater than -b to prove the theorem:

-b + \epsilon - \frac{2 e^{-3/2}}{\sqrt{2\pi}\, \sigma^3} A > -b  \;\Leftrightarrow\;  \sigma > \left( \frac{2 A e^{-3/2}}{\epsilon \sqrt{2\pi}} \right)^{1/3}    (6)

This means that the second derivative of f * G_σ will be larger than -b after convolution with a Gaussian of standard deviation larger than the above stated quantity.

We have now proven that any function which can be described as a function g, whose second derivative is bounded downwards, plus a Lebesgue-integrable function h, can be bounded downwards in the second derivative arbitrarily close to the bound on the second derivative of g. As an example, we can mention the weak membrane, which can be described as a constant function plus a negative parabola in a limited region. We want to limit the second derivative to

be larger than -1/2. For generality we express this as -b. The smoothness function f(x) corresponding to the weak string is f(x) = g(x) + h(x), where

g(x) = \lambda T^2  \quad \text{and} \quad  h(x) = \begin{cases} \lambda (x^2 - T^2) & \text{if } |x| < T \\ 0 & \text{otherwise} \end{cases}

The Lebesgue integral A of |h(x)| yields

A = \int_{IR} |h(x)| dx = \lambda \int_{-T}^{T} (T^2 - x^2) dx = \frac{4 \lambda T^3}{3}

This and Equation 6 (noting that ε = b because g''(x) = 0) result in the following standard deviation of the Gaussian to ensure that the energy functional is convex:

\sigma > 0.72 \left( \frac{\lambda}{b} \right)^{1/3} T \approx 0.91\, \lambda^{1/3}\, T  \quad  (\text{for } b = 1/2)

It should be mentioned that this bound is a conservative measure, and in practice smaller values of σ might yield a convex energy function. Actually, in practice we find the limit for the weak membrane of first-order regularization to be 30% lower in the example used earlier in this paper.

The above-mentioned limit on σ is conservative. Actually, it is so conservative that some functions where A is infinite (h is not Lebesgue integrable) nevertheless have a finite bound on σ. An example is periodic functions. By a Fourier series expansion followed by a scale-space extension of f, one can see that any periodic function can be limited downwards by any negative value of the second derivative.

Lemma 2. Any periodic function f(x) = f(x + l), which can be expressed as a Fourier series F(u), can be limited to have a second derivative larger than any negative limit by scale-space extension.

Proof. The Fourier series expansion can be described as

f(x) = \sum_{u=0}^{\infty} F(u)\, e^{i u \omega_0 x}

where \omega_0 = 2\pi / l. The scale-space extension of the Fourier series yields

f(x; \sigma) = \sum_{u=0}^{\infty} F(u)\, e^{i u \omega_0 x}\, e^{-u^2 \omega_0^2 \sigma^2 / 2}

The second derivative of this is

\frac{\partial^2 f(x; \sigma)}{\partial x^2} = -\sum_{u=1}^{\infty} u^2 \omega_0^2\, F(u)\, e^{i u \omega_0 x}\, e^{-u^2 \omega_0^2 \sigma^2 / 2}

As every term of the sum decays exponentially as a function of σ, the second derivative can be limited to any interval around 0 by scale-space extension with an adequate σ.

In computer vision we might use the angle of a surface normal as a parameter to the smoothness function. Such an angular description is periodic, and thereby we can limit the second derivative to any negative limit by scale-space extension, if the function can be expressed as a Fourier series. The above result for periodic functions is not of practical importance if we use derivatives as the basis of the smoothness measure.
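The conservative bound can be compared with a direct numerical computation. The sketch below is ours, with λ = T = 1 as in the experiments; it convolves the weak-string smoothness function with a Gaussian and measures the minimum second derivative of the result.

    import numpy as np

    def min_d2_smoothed_weak_string(sigma, lam=1.0, T=1.0, dx=0.005, x_max=15.0):
        """Minimum second derivative of G_sigma * f for f(x) = lam * min(x^2, T^2)."""
        x = np.arange(-x_max, x_max + dx, dx)
        f = lam * np.minimum(x ** 2, T ** 2)
        k = np.arange(-int(6 * sigma / dx), int(6 * sigma / dx) + 1) * dx
        g = np.exp(-k ** 2 / (2 * sigma ** 2))
        g /= g.sum()
        f_sigma = np.convolve(f, g, mode="same")
        d2 = np.diff(f_sigma, 2) / dx ** 2
        core = slice(len(k), len(d2) - len(k))    # discard the convolution's boundary region
        return d2[core].min()

    lam, T = 1.0, 1.0
    sigma_bound = 0.91 * lam ** (1.0 / 3.0) * T   # conservative bound quoted above
    print(min_d2_smoothed_weak_string(sigma_bound))        # comfortably above -1/2
    print(min_d2_smoothed_weak_string(0.3 * sigma_bound))  # smaller sigma: drops below -1/2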

Probability focusing

Reconstruction can be formulated as Maximum A Posteriori (MAP) estimation. When the probability of observing a derivative is independent of the other observations, we find that MAP estimation leads to reconstruction where the smoothness term is the minus-log-probability function. We have that

f(x) = -\log p(x)

where p(x) is the density of x. Also the scale-space extension of the probability function directly leads to a convex solution space, if the distribution has a finite standard deviation.

Theorem 4. The Gaussian convolution of a density function with finite standard deviation leads to a minus-log-probability function whose second derivative can be bounded downwards by zero, if the standard deviation of the Gaussian is larger than the standard deviation of the density function.

Proof. Given a probability distribution p(x), we know that

\forall x : p(x) > 0,  \quad  1 = \int_{-\infty}^{\infty} p(x) dx,  \quad  2\sigma^2 > \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - y)^2 p(x) p(y)\, dx\, dy

The first two conditions state that p is a probability distribution, while the third is a reformulation of the standard deviation being less than σ. Let G_σ denote the Gaussian of zero mean and standard deviation σ and let

f_\sigma = -\log(p * G_\sigma)    (7)

be the minus-log-probability of p convolved with a Gaussian. The criterion for f_σ being convex is

\forall x : \frac{\partial^2 f_\sigma}{\partial x^2} > 0
\;\Leftrightarrow\; \forall x : \frac{\partial^2}{\partial x^2} \log(G_\sigma * p) < 0
\;\Leftrightarrow\; \forall x : \left[ \left( \frac{x^2 - \sigma^2}{\sigma^4} G_\sigma \right) * p \right] (G_\sigma * p) - \left[ \left( \frac{-x}{\sigma^2} G_\sigma \right) * p \right]^2 < 0

By expressing the convolution products as integrals and substituting t = x/σ this yields

\forall k : \int G(k - t) p(t) dt \int (t^2 - 1) G(k - t) p(t) dt - \left( \int t\, G(k - t) p(t) dt \right)^2 < 0

where the integrals are taken over the real axis and no index on the Gauss functions means standard deviation 1. We substitute t = s in the rightmost integral, yielding

\forall k : \int_t \int_s (s^2 - 1) G(k - t) G(k - s) p(t) p(s)\, ds\, dt - \int_t \int_s t s\, G(k - t) G(k - s) p(t) p(s)\, ds\, dt < 0
\;\Leftrightarrow\; \forall k : \int_t \int_s (s^2 - t s - 1) G(k - t) G(k - s) p(t) p(s)\, ds\, dt < 0

Noticing that the last factor is symmetric in s and t, we can add the symmetric term, and the inequality will still be valid:

\forall k : \int_t \int_s [(s^2 - ts - 1) + (t^2 - st - 1)]\, G(k - t) G(k - s) p(t) p(s)\, ds\, dt < 0
\;\Leftrightarrow\; \forall k : \int_t \int_s [(s - t)^2 - 2]\, G(k - t) G(k - s) p(t) p(s)\, ds\, dt < 0

This last expression says that, no matter which mean value a Gaussian of standard deviation 1 has, multiplication with a density of standard deviation less than 1 yields a function whose standard deviation is less than 1. This is obviously true, as the Gaussian is less than 1 for every x, and the resulting minimized second-order moment will be smaller.

This proves that we can guarantee that the second derivative of the smoothness function is larger than zero, and not just larger than a negative limit, implying that this is actually a stronger result than what is needed to guarantee a GNC.
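Probability Focusing can likewise be tried numerically. The sketch below is ours; the Lorentzian prior with λ = 4 is a hypothetical example whose density has finite standard deviation well below the chosen σ. It smooths the Gibbs distribution of the gradient with a Gaussian and checks the curvature of the resulting minus-log-probability over the central range.

    import numpy as np

    def pf_smoothness(f_values, x, sigma):
        """Probability Focusing: f_sigma = -log(G_sigma * exp(-f))."""
        dx = x[1] - x[0]
        p = np.exp(-f_values)                                  # Gibbs distribution of the gradient
        k = np.arange(-int(6 * sigma / dx), int(6 * sigma / dx) + 1) * dx
        g = np.exp(-k ** 2 / (2 * sigma ** 2)) * dx / (sigma * np.sqrt(2.0 * np.pi))
        return -np.log(np.convolve(p, g, mode="same"))

    lam = 4.0                                                  # Lorentzian prior; non-convex on its own since lam >= 2
    x = np.linspace(-30.0, 30.0, 12001)
    f = lam * np.log1p(x ** 2)
    f_sigma = pf_smoothness(f, x, sigma=3.0)                   # sigma far above the prior's standard deviation
    d2 = np.diff(f_sigma, 2)[3600:-3600] / (x[1] - x[0]) ** 2  # central range, away from boundary effects
    print(d2.min())    # positive here, in line with Theorem 4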

Appendix C

In this appendix the Mean Field Annealing of the weak membrane, as proposed by Geiger and Girosi [10] and by Bilbro et al. [11], is analysed. In this case the smoothness term is

f_\sigma(x) = \lambda T^2 - \sigma \log\left(1 + e^{\lambda(T^2 - x^2)/\sigma}\right)

We will prove that this has a lower bound on the second derivative which does not vanish when the relaxation parameter σ is increased towards infinity, and that the positions of the minima move towards plus and minus infinity when σ is increased towards infinity.

Lemma 3. The third derivative of f_σ is zero when x = 0 or when the equation

\frac{3\sigma}{2\tanh(y/2)} - \sigma y + \lambda T^2 = 0

is fulfilled, where y = \lambda(T^2 - x^2)/\sigma. This gives solutions in x which are proportional to √σ for high σ and correspond to ±T for low σ.

Proof. Let q = e^{\lambda(T^2 - x^2)/\sigma}, so that f_σ(x) = \lambda T^2 - \sigma\log(1 + q). We find

f_\sigma'(x) = \frac{2\lambda x\, q}{1 + q}

f_\sigma''(x) = \frac{2\lambda q}{1 + q} - \frac{4\lambda^2 x^2 q}{\sigma(1 + q)^2}    (8)

f_\sigma'''(x) = -\frac{12\lambda^2 x\, q}{\sigma(1 + q)^2} + \frac{8\lambda^3 x^3 q (1 - q)}{\sigma^2 (1 + q)^3}

The third derivative is evidently zero if x = 0 or (by division of the third derivative by 4\lambda^2 x q / (\sigma(1+q)^3))

3\sigma(1 + q) + 2\lambda x^2 (q - 1) = 0  \;\Leftrightarrow\;  2\lambda x^2 = \frac{3\sigma(1 + q)}{1 - q}    (9)

Using the above substitution of y we find

\sigma y = \frac{3\sigma}{2} \cdot \frac{e^y + 1}{e^y - 1} + \lambda T^2

which leads to

\frac{3\sigma}{2\tanh(y/2)} - \sigma y + \lambda T^2 = 0    (10)

This is not easily solvable analytically. The solution in y depends on λT²/σ but not on x. The function h(y) = \frac{3}{2\tanh(y/2)} - y consists of two decreasing branches, so that h_-(y) : IR_- → IR and h_+(y) : IR_+ → IR are two invertible functions. This means that Equation 10 always has one negative and one positive root. Furthermore, these solutions are monotonic functions of λT²/σ.

Let us denote by y_a the solution of the above equation when σ/(λT²) = a. When a is increased towards infinity the last term of the equation vanishes, and we numerically find the limiting solution y_∞, where the negative solution is the one that has a meaning in terms of x. When a is decreased towards zero from above, two solutions still exist: y ≈ -3a and y ≈ 1/a + 3/2. Still, the negative solution is the meaningful one in terms of x. When the solution is known in y it can be found in x as

x = \sqrt{T^2 - \frac{\sigma y}{\lambda}}

In the case of high temperature (i.e. large a), we find x ≈ \sqrt{\sigma |y_\infty| / \lambda}, proportional to √σ. In the case of low temperature (i.e. vanishing a) we find x ≈ T\sqrt{1 + 3a^2}, which tends to T. We now know the positions of the minima of the second derivative of f_σ, and will calculate the value in order to prove the lack of convexity of the MFA.

Theorem 5. There exists an x such that the second derivative of f_σ(x) is smaller than or equal to kλ, where

k = \frac{4 y_\infty q_\infty + 2 q_\infty (1 + q_\infty)}{(1 + q_\infty)^2} \approx -0.6

y_∞ is the negative limiting solution defined in the previous lemma, and q_∞ = e^{y_\infty}.

Proof. By substituting y and q into the second derivative of f_σ in Equation 8,

f_\sigma''(x) = \frac{\lambda\big(4 y q + 2 q (1 + q)\big)}{(1 + q)^2} - \frac{4\lambda^2 T^2 q}{\sigma (1 + q)^2}

If we, no matter the actual value of a, use y = y_∞, which always results in two real solutions in x,

f_\sigma''(x) = \frac{\lambda\big(4 y_\infty q_\infty + 2 q_\infty (1 + q_\infty)\big)}{(1 + q_\infty)^2} - \frac{4\lambda^2 T^2 q_\infty}{\sigma (1 + q_\infty)^2} = k\lambda - \frac{4\lambda^2 T^2 q_\infty}{\sigma (1 + q_\infty)^2}

This is always smaller than or equal to kλ because σ > 0.

We have now proven that there always exists an x such that the second derivative of the mean field approximation of the smoothness term of the weak string is smaller than kλ, where k ≈ -0.6. Along with the conditions for convexity of the solution space, we find that the solution space will never be convex if λ > -1/(2k).


More information

1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that

1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a

More information

= w 2. w 1. B j. A j. C + j1j2

= w 2. w 1. B j. A j. C + j1j2 Local Minima and Plateaus in Multilayer Neural Networks Kenji Fukumizu and Shun-ichi Amari Brain Science Institute, RIKEN Hirosawa 2-, Wako, Saitama 35-098, Japan E-mail: ffuku, amarig@brain.riken.go.jp

More information

HOW TO MAKE ELEMENTARY GEOMETRY MORE ROBUST AND THUS, MORE PRACTICAL: GENERAL ALGORITHMS. O. Kosheleva. 1. Formulation of the Problem

HOW TO MAKE ELEMENTARY GEOMETRY MORE ROBUST AND THUS, MORE PRACTICAL: GENERAL ALGORITHMS. O. Kosheleva. 1. Formulation of the Problem Ìàòåìàòè åñêèå ñòðóêòóðû è ìîäåëèðîâàíèå 2014, âûï. XX, ñ. 1?? ÓÄÊ 000.000 HOW TO MAKE ELEMENTARY GEOMETRY MORE ROBUST AND THUS, MORE PRACTICAL: GENERAL ALGORITHMS O. Kosheleva Many results of elementary

More information

How to Pop a Deep PDA Matters

How to Pop a Deep PDA Matters How to Pop a Deep PDA Matters Peter Leupold Department of Mathematics, Faculty of Science Kyoto Sangyo University Kyoto 603-8555, Japan email:leupold@cc.kyoto-su.ac.jp Abstract Deep PDA are push-down automata

More information

Linearly-solvable Markov decision problems

Linearly-solvable Markov decision problems Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu

More information

MODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione

MODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione MODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione, Univ. di Roma Tor Vergata, via di Tor Vergata 11,

More information

Multi-Robotic Systems

Multi-Robotic Systems CHAPTER 9 Multi-Robotic Systems The topic of multi-robotic systems is quite popular now. It is believed that such systems can have the following benefits: Improved performance ( winning by numbers ) Distributed

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2 1 A Good Spectral Theorem c1996, Paul Garrett, garrett@math.umn.edu version February 12, 1996 1 Measurable Hilbert bundles Measurable Banach bundles Direct integrals of Hilbert spaces Trivializing Hilbert

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable International Journal of Wavelets, Multiresolution and Information Processing c World Scientic Publishing Company Polynomial functions are renable Henning Thielemann Institut für Informatik Martin-Luther-Universität

More information

Notes on Iterated Expectations Stephen Morris February 2002

Notes on Iterated Expectations Stephen Morris February 2002 Notes on Iterated Expectations Stephen Morris February 2002 1. Introduction Consider the following sequence of numbers. Individual 1's expectation of random variable X; individual 2's expectation of individual

More information

An exploration of matrix equilibration

An exploration of matrix equilibration An exploration of matrix equilibration Paul Liu Abstract We review three algorithms that scale the innity-norm of each row and column in a matrix to. The rst algorithm applies to unsymmetric matrices,

More information

Slide a window along the input arc sequence S. Least-squares estimate. σ 2. σ Estimate 1. Statistically test the difference between θ 1 and θ 2

Slide a window along the input arc sequence S. Least-squares estimate. σ 2. σ Estimate 1. Statistically test the difference between θ 1 and θ 2 Corner Detection 2D Image Features Corners are important two dimensional features. Two dimensional image features are interesting local structures. They include junctions of dierent types Slide 3 They

More information

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the A Multi{Parameter Method for Nonlinear Least{Squares Approximation R Schaback Abstract P For discrete nonlinear least-squares approximation problems f 2 (x)! min for m smooth functions f : IR n! IR a m

More information

Bayesian Paradigm. Maximum A Posteriori Estimation

Bayesian Paradigm. Maximum A Posteriori Estimation Bayesian Paradigm Maximum A Posteriori Estimation Simple acquisition model noise + degradation Constraint minimization or Equivalent formulation Constraint minimization Lagrangian (unconstraint minimization)

More information

Representation and Learning of. Klas Nordberg Gosta Granlund Hans Knutsson

Representation and Learning of. Klas Nordberg Gosta Granlund Hans Knutsson Representation and Learning of Invariance Klas Nordberg Gosta Granlund Hans Knutsson LiTH-ISY-R-55 994-0- Representation and Learning of Invariance Klas Nordberg Gosta Granlund Hans Knutsson Computer Vision

More information

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract

Learning with Ensembles: How. over-tting can be useful. Anders Krogh Copenhagen, Denmark. Abstract Published in: Advances in Neural Information Processing Systems 8, D S Touretzky, M C Mozer, and M E Hasselmo (eds.), MIT Press, Cambridge, MA, pages 190-196, 1996. Learning with Ensembles: How over-tting

More information

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and

More information

Separation of Variables in Linear PDE: One-Dimensional Problems

Separation of Variables in Linear PDE: One-Dimensional Problems Separation of Variables in Linear PDE: One-Dimensional Problems Now we apply the theory of Hilbert spaces to linear differential equations with partial derivatives (PDE). We start with a particular example,

More information

Boxlets: a Fast Convolution Algorithm for. Signal Processing and Neural Networks. Patrice Y. Simard, Leon Bottou, Patrick Haner and Yann LeCun

Boxlets: a Fast Convolution Algorithm for. Signal Processing and Neural Networks. Patrice Y. Simard, Leon Bottou, Patrick Haner and Yann LeCun Boxlets: a Fast Convolution Algorithm for Signal Processing and Neural Networks Patrice Y. Simard, Leon Bottou, Patrick Haner and Yann LeCun AT&T Labs-Research 100 Schultz Drive, Red Bank, NJ 07701-7033

More information

Lecture 6 Positive Definite Matrices

Lecture 6 Positive Definite Matrices Linear Algebra Lecture 6 Positive Definite Matrices Prof. Chun-Hung Liu Dept. of Electrical and Computer Engineering National Chiao Tung University Spring 2017 2017/6/8 Lecture 6: Positive Definite Matrices

More information

LECTURE 15 + C+F. = A 11 x 1x1 +2A 12 x 1x2 + A 22 x 2x2 + B 1 x 1 + B 2 x 2. xi y 2 = ~y 2 (x 1 ;x 2 ) x 2 = ~x 2 (y 1 ;y 2 1

LECTURE 15 + C+F. = A 11 x 1x1 +2A 12 x 1x2 + A 22 x 2x2 + B 1 x 1 + B 2 x 2. xi y 2 = ~y 2 (x 1 ;x 2 ) x 2 = ~x 2 (y 1 ;y 2  1 LECTURE 5 Characteristics and the Classication of Second Order Linear PDEs Let us now consider the case of a general second order linear PDE in two variables; (5.) where (5.) 0 P i;j A ij xix j + P i,

More information

and Dagpunar [1988]) general methods for discrete distributions include two table methods (i.e. inversion by sequential or table-aided search and the

and Dagpunar [1988]) general methods for discrete distributions include two table methods (i.e. inversion by sequential or table-aided search and the Rejection-Inversion to Generate Variates from Monotone Discrete Distributions W. H ORMANN and G. DERFLINGER University of Economics and Business Administration Vienna Department of Statistics, Augasse

More information

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Hilbert Spaces Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space. Vector Space. Vector space, ν, over the field of complex numbers,

More information

No. of dimensions 1. No. of centers

No. of dimensions 1. No. of centers Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......

More information

Dynamical Systems. August 13, 2013

Dynamical Systems. August 13, 2013 Dynamical Systems Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi August 13, 2013 Dynamical Systems are systems, described by one or more equations, that evolve over time.

More information

Divisor matrices and magic sequences

Divisor matrices and magic sequences Discrete Mathematics 250 (2002) 125 135 www.elsevier.com/locate/disc Divisor matrices and magic sequences R.H. Jeurissen Mathematical Institute, University of Nijmegen, Toernooiveld, 6525 ED Nijmegen,

More information

Planning With Information States: A Survey Term Project for cs397sml Spring 2002

Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Planning With Information States: A Survey Term Project for cs397sml Spring 2002 Jason O Kane jokane@uiuc.edu April 18, 2003 1 Introduction Classical planning generally depends on the assumption that the

More information

1) The line has a slope of ) The line passes through (2, 11) and. 6) r(x) = x + 4. From memory match each equation with its graph.

1) The line has a slope of ) The line passes through (2, 11) and. 6) r(x) = x + 4. From memory match each equation with its graph. Review Test 2 Math 1314 Name Write an equation of the line satisfying the given conditions. Write the answer in standard form. 1) The line has a slope of - 2 7 and contains the point (3, 1). Use the point-slope

More information

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved Fundamentals of Linear Algebra Marcel B. Finan Arkansas Tech University c All Rights Reserved 2 PREFACE Linear algebra has evolved as a branch of mathematics with wide range of applications to the natural

More information

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 59 CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 4. INTRODUCTION Weighted average-based fusion algorithms are one of the widely used fusion methods for multi-sensor data integration. These methods

More information

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,

More information

Error Empirical error. Generalization error. Time (number of iteration)

Error Empirical error. Generalization error. Time (number of iteration) Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp

More information

x 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7

x 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7 Linear Algebra and its Applications-Lab 1 1) Use Gaussian elimination to solve the following systems x 1 + x 2 2x 3 + 4x 4 = 5 1.1) 2x 1 + 2x 2 3x 3 + x 4 = 3 3x 1 + 3x 2 4x 3 2x 4 = 1 x + y + 2z = 4 1.4)

More information

Price Competition and Endogenous Valuation in Search Advertising

Price Competition and Endogenous Valuation in Search Advertising Price Competition and Endogenous Valuation in Search Advertising Lizhen Xu Jianqing Chen Andrew Whinston Web Appendix A Heterogeneous Consumer Valuation In the baseline model, we assumed that consumers

More information

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi.

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi. Optimal Rejuvenation for Tolerating Soft Failures Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi Abstract In the paper we address the problem of determining the optimal time

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Super-resolution via Convex Programming

Super-resolution via Convex Programming Super-resolution via Convex Programming Carlos Fernandez-Granda (Joint work with Emmanuel Candès) Structure and Randomness in System Identication and Learning, IPAM 1/17/2013 1/17/2013 1 / 44 Index 1 Motivation

More information

1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound)

1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound) Cycle times and xed points of min-max functions Jeremy Gunawardena, Department of Computer Science, Stanford University, Stanford, CA 94305, USA. jeremy@cs.stanford.edu October 11, 1993 to appear in the

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

2 Tikhonov Regularization and ERM

2 Tikhonov Regularization and ERM Introduction Here we discusses how a class of regularization methods originally designed to solve ill-posed inverse problems give rise to regularized learning algorithms. These algorithms are kernel methods

More information

Group Theory. 1. Show that Φ maps a conjugacy class of G into a conjugacy class of G.

Group Theory. 1. Show that Φ maps a conjugacy class of G into a conjugacy class of G. Group Theory Jan 2012 #6 Prove that if G is a nonabelian group, then G/Z(G) is not cyclic. Aug 2011 #9 (Jan 2010 #5) Prove that any group of order p 2 is an abelian group. Jan 2012 #7 G is nonabelian nite

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques

Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Hung Chen hchen@math.ntu.edu.tw Department of Mathematics National Taiwan University 3rd March 2004 Meet at NS 104 On Wednesday

More information

Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca. Canada. (

Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca. Canada. ( Meeting Times of Random Walks on Graphs Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca Computer Science University of Calgary Calgary, AB, T2N 1N4 Canada (e-mail: fbshouty,higham,jolantag@cpsc.ucalgary.ca)

More information

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139 Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology

More information

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues

More information

Minimum and maximum values *

Minimum and maximum values * OpenStax-CNX module: m17417 1 Minimum and maximum values * Sunil Kumar Singh This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 2.0 In general context, a

More information

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets 9.520 Class 22, 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce an alternate perspective of RKHS via integral operators

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

On Coarse Geometry and Coarse Embeddability

On Coarse Geometry and Coarse Embeddability On Coarse Geometry and Coarse Embeddability Ilmari Kangasniemi August 10, 2016 Master's Thesis University of Helsinki Faculty of Science Department of Mathematics and Statistics Supervised by Erik Elfving

More information

The Great Wall of David Shin

The Great Wall of David Shin The Great Wall of David Shin Tiankai Liu 115 June 015 On 9 May 010, David Shin posed the following puzzle in a Facebook note: Problem 1. You're blindfolded, disoriented, and standing one mile from the

More information

Edges and Scale. Image Features. Detecting edges. Origin of Edges. Solution: smooth first. Effects of noise

Edges and Scale. Image Features. Detecting edges. Origin of Edges. Solution: smooth first. Effects of noise Edges and Scale Image Features From Sandlot Science Slides revised from S. Seitz, R. Szeliski, S. Lazebnik, etc. Origin of Edges surface normal discontinuity depth discontinuity surface color discontinuity

More information

THE LUBRICATION APPROXIMATION FOR THIN VISCOUS FILMS: REGULARITY AND LONG TIME BEHAVIOR OF WEAK SOLUTIONS A.L. BERTOZZI AND M. PUGH.

THE LUBRICATION APPROXIMATION FOR THIN VISCOUS FILMS: REGULARITY AND LONG TIME BEHAVIOR OF WEAK SOLUTIONS A.L. BERTOZZI AND M. PUGH. THE LUBRICATION APPROXIMATION FOR THIN VISCOUS FILMS: REGULARITY AND LONG TIME BEHAVIOR OF WEAK SOLUTIONS A.L. BERTOI AND M. PUGH April 1994 Abstract. We consider the fourth order degenerate diusion equation

More information

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada. In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing

More information

CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash

CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash Equilibrium Price of Stability Coping With NP-Hardness

More information

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY A MULTIGRID ALGORITHM FOR THE CELL-CENTERED FINITE DIFFERENCE SCHEME Richard E. Ewing and Jian Shen Institute for Scientic Computation Texas A&M University College Station, Texas SUMMARY In this article,

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Optimization of Quadratic Forms: NP Hard Problems : Neural Networks

Optimization of Quadratic Forms: NP Hard Problems : Neural Networks 1 Optimization of Quadratic Forms: NP Hard Problems : Neural Networks Garimella Rama Murthy, Associate Professor, International Institute of Information Technology, Gachibowli, HYDERABAD, AP, INDIA ABSTRACT

More information

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.) 4 Vector fields Last updated: November 26, 2009. (Under construction.) 4.1 Tangent vectors as derivations After we have introduced topological notions, we can come back to analysis on manifolds. Let M

More information

Towards a Mathematical Theory of Super-resolution

Towards a Mathematical Theory of Super-resolution Towards a Mathematical Theory of Super-resolution Carlos Fernandez-Granda www.stanford.edu/~cfgranda/ Information Theory Forum, Information Systems Laboratory, Stanford 10/18/2013 Acknowledgements This

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011 Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding

More information

Linear Diffusion and Image Processing. Outline

Linear Diffusion and Image Processing. Outline Outline Linear Diffusion and Image Processing Fourier Transform Convolution Image Restoration: Linear Filtering Diffusion Processes for Noise Filtering linear scale space theory Gauss-Laplace pyramid for

More information

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems

Contents. 6 Systems of First-Order Linear Dierential Equations. 6.1 General Theory of (First-Order) Linear Systems Dierential Equations (part 3): Systems of First-Order Dierential Equations (by Evan Dummit, 26, v 2) Contents 6 Systems of First-Order Linear Dierential Equations 6 General Theory of (First-Order) Linear

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

6 The Fourier transform

6 The Fourier transform 6 The Fourier transform In this presentation we assume that the reader is already familiar with the Fourier transform. This means that we will not make a complete overview of its properties and applications.

More information

Iterative procedure for multidimesional Euler equations Abstracts A numerical iterative scheme is suggested to solve the Euler equations in two and th

Iterative procedure for multidimesional Euler equations Abstracts A numerical iterative scheme is suggested to solve the Euler equations in two and th Iterative procedure for multidimensional Euler equations W. Dreyer, M. Kunik, K. Sabelfeld, N. Simonov, and K. Wilmanski Weierstra Institute for Applied Analysis and Stochastics Mohrenstra e 39, 07 Berlin,

More information

1. Introduction The nonlinear complementarity problem (NCP) is to nd a point x 2 IR n such that hx; F (x)i = ; x 2 IR n + ; F (x) 2 IRn + ; where F is

1. Introduction The nonlinear complementarity problem (NCP) is to nd a point x 2 IR n such that hx; F (x)i = ; x 2 IR n + ; F (x) 2 IRn + ; where F is New NCP-Functions and Their Properties 3 by Christian Kanzow y, Nobuo Yamashita z and Masao Fukushima z y University of Hamburg, Institute of Applied Mathematics, Bundesstrasse 55, D-2146 Hamburg, Germany,

More information

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-ero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In [0, 4], circulant-type preconditioners have been proposed

More information

where (E) is the partition function of the uniform ensemble. Recalling that we have (E) = E (E) (E) i = ij x (E) j E = ij ln (E) E = k ij ~ S E = kt i

where (E) is the partition function of the uniform ensemble. Recalling that we have (E) = E (E) (E) i = ij x (E) j E = ij ln (E) E = k ij ~ S E = kt i G25.265: Statistical Mechanics Notes for Lecture 4 I. THE CLASSICAL VIRIAL THEOREM (MICROCANONICAL DERIVATION) Consider a system with Hamiltonian H(x). Let x i and x j be specic components of the phase

More information

A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact. Data. Bob Anderssen and Frank de Hoog,

A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact. Data. Bob Anderssen and Frank de Hoog, A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact Data Bob Anderssen and Frank de Hoog, CSIRO Division of Mathematics and Statistics, GPO Box 1965, Canberra, ACT 2601, Australia

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information