STATISTICAL ANALYSIS OF CURVE FITTING IN ERRORS-IN-VARIABLES MODELS
ALI AL-SHARADQAH


STATISTICAL ANALYSIS OF CURVE FITTING IN ERRORS-IN-VARIABLES MODELS

by

ALI AL-SHARADQAH

NIKOLAI CHERNOV, COMMITTEE CHAIR
WEI-SHEN HSIA
CHARLES KATHOLI
IAN KNOWLES
BORIS KUNIN

A DISSERTATION

Submitted to the faculty of the University of Alabama at Birmingham, in partial fulfillment of the requirements of the degree of Doctor of Philosophy

BIRMINGHAM, ALABAMA
2011

STATISTICAL ANALYSIS OF CURVE FITTING IN ERRORS-IN-VARIABLES MODELS

ALI AL-SHARADQAH

APPLIED MATHEMATICS

ABSTRACT

This dissertation is devoted to the problem of fitting geometric curves such as lines, circles, and ellipses to a set of experimental observations both of whose coordinates are contaminated by noise. This kind of regression is known as the Errors-in-Variables (EIV) model, which is quite different from, and much more difficult to solve than, classical regression. This research is motivated by the wide range of EIV applications in computer vision and image processing. We adopt statistical assumptions suitable for these applications and study the statistical properties of two kinds of fits, geometric and algebraic, for line, circle, and ellipse fitting. The main contribution of the dissertation is the proposal of several new fits for both the circle and the ellipse fitting problems. These fits were discovered after we developed an unconventional statistical analysis that allows us to effectively assess EIV parameter estimates. This approach was validated through a series of numerical tests. We theoretically compare the most popular fits for circles and ellipses to each other and show why, and by how much, each fit differs from the others. Our theoretical comparison leads to new fits with superior characteristics that surpass all existing fits both theoretically and experimentally. Another contribution is a discussion of some statistical issues in circle fitting. We prove that the most popular and accurate fits have infinite absolute first moment, while the one with a finite first moment is, paradoxically, the least accurate and has the heaviest bias. We also prove that the geometric fit returns an absolutely continuous estimator.

Keywords: errors-in-variables (EIV) models, conic fitting, circle fitting, geometric estimation, orthogonal least squares

DEDICATION

TO MY BELOVED PARENTS
TO MY WIFE OLA NUSIERAT
TO MY KIDS KAREEM AND JANA

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my advisor, Professor Nikolai Chernov. Without his support, I would never have successfully finished my dissertation. Without his guidance and the precious time he gave me, I would never have become an independent researcher with a solid mathematical and statistical background. To the faculty of the U.A.B. mathematics department, in particular R. Weikard, Y. Karpeshina, G. Stolz, and Y. Zeng: I thank you for your continuous support and encouragement. I would also like to express my deep gratitude to my PhD committee members, W. Hsia, C. Katholi, I. Knowles, and B. Kunin. Of course, I am grateful to my parents, my wife, and my family. Last but not least, to my friends Rami Alahmad, Kendrick White, Derar Issa, Mohammad Muzyan, Ali Shati, and Susan Abdoli: thank you all for your love, support, and continuous encouragement.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1. INTRODUCTION
    Statistical Model
    Geometric Fitting
    Linear Fitting
    Geometric Fitting for Nonlinear Models
    MLE Versus Geometric Fit
    Statistical Properties of EIV Models
    Scope of the Dissertation
    Algebraic Fits and Weighted Algebraic Fits
    Algebraic Circle Fits
    Algebraic Conic Fits
    Weighted Algebraic Conic Fits
    Organization of the Dissertation

CHAPTER 2. CIRCLE ALGEBRAIC FITS AND THE PROBLEM OF THE MOMENTS
    Algebraic Circle Fits
    Kåsa Fit
    Pratt's Fit
    Taubin's Fit
    Numerical Experiments
    Hyperaccurate Fit
    The Problem of the Moments
    Proofs
    Regularity Condition
    Pratt's Fit
    Taubin's Fit
    Hyper Fit
    Kåsa's Fit

CHAPTER 3. ERROR ANALYSIS: A GENERAL SCHEME
    3.1. Introduction and Notations
    Asymptotic Models
    The Errors
    Consistency Versus Geometric Consistency
    Taylor Expansion
    Example: Geometric Linear Fitting
    Kanatani-Cramér-Rao Lower Bound (KCR)
    Cramér-Rao Lower Bound for General Curve Fitting in EIV Models
    KCR
    Components of MSE
    Classification of Higher Order Terms
    Kanatani's Classification of Higher Order Terms
    Total Mean Square Error
    Covariances
    Assessing the Quality of Estimators
    MSE of β̂
    Adjusted Maximum Likelihood Estimator (AMLE)
    AMLE for Linear Regression
    Numerical Experiments

CHAPTER 4. GEOMETRIC FITTING FOR CIRCLES
    Introduction
    Existence of Densities for the Circle Fit
    Error Analysis of the Geometric Circle Fit
    Mean Square Error of Θ̂
    Classifications of Terms
    MSE of ∆2Θ̂
    cov(∆1Θ̂, ∆3Θ̂)
    Final formula for the MSE
    Numerical Experiments

CHAPTER 5. ERROR ANALYSIS OF ALGEBRAIC CONIC FITS
    Conic Fitting and General Framework
    Statistical Model
    Matrix Perturbation
    Variance of Algebraic Fits
    Bias of Algebraic Fits
    Error Analysis of Popular Algebraic Linear Fits
    Our Proposed Linear Fit
    Error Analysis of Popular Algebraic Circular Fits
    Comparison of Various Circle Fits
    Bias of Pratt's and Taubin's Fits
    Transition Between Parameter Schemes
    Variance and Bias of Algebraic Circle Fits in the Natural Parameters
    Numerical Experiment
    Novel Circle Fits Based on Our Statistical Analysis
    Hyperaccurate Algebraic Fit
    Improved Hyperaccurate Fit
    Adjusted MLE for Circle Fitting
    Experimental Tests
    Error Analysis for Algebraic Ellipse Fits
    Variance and Bias of Algebraic Ellipse Fits
    Comparison of Various Algebraic Conic Fits
    Our Proposed Fits for Conic Fitting
    Experimental Tests
    Summary

CHAPTER 6. NUMERICAL EFFICIENT SCHEMES FOR ELLIPSE FITTING
    Motivation
    Gradient Weighted Algebraic Fit
    Numerical Schemes for Conic Fitting
    Numerical Schemes
    Solving the Variational Equation (6.14) and Its Variants
    Renormalization Scheme
    Reduced Scheme
    Matrix Perturbation and Covariance Matrix
    Variance
    Useful Identities
    Statistical Analysis of Popular Conic Fits
    Bias of Reduced Scheme
    Bias of GRAF
    Error Analysis of Fits Based on the Generalized Eigenvalue Problem
    Bias of Renormalization Scheme
    Proposed Algorithm
    Numerical Experiments

CHAPTER 7. VALIDATION OF THE ERROR ANALYSIS
    Validation of Our Approximative Formulas
    General Criterion
    Linear Regression
    Circular Regression
    Conclusions

REFERENCES

Appendix. Proof of Some Theorems and Lemmas

LIST OF TABLES

3.1 The order of magnitude of the four terms in (3.35)
Mean square error (and its components) for the geometric circle fit (10^4 values are shown). In this test n points are placed (equally spaced) along a semicircle of center (0, 0) and radius R = 1. The noise level is σ =
Mean square error (and its components) for four circle fits (10^4 values are shown). In this test n = 100 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ =
Mean square error (and its components) for four circle fits (10^4 values are shown). In this test n = 100 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ =
Mean square error (and its components) for four circle fits (10^6 values are shown). In this test n = points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ =
Mean square error (and its components) for four circle fits (10^6 values are shown). In this test n = 100 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ =
Mean square error of R̂ (and its components) for four circle fits (10^6 values are shown). In this test n = 20 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ =
The nonessential bias for both the least squares fit and Taubin's fit

LIST OF FIGURES

1.1 Dot-dash line represents the geometric fit, dashed line represents the classical fit, while the solid line is the true line
Normalized mean square error for various circle fits versus the noise σ
Optional caption for list of figures
RMSE(β̂)/σ for MLE and AMLE versus σ. The horizontal line represents KCR divided by σ
This diagram commutes, i.e., G L = M G
The normalized root mean square error, RMSE(Â)/σ, versus σ
The arc containing the true points
MSE for various circle fits (on the logarithmic scale) versus the sample size n (from 10 to 10^4)
Normalized root mean square error for two algebraic ellipse fits, LSF (green) and Taubin (red), and KCR (green). In this test n = 100 points are placed (equally spaced) along half of an ellipse of center (0, 0), major axis 100, and minor axis 50. The noise level σ varies from 0 to
10^4 values are shown of the normalized root mean square error, i.e. RMSE/σ, and the bias for three algebraic ellipse fits: Taubin fit (solid line), Hyper fit (dots), and Improved Hyper fit (dashes), while KCR is represented by the dash-dot line. 100 points are placed (equally spaced) along half of an ellipse of center (0, 0), major axis 100, and minor axis 50. σ varies from .001 to
5.6 10^4 values are shown of the normalized root mean square error, i.e. RMSE/σ, and the bias for three algebraic ellipse fits: Taubin fit (dash-dot), Hyper fit (dots), and Improved Hyper fit (dashes). 50 points are placed (equally spaced) along a quarter of an ellipse centered at (0, 0) with major axis 100 and minor axis 50. The noise level σ varies from .001 to 0.6. KCR is represented by the solid line
T-Taubin, HF-Hyperfit, IH1-Improved Hyperfit (Ours 1), F-FNS, H-HEIV, R-Reduced, R-Renormalization, IH2-Ours 2, and D_min is the trace of KCR, tr(M)
The p_L (dashed line) and p_Q (solid line) versus σ
The p_L (dashed line) and p_Q (solid line) versus σ; this figure corresponds to the radius estimate
The p_L (dashed line) and p_Q (solid line) versus σ; this figure corresponds to the center estimate

CHAPTER 1

INTRODUCTION

In classical regression one is usually concerned with the relationship between two variables x and y through a functional relation y = g(x). The variable x is called the regressor and the variable y is called the dependent variable. The question then arises immediately: what form could g(x) take? It might be a linear relationship, i.e. g(x) = α + βx, or quadratic, g(x) = αx² + βx + γ, and so on. Thus it is standard to record a set of observations and plot them to depict such a model. The next question is how we can find the best values of the parameter vector Θ = (α, β, ...) such that y ≈ g(x; α, β, ...). As a principle of classical regression, the variable x is considered error-free, while the dependent variable y is contaminated by some error. In other words, suppose n experimental observations (namely m_i = (x_i, y_i)^T, i = 1, ..., n) were recorded; then

y_i = g(x_i; Θ) + ε_i,  i = 1, ..., n,

where each ε_i, for i = 1, ..., n, is a small random error. As a standard assumption, each ε_i is an independently distributed normal random variable with mean 0 and variance σ². If σ² is unknown, it is called a nuisance (or latent) parameter. There are several ways to estimate Θ, but the most appealing one is the Maximum Likelihood Estimator (MLE), which can be found by maximizing the likelihood function. The latter is the product of the probabilities of the observed deviations, i.e.

L(Θ) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (y_i − g(x_i, Θ))² ).

Instead, one can equivalently find Θ̂_ML by minimizing −log L(Θ), i.e.

(1.1)  F_c = Σ (y_i − g(x_i, Θ))² → min.

If g(x) = α + βx, then classical linear regression amounts to minimizing the objective function

(1.2)  F_c = Σ (y_i − (α + βx_i))²,

and hence the MLE is

(1.3)  β̂_c = s_xy / s_xx  and  α̂_c = ȳ − β̂_c x̄,

where we use the standard notation for the sample means, x̄ = (1/n) Σ x_i and ȳ = (1/n) Σ y_i, and for the components of the so-called "scatter matrix":

(1.4)  s_xx = Σ (x_i − x̄)²,  s_yy = Σ (y_i − ȳ)²,  s_xy = Σ (x_i − x̄)(y_i − ȳ).

The classical linear regression was published by Legendre in 1805 and by Gauss in 1809. The estimators α̂_c and β̂_c have excellent statistical properties; they are optimal in all senses. The classical regression is built on the assumption that the regressor variable x is error-free. However, when some regressors have been measured with errors, this standard assumption leads to inconsistent estimates. Their biases persist even in very large samples, and thus the above estimators (α̂_c, β̂_c) lose their appealing features. Regression models in which all variables are subject to errors are known as Errors-in-Variables (EIV) models [61, 17, 67]. The EIV regression problem is quite different from (and far more difficult than) the classical regression. EIV regression, even in the linear case, presents challenging questions and leads to some counterintuitive results.
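The closed-form estimates (1.3) and (1.4) are straightforward to compute. The following short sketch (Python with NumPy; the data are simulated here and all variable names are ours, not the dissertation's) illustrates them:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate the classical model: x is error-free, y = alpha + beta*x + noise
    alpha_true, beta_true, sigma = 1.0, 3.0, 0.05
    x = np.linspace(0.0, 1.0, 30)
    y = alpha_true + beta_true * x + sigma * rng.standard_normal(x.size)

    # Sample means and components of the "scatter matrix", equation (1.4)
    xbar, ybar = x.mean(), y.mean()
    s_xx = np.sum((x - xbar) ** 2)
    s_xy = np.sum((x - xbar) * (y - ybar))

    # Classical MLE, equation (1.3)
    beta_hat = s_xy / s_xx
    alpha_hat = ybar - beta_hat * xbar
    print(alpha_hat, beta_hat)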

Fitting straight lines to observed data when both coordinates x and y are corrupted by errors dates back to the 1870s, when Adcock published two papers [1, 2]. He derived formulas for the slope and intercept estimates of the fitted line. His calculations were based on geometric rather than statistical considerations. Statisticians have studied this problem since 1901 [58]. Intensive research was focused on line fitting in the twentieth century because of its applications in economics, the sciences, image processing, and computer vision. By the late 1990s most of the major issues of line fitting appeared resolved. The problem of fitting circles or circular arcs dates back to the 1950s, when British engineers and archaeologists examined stone rings in the British Isles. The arrangement of the stones is either a circle or an ellipse (or, more rarely, a setting of four stones laid on an arc of a circle). They were concerned with whether the ancient people used common units of length or not [10, 30, 72, 73, 74]. In the 1970s, fitting circles to experimental data appeared in microwave engineering. Circular fitting also emerged in medicine: in many diseases, such as lung cancer, detection of nodules (which are either circular or elliptical) is a real problem for the diagnosis of lung cancer (the most common cause of cancer death in the world). This illustrates the importance of contour fitting in medicine. Circular fitting also dominates publications in nuclear physics. In high energy physics millions of particles are born in accelerators; after their circular-shaped trajectories are tracked, their energies can be measured by estimating the radii of those trajectories. As we can see, fitting lines, circles, and ellipses has a variety of applications. However, major applications of fitting circles and other geometric shapes appear in computer vision, image processing, and pattern recognition (it is considered a basic task in these computer sciences). In this dissertation we study the geometric fitting of some geometric contours (lines, circles, and ellipses). Our work is tailored for image processing applications, and as such we adopt standard statistical assumptions that are appropriate for these

applications, and we develop an unconventional approach to study the statistical properties of geometric fitting. More details will be discussed later in the dissertation.

1.1. Statistical Model

Our problem can be formulated as follows. Given n experimental observations (namely m_i = (x_i, y_i)^T, i = 1, ..., n) and a family of curves depending on k parameters θ_1, ..., θ_k, our goal is to estimate the parameter vector Θ = (θ_1, ..., θ_k)^T that corresponds to the best fitting curve. For example, in fitting lines we have two parameters to estimate, (α, β), which describe the model y = α + βx. Similarly, in fitting circles we may want to estimate the coordinates of the circle center (a, b) and the radius R that describe the best fit to the experimental observations. Suppose one is fitting curves defined by an implicit equation

(1.5)  P(x, y; Θ) = 0.

Let Θ̃ = (θ̃_1, ..., θ̃_k)^T denote the true parameter vector corresponding to the true curve. Here we should distinguish between the true points and the observations. We call the point m̃_i = (x̃_i, ỹ_i)^T, of which m_i = (x_i, y_i)^T is an inaccurate measurement and which lies on the true curve, the true point. Mathematically, it satisfies the implicit equation

(1.6)  P(m̃_i; Θ̃) = 0,  i = 1, ..., n.

For example, the true points in linear regression satisfy

(1.7)  ỹ_i = α̃ + β̃ x̃_i,  i = 1, ..., n,

where (α̃, β̃) denote the true parameters. In circular regression they satisfy

(1.8)  (x̃_i − ã)² + (ỹ_i − b̃)² = R̃²,  i = 1, ..., n,

where (ã, b̃, R̃) denote the true (unknown) parameters. Therefore

(1.9)  x̃_i = ã + R̃ cos ϕ_i,  ỹ_i = b̃ + R̃ sin ϕ_i,

where ϕ_1, ..., ϕ_n specify the locations of the true points on the true circle. The angles ϕ_1, ..., ϕ_n are regarded as fixed unknowns and treated as additional (latent) parameters of the model. On the other hand, the observed points m_i are regarded as realizations (observations) of the true points m̃_i. In other words, we assume that each m_i is a random perturbation of a true point m̃_i, i.e. m_i = m̃_i + e_i, where e_i = (δ_i, ε_i)^T is a small random vector. In coordinates,

(1.10)  x_i = x̃_i + δ_i,  y_i = ỹ_i + ε_i,  i = 1, ..., n.

To understand the statistical properties of an estimator Θ̂, we should adopt realistic assumptions about the probability distribution of the observations. The noise vectors e_i = m_i − m̃_i are assumed to be independent and have zero mean. The independence assumption is essential in practice, and it will be a standard assumption in this dissertation, which is devoted to computer vision applications: in edge detection, pattern recognition, and computer vision, detecting (observing) any point m_i does not give any information about other points, and hence the errors are independent. It is also reasonable to assume that the errors e_i have the same covariance matrix for all points, because we use the same algorithm for edge detection. As a common assumption, we assume that δ_i and ε_i are independent and normally distributed with a common variance σ², i.e.

(1.11)  δ_i ~ N(0, σ²),  ε_i ~ N(0, σ²),  i = 1, ..., n.

The true points (x̃_i, ỹ_i) can be thought of as fixed, so they are treated as nuisance parameters. This model is known as the functional model. The functional model is intensively studied in the literature and widely used in the applied community, especially in

computer vision. Therefore, this model is adopted in our work. Others regard the x̃'s and ỹ's as realizations of some underlying random variables; this treatment of the true values is known as the structural model. In the next section we leave statistics for a while and discuss the geometric fit (also called orthogonal distance regression (ODR)). Then we discuss the MLE in the EIV model and the relationship between the two.

1.2. Geometric Fitting

Let us first assume n experimental observations m_i = (x_i, y_i)^T are recorded. If the object of approximation is a functional relation between x and y, y = g(x), then one can obtain the best fitting curve y = g(x) by minimizing the sum of the squares of the vertical distances between g(x_i) and y_i. This kind of least squares is well known as ordinary least squares. Moreover, it coincides with the MLE obtained via classical regression whenever the regressors are error-free (see equation (1.1)). But if the object of approximation is a curve in the xy plane, then it is natural to minimize the sum of the squares of the geometric (orthogonal) distances from the observed points m_i to the true curve P(x, y; Θ) = 0, in other words, to minimize the objective function

(1.12)  F = Σ d_i²,

where d_i is the smallest Euclidean distance between m_i and a point on the curve P. This procedure is called the geometric fit or orthogonal distance regression (ODR). It has been used since the 1870s [1] and is commonly regarded as the best (most accurate) fitting method (under the above assumptions). In the following subsections we briefly discuss geometric fitting for lines, circles, and conics. Linear regression is used in this dissertation to explain our general analysis, which is devoted mainly to circle and conic fitting. Even though we obtain some results in this simple case, these results contribute only marginally to our main results.
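Before turning to specific fits, the following minimal sketch (Python/NumPy; the parameter values are illustrative and not taken from the dissertation) generates data according to the functional model (1.9)-(1.11) of Section 1.1, with a circle as the true curve. Similar configurations (points equally spaced along a semicircle of radius 1) appear in the numerical experiments reported later in the dissertation.

    import numpy as np

    rng = np.random.default_rng(1)

    # True circle parameters and noise level (illustrative values)
    a_true, b_true, R_true, sigma = 0.0, 0.0, 1.0, 0.05

    # Fixed (latent) angles placing the true points on a semicircle, as in (1.9)
    n = 20
    phi = np.linspace(0.0, np.pi, n)
    x_true = a_true + R_true * np.cos(phi)
    y_true = b_true + R_true * np.sin(phi)

    # Observations per (1.10)-(1.11): independent N(0, sigma^2) errors in both coordinates
    x = x_true + sigma * rng.standard_normal(n)
    y = y_true + sigma * rng.standard_normal(n)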

Linear fitting. With simple calculus and geometry, one can easily show that

(1.13)  F(α, β) = Σ_i d_i² = (1/(1 + β²)) Σ_i (y_i − α − βx_i)².

The minimizer of (1.13) is called the geometric linear fit. To find the minimizer of (1.13), we differentiate F(α, β) with respect to α and solve the equation ∂F/∂α = 0, which gives

(1.14)  α̂ = ȳ − β̂ x̄.

Substituting α̂ into (1.13) and working with the centered coordinates x_i − x̄ and y_i − ȳ yields

(1.15)  F(β) = (1/(1 + β²)) Σ ((y_i − ȳ) − β(x_i − x̄))²,

from which

(1.16)  F(β) = (s_yy − 2βs_xy + s_xx β²) / (1 + β²),

which is a function of one variable, and as such its minimum is attained at

(1.17)  β̂ = (s_yy − s_xx + √((s_yy − s_xx)² + 4s_xy²)) / (2s_xy).

Formula (1.17) holds whenever s_xy ≠ 0, which is true almost surely. In the case s_xy = 0 we have F(β) = (1 + β²)^(−1)(s_yy + s_xx β²), and simple algebra leads to the conclusion that β̂ = 0 if s_xx > s_yy, while β̂ = ∞ if s_xx < s_yy. Until the late 1980s all books and papers discussing line fitting dealt with the equation y = α + βx. This model fails if the data points lie on a vertical or even nearly vertical line. An alert reader can see from formula (1.17) that this case leads to a numerically unstable problem whenever s_xy ≈ 0. This led the

EIV community to use another line parametrization, such as

(1.18)  Ax + By + C = 0.

Hence F(α, β) can be expressed in terms of A = (A, B, C)^T as

(1.19)  F(A) = (1/(n(A² + B²))) Σ (Ax_i + By_i + C)²

(note that we intentionally divide the objective by n, which does not change the minimizer). Also note that if A = B = 0, then (1.18) describes the entire plane R² if C = 0 and an empty set if C ≠ 0. Therefore A² + B² > 0 is required to exclude meaningless cases. Furthermore, both the numerator and the denominator are quadratic functions of A; thus if we multiply A by a nonzero scalar, the minimizer of (1.19) remains unchanged. Thus it is natural to impose the geometric constraint

(1.20)  A² + B² = 1,  or equivalently  A^T I_0 A = 1,

where

(1.21)  I_0 = diag(1, 1, 0),

which automatically satisfies the restriction A² + B² > 0. Now the problem becomes minimizing the objective function

(1.22)  Σ (Ax_i + By_i + C)²  subject to  A^T I_0 A = 1.

In fact, the constraint A² + B² = 1 removes the indeterminacy of multiple solutions, but not entirely. So how can we choose the right one? Suppose for a while that the program solving (1.22) gives us A. To obtain the right solution, we should check the sign of C: if C ≥ 0, then we choose A itself; otherwise we pick −A.

But how can we solve equation (1.22)? Since C is unconstrained, it can be eliminated by minimizing F with respect to C. Differentiating F with respect to C and setting ∂F/∂C = 0, one gets C = −Ax̄ − Bȳ. Hence F becomes

F(A') = (1/n) Σ (A(x_i − x̄) + B(y_i − ȳ))²,

where A' = (A, B)^T. To implement linear algebra we define Z_i = (x_i − x̄, y_i − ȳ)^T and M_i = Z_i Z_i^T, and finally we define

M = (1/n) Σ M_i = (1/n) [ s_xx  s_xy
                          s_xy  s_yy ],

with s_xx, s_xy, s_yy as defined in (1.4). Next we write F as F(A') = (A')^T M A'. Minimizing F(A') subject to (1.20) can be done by employing a Lagrange multiplier λ, and hence by solving the variational equation

(1.23)  M A' = λ A'.

Since M is positive definite, solving the variational equation (1.23) gives two nonnegative real eigenvalues, say λ_1 ≤ λ_2. But which eigenvector should we choose? It is the unit eigenvector A'_1 corresponding to λ_1, because F(A'_1) = (A'_1)^T M A'_1 = λ_1 ‖A'_1‖² = λ_1.

Geometric Fitting for Nonlinear Models. In the dissertation we discuss geometric fitting for two kinds of geometric contours, circles and conics. We present them here briefly.
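Before moving on to circles and conics, here is a quick numerical illustration of the linear fitting just described: a minimal sketch (Python/NumPy, with simulated data of our own choosing) that computes the geometric line fit both from the closed-form slope (1.17) with intercept (1.14) and as the unit eigenvector of M corresponding to its smallest eigenvalue, as in (1.23). The two routes describe the same line.

    import numpy as np

    # Simulated noisy points around the true line y = 1 + 3x (illustrative data)
    rng = np.random.default_rng(2)
    x_true = np.linspace(0.0, 1.0, 30)
    y_true = 1.0 + 3.0 * x_true
    x = x_true + 0.05 * rng.standard_normal(30)
    y = y_true + 0.05 * rng.standard_normal(30)

    xbar, ybar = x.mean(), y.mean()
    s_xx = np.sum((x - xbar) ** 2)
    s_yy = np.sum((y - ybar) ** 2)
    s_xy = np.sum((x - xbar) * (y - ybar))

    # Route 1: closed-form slope (1.17) and intercept (1.14)
    beta_hat = (s_yy - s_xx + np.hypot(s_yy - s_xx, 2.0 * s_xy)) / (2.0 * s_xy)
    alpha_hat = ybar - beta_hat * xbar

    # Route 2: unit eigenvector of M for the smallest eigenvalue, equation (1.23)
    M = np.array([[s_xx, s_xy], [s_xy, s_yy]]) / x.size
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues returned in ascending order
    A, B = eigvecs[:, 0]                   # normal vector (A, B) of the fitted line
    C = -A * xbar - B * ybar
    if C < 0:                              # sign convention: keep C >= 0
        A, B, C = -A, -B, -C

    # Both parametrizations give the same line: the slope -A/B equals beta_hat
    print(beta_hat, -A / B)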

Circular fitting. Suppose one wants to fit a circle to the observed points; then d_i in (1.12) stands for the distance from (x_i, y_i) to the circle,

(1.24)  d_i = r_i − R,  with  r_i = √((x_i − a)² + (y_i − b)²),

where (a, b) denotes the center and R the radius of the circle. The geometric circle fit is known as the most accurate circle fit. A major concern with the geometric fit, however, is that the above minimization problem has no closed form solution. All practical algorithms for minimizing F are iterative; some implement general Gauss-Newton [15, 36] or Levenberg-Marquardt [21] schemes, which are standard procedures for minimizing nonlinear least squares problems. Others use circle-specific methods, such as those proposed by Landau [49] and Späth [69]. The performance of iterative algorithms depends heavily on the choice of the initial guess. They often take dozens or hundreds of iterations to converge, and there is always a chance that they will be trapped in a local minimum of F or diverge entirely. These issues were explored in [21].

Conic fitting. The situation in conic fitting is worse. Given n points m_i, i = 1, ..., n, the orthogonal distance d_i between the conic P(x, y; Θ) = 0 and m_i has no simple analytic formula. This distance can be computed by solving a polynomial equation of the 4th degree, which is a numerically unstable procedure; the algorithm returns solutions involving complex numbers [3]. Instead, one can use iterative schemes such as Newton's method or its modifications, such as the Levenberg-Marquardt algorithm (which is the most accurate scheme for solving nonlinear least squares problems). However, this turns out to be extremely expensive and still unreliable. Furthermore, Newton's method requires computing the partial derivatives of d_i, but the latter have no closed form expression either, so some authors use a finite difference approximation, which also turns out to be inaccurate. These observations make using Newton's method and its implementation schemes impractical for ellipse

fitting. Finally, many researchers, such as [53], pointed out that the Levenberg-Marquardt algorithm is very expensive and can take up to 1000 iterations to converge.

1.3. MLE Versus Geometric Fit

Whenever the estimation of parameters becomes an issue, the Maximum Likelihood Estimator (MLE) takes a prominent position. Since we adopt the functional model, the m̃_i are treated as nonrandom (incidental) parameters constrained through the implicit equation (1.6). Assume that the e_i are independent and normally distributed with mean 0 = (0, 0)^T and covariance matrix σ²I_2, where I_2 is the identity matrix of size 2. The likelihood function is

L(Θ, m̃_1, ..., m̃_n) = Π_{i=1}^n f(m_i),  under the constraints P(m̃_i, Θ) = 0 for i = 1, ..., n,

where

f(m_i) = (1/(2πσ²)) exp( −(1/(2σ²)) (m_i − m̃_i)^T (m_i − m̃_i) ),  i = 1, ..., n.

Finding Θ̂, the maximizer of the constrained likelihood function L(Θ, m̃_1, ..., m̃_n), is not an easy task without using the logarithm. Thus the MLE can be obtained by minimizing −log L(Θ, m̃_1, ..., m̃_n), where

−log L(Θ, m̃_1, ..., m̃_n) = Σ_i [ (1/(2σ²)) (m_i − m̃_i)^T (m_i − m̃_i) + log 2πσ² ].

Adding a constant to a function does not change the location of its minimizer. This means that the above problem reduces to minimizing the objective function

(1.25)  F(Θ, m̃_1, ..., m̃_n) = Σ d_i²,

where

(1.26)  d_i² = (m_i − m̃_i)^T (m_i − m̃_i) = (x_i − x̃_i)² + (y_i − ỹ_i)²

is the Mahalanobis distance of m_i from m̃_i. The above minimization problem is quite unclear, since it depends on the unknown true points m̃_i. Let us consider the simplest case, fitting lines to data. In this case ỹ_i = α + βx̃_i, hence F can be written as

(1.27)  F(Θ, m̃_1, ..., m̃_n) = Σ_i [ (x_i − x̃_i)² + (y_i − (α + βx̃_i))² ].

Differentiating F with respect to x̃_i gives

∂F(Θ, m̃_1, ..., m̃_n)/∂x̃_i = −2[ (x_i − x̃_i) + β(y_i − (α + βx̃_i)) ].

To get the MLE of x̃_i we set ∂F(Θ, m̃_1, ..., m̃_n)/∂x̃_i = 0 and obtain

x̂_i = (x_i + β(y_i − α)) / (1 + β²),  and  ŷ_i = α + βx̂_i.

If we substitute x̂_i and ŷ_i into (1.27), we get (1.13). This means that the MLE can be obtained by minimizing the sum of the squares of the geometric (orthogonal) distances from the observed points to the true line. For general nonlinear EIV models the relation between ODR and MLE is much more ambiguous and is barely mentioned in the literature, even though it was established by Chan in 1965 [15]. To present the proof of this fact, we introduce a simple but powerful observation.

Definition 1.1 (Minimization-in-steps Technique). Let A and B be two compact subsets of p- and q-dimensional Euclidean space, respectively. If h is a real-valued continuous function on the Cartesian product A × B in (p + q)-dimensional Euclidean space, then the minimization of h over A × B can be taken by steps, over A and then over B, that is,

min_{A × B} h = min_B ( min_A h ).

In the functional model, estimating the parameters can be formulated as follows. Let u(t, Θ) and v(t, Θ) be two real-valued functions of the k + 1 real variables t, θ_1, ..., θ_k. For each Θ, the point (u(t, Θ), v(t, Θ)) traces a curve in the plane as t varies.

This means that for each (x̃_i, ỹ_i) there exists t̃_i such that

x̃_i = u(t̃_i, Θ̃),  ỹ_i = v(t̃_i, Θ̃),  for i = 1, ..., n.

We see immediately that in circular fitting t̃_i equals the angle ϕ_i given in (1.9). Finally, we assume that the curve represented by the parametrization (u(t, Θ), v(t, Θ)) is smooth.

Theorem 1.1. [15] Assume that the noise e_i = (δ_i, ε_i) satisfies (1.10) and (1.11) for each i = 1, ..., n, and that the primary parameter vector Θ is constrained through the implicit equation (1.6). Then the MLE Θ̂ is attained on the curve that minimizes the sum of squares of the orthogonal distances to the data points.

Proof. According to the above discussion, we can write (1.25) as

(1.28)  F_1(Θ, t̃_1, ..., t̃_n) = Σ_i [ (x_i − u(t̃_i, Θ))² + (y_i − v(t̃_i, Θ))² ].

To minimize the objective function (1.28) we use the minimization-in-steps technique. Accordingly, we minimize (1.28) with respect to the nuisance parameters t̃_1, ..., t̃_n, keeping θ_1, ..., θ_k fixed; then we minimize the resulting function with respect to the primary parameter vector Θ. In other words, the curve P(x, y, Θ) = 0 is kept fixed, and hence the conditional minimum of F(Θ, t̃_1, ..., t̃_n) is attained when each t̂_i, for i = 1, ..., n, corresponds to the point m̂_i = (x̂_i, ŷ_i) that lies on the curve P(x, y, Θ) = 0 and has the shortest distance to the observed point m_i, i.e.

(1.29)  min_{t̃_1, ..., t̃_n} Σ_i [ (x_i − u(t̃_i, Θ))² + (y_i − v(t̃_i, Θ))² ] = Σ_i d_i(Θ)²,

where d_i(Θ) denotes the (geometric) orthogonal distance from m_i to the curve given in (1.5); i.e., the minimum of F is attained when m̂_i is the orthogonal projection of m_i onto the curve P(x, y, Θ) = 0. Note that the existence and uniqueness of the point m̂_i for each i = 1, ..., n are ensured by the smoothness of the u and v that parameterize P(x, y, Θ).

Next we minimize (1.29) with respect to Θ. Thus the minimizer Θ̂ will be the minimizer of the sum of squares of the orthogonal distances to the data points, i.e.

(1.30)  Θ̂ = argmin F_1(Θ, m̃_1, ..., m̃_n) = argmin Σ d_i².

This completes the proof of the theorem.

1.4. Statistical Properties of EIV Models

It is a standard procedure in traditional statistics to examine the statistical properties of any estimator, such as the MLE, after adopting some statistical assumptions. Such assessments are based on standard measures such as efficiency, sufficiency, consistency, and others. These measures are mainly based on the bias, variance, mean square error (MSE), Cramér-Rao lower bound (CRB), and the asymptotic behavior of the estimator as n → ∞. The nature of the problem in EIV models, however, is intractable. In the linear model, the MLE (α̂, β̂) has awkward probability densities. In fact, those densities are not normal and do not belong to any standard family of probability densities. The formulas for their densities are overly complicated, involve doubly infinite series, and it was promptly noted [7, 8] that they were not very useful for practical purposes. In an attempt to understand the nature of the estimates (α̂, β̂), Anderson [7] assumed that δ_i and ε_i are i.i.d. normal random variables and proved that these estimates have infinite absolute first moment: E(|α̂|) = E(|β̂|) = ∞. As a result, the bias and the Mean Squared Error (MSE) are not defined either! These astonishing results led Anderson, Kunitomo, and Sawa [7, 8, 48] to investigate the MLE (α̂, β̂) further. They used Taylor expansions up to terms of order 3 and 4 and approximated the distribution functions of the MLE for the line, and the resulting approximate distributions had finite moments. Those approximations

agreed remarkably well with numerically computed characteristics of the actual estimators, at least in simulated experiments of all typical cases. Anderson and Sawa [8] noted that their approximations were virtually exact. At the same time, they compared β̂ with the slope estimate of the classical regression line, β̂_C, which is optimal when the x_i are error-free. However, the latter is quite inaccurate when both variables x and y are observed with random errors; it is known to be heavily biased [7, 19], even though it has finite moments. To demonstrate this phenomenon we generated 10^6 random samples of n = 30 points on the true line y = 1 + 3x, whose true points are positioned equally spaced on a segment of the true line of length L. The noise level σ was set such that σ/L = 0.05. For each sample we computed both parameter vectors (α̂, β̂) and (α̂_C, β̂_C). Finally, we computed their averages and drew the corresponding lines, as in Figure 1.1. It is clear from the figure how the classical regression estimates underestimate the true values of the parameters.

[Figure 1.1: Dot-dash line represents the geometric fit, dashed line represents the classical fit, while the solid line is the true line y = 1 + 3x.]

Returning to the results of Anderson and his coauthors, their results are summarized as follows. The mean square error of β̂ is infinite, while that of β̂_C is finite.

The estimate β̂ is consistent and asymptotically unbiased, while β̂_C is inconsistent and asymptotically biased (except when β = 0). In other words, paradoxically, the better estimates have theoretically infinite moments. So in practical terms, what does the nonexistence of moments mean anyway? Anderson explained these controversial properties of the MLE (α̂, β̂) by the following example. Suppose Z is a random variable with an infinite moment, which is a mixture of a standard normal random variable X, with weight 1 − p, and a "bad" Cauchy random variable Y, with a small weight p, i.e.

(1.31)  Z = (1 − p)X + pY.

Thus E(|Z|) = ∞, even though p may be very small. This means that Z and X appear virtually indistinguishable when p is small. Consequently, infinite moments occur because the distributions of these estimates have somewhat heavy tails, even though those tails barely affect the practical performance of the MLE, and as such one can use, in this case, a properly constructed approximative distribution. In nonlinear EIV regression, the probability densities of the MLE cannot be obtained explicitly. For example, the MLEs of the geometric parameters of other contours such as circles or ellipses have no closed form, and hence it is not known whether the probability densities of the center (â, b̂) and the radius R̂ exist. Hence the situation here is even more obscure than in the linear model. Furthermore, it was recently discovered [18] that the MLEs â, b̂, R̂ have infinite moments too, i.e. E(|â|) = ∞, E(|b̂|) = ∞, and E(|R̂|) = ∞. On the other hand, the least accurate fit (the Kåsa fit) returns estimates (â, b̂, R̂) that have a finite first moment; see [79]. Thus one faces the same methodological questions as in the linear case. Another difficulty in the EIV setting is studying the statistical property called consistency. In traditional statistics it is natural to seek estimators that converge in probability to the true parameters as n → ∞, i.e. θ̂_n → θ̃ in probability. Such estimators are called consistent estimators. This means that as the sample size

n increases, the distribution of the estimator becomes more and more concentrated near the true value of the parameter. In the EIV functional model the number of (nuisance) parameters grows with n, which makes tracking consistency an impossible task. To conclude, tracking the statistical properties of the estimator by standard approaches appears to be impossible, so an alternative approach is required.

1.5. Scope of the Dissertation

The previous discussion leads to immediate methodological questions:

(Q1) How can we characterize, in practical terms, the accuracy of estimates whose theoretical MSE is infinite (and whose bias is undefined)?

(Q2) Is there any precise meaning to the widely accepted notion that MLEs such as (α̂, β̂) or (â, b̂, R̂) are "best"?

My dissertation is part of a bigger project whose purpose is to revisit some difficulties in EIV regression studies and develop an unconventional approach to their resolution. The main goal of the dissertation is to answer the above questions and to address many other issues in EIV regression that remain largely unresolved. Our approach is tailored for image processing applications, where the number of observed points (pixels) is limited but the noise is small. Our general goal is to develop a simplified error analysis that nonetheless allows us to effectively assess the performance of existing fitting algorithms. In addition, based on our approach one can design algorithms with increased accuracy. Our study of circular and conic regression would not be complete without discussing algebraic fits and weighted algebraic fits. Thus, before closing this chapter we discuss them briefly; they are then discussed in full detail throughout the dissertation.

1.6. Algebraic Fits and Weighted Algebraic Fits

Since it is an iterative and hence expensive method, the geometric fit requires an initial guess, which should be chosen carefully to ensure the convergence of the

algorithm to the minimizer of the objective function. Consequently, alternative methods must be used to obtain accurate and non-iterative estimates. Usually, instead of computing and minimizing geometric distances, one minimizes various algebraic distances; such methods are therefore known as algebraic fits. Such fits are non-iterative, simpler, and faster, but most of them are usually less accurate than the geometric fit. Several approximations have been designed, which take the forms

(1.32)  F_1(Θ) = Σ [P(m_i, Θ)]²  or  F_2(Θ) = Σ [w_i P(m_i, Θ)]²

for some weights w_i that depend on the parameters and the experimental observations.

Algebraic Circle Fits. In the course of circle fitting, P takes the form

(1.33)  A(x² + y²) + Bx + Cy + D = 0.

From now on we denote by A = (A, B, C, D) the vector of all the algebraic circle parameters A, B, C, D. The simplest fit is known as the Kåsa fit [47], which uses the constraint A = 1. The Kåsa fit is common in practice and works very well for a full circle or a long circular arc, but it is heavily biased toward smaller circles when the data are clustered along a short arc. Pratt (and independently Chernov and Ososkov) [23, 60] proposed another fit by choosing w_i carefully; they minimize F_1 subject to B² + C² − 4AD = 1. Their choice leads to a significant reduction of the bias. Another elegant choice of w_i was proposed by Taubin [71]; the latter also works very well for conic fitting. A complete analysis of these fits will be given in full detail throughout the dissertation; a small numerical sketch of the simplest of them, the Kåsa fit, is given at the end of this chapter.

Algebraic Conic Fits. In conic fitting many more non-iterative algebraic fits were designed by Bookstein [13], Fitzgibbon et al. [28, 29], and others. More details of these and other fits will be given in the dissertation. However, all these algebraic fits are not statistically optimal, in the sense that they do not achieve the Kanatani-Cramér-Rao lower bound (KCR) (full details of these concepts and the proofs are

explained in Chapters 4, 5, and 6). Thus other types of conic fits become quite reasonable.

Weighted Algebraic Conic Fits. The geometric fit is the most accurate method, but its disadvantages, described above, have forced scientists to develop iterative methods that converge to statistically optimal estimates. These methods usually take only a few iterations to converge; typically they require 2 to 4 iterations to attain the optimal estimates, which coincide with the geometric fit up to the leading order. The first such estimate is obtained by approximating the geometric fit, so it is known as the approximated maximum likelihood estimate, also called the Gradient Weighted Algebraic Fit (GRAF). The estimate is a minimizer of the objective function that takes the formal expression F_2 given in equation (1.32). Finding the minimizer of this objective function requires solving an equation called the variational equation, which in turn has led scientists to propose several schemes for solving it. The first attempt was in 1997, when Leedan and Meer [50] proposed a scheme to solve the variational equation using the generalized eigenvalue problem; they called their scheme the Heteroscedastic Errors-In-Variables (HEIV) method. Another scheme was proposed in 2000 by Chojnacki et al. [25], who proposed another iterative scheme to solve the variational equation and called it the Fundamental Numerical Scheme (FNS). As a final remark, even though they minimize the same objective function, these estimates take different paths to converge to the minimizer. Another clever technique, used to obtain an optimal solution and reduce the bias of the parameter estimates, was proposed in the middle of the 1990s by Kanatani, who introduced the first- and second-order renormalization schemes, which soon took a prominent place in the study of curve fitting in computer vision. Kanatani developed his method in the early 1990s aiming at the reduction of bias and the minimization of variance in two standard image processing tasks, one of which was fitting ellipses to data and the other the computation of the fundamental matrix.

Kanatani's first publications were complicated and hard to understand. It took the computer vision community almost a decade to fully understand Kanatani's method, when Chojnacki et al. [24] compared it to other popular fitting algorithms and placed all of them within a unified framework. Kanatani's method does not minimize any objective function. Instead, he approximated the variational equation by another equation and then solved the resulting equation using a generalized eigenvalue problem. Another novel scheme is derived in this dissertation to obtain a statistically optimal estimate for conic fitting. Our method not only attains the KCR bound, but also has zero bias up to the second leading order, as we discuss in Chapter 6.

1.7. Organization of the Dissertation

My dissertation is organized as follows. Chapter 2 discusses the problem of infinite moments for the most popular circular algebraic fits, including ours. Chapter 3 presents the general scheme of our error analysis under general statistical assumptions and highlights the procedures for comparing two estimators on the same parameter space based on their statistical efficiency, to the leading order. The general scheme is explained with the simplest EIV model, linear regression. Chapter 4 is devoted to the geometric circle fit. In the first part of the chapter we provide a complete proof of the existence of the probability densities of the geometric fit (i.e. the MLE of (â, b̂, R̂)); the proof is based on Lebesgue measure theory. In the second part we apply our general scheme to the geometric circle fit. Chapter 5 discusses algebraic fits for lines, circles, and ellipses. We put all fits in a unified framework and discuss their error analysis in general, then study each fit separately. Accordingly, we compare the most popular circular algebraic fits together with the geometric fit. Such a comparison leads us to a non-iterative algebraic fit that outperforms the (usually considered unbeatable) geometric fit. The new method is called the Hyperaccurate fit (or Hyperfit for short). We will also

develop a new circle fit, which can be considered a modification of the MLE; thus we call it the Adjusted Maximum Likelihood Estimator (AMLE). Similarly, we compare several algebraic ellipse fits and propose a modified version of Hyperfit for ellipse fitting. The latter method outperforms all algebraic fits, but not the geometric fit. Chapter 6 surveys some numerical schemes for ellipse fitting and discusses their statistical properties through our analysis. This work allows us to present another fit that outperforms all existing numerical schemes for ellipse fitting, together with the geometric fit. Lastly, Chapter 7 is devoted to validating our general theory practically through numerical experiments. The work presented in Chapter 3, and the parts of Chapters 4 and 5 related to the circle fitting problem, were published in the Electronic Journal of Statistics. The contents of Chapter 7 are accepted for publication in the journal Theory of Probability and Mathematical Statistics. Chapter 2 represents a major portion of a third manuscript. Last but not least, the new results for the ellipse fitting problem presented in Chapters 5 and 6 will be submitted soon for publication.
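To close this chapter with something concrete, here is the small numerical sketch of the Kåsa fit promised in Section 1.6: a minimal implementation (Python/NumPy; the function and variable names are ours) of the least squares solution of (1.33) under the constraint A = 1. It is only a sketch, with no safeguards for degenerate data; Chapter 2 discusses this fit and its competitors in full detail.

    import numpy as np

    def kasa_fit(x, y):
        # Kasa fit: with A = 1 in (1.33), minimize sum (x^2 + y^2 + B*x + C*y + D)^2,
        # a linear least squares problem in (B, C, D).
        Z = np.column_stack([x, y, np.ones_like(x)])
        rhs = -(x ** 2 + y ** 2)
        B, C, D = np.linalg.lstsq(Z, rhs, rcond=None)[0]
        a, b = -B / 2.0, -C / 2.0          # estimated circle center
        R = np.sqrt(a ** 2 + b ** 2 - D)   # estimated radius
        return a, b, R

    # Example: noisy points on a semicircle of radius 1 centered at the origin
    rng = np.random.default_rng(3)
    phi = np.linspace(0.0, np.pi, 50)
    x = np.cos(phi) + 0.02 * rng.standard_normal(phi.size)
    y = np.sin(phi) + 0.02 * rng.standard_normal(phi.size)
    print(kasa_fit(x, y))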

CHAPTER 2

CIRCLE ALGEBRAIC FITS AND THE PROBLEM OF THE MOMENTS

In Chapter 1 we briefly discussed the geometric fit for circles (i.e. the MLE under our assumptions), the difficulties and disadvantages of implementing it practically, and the infinite moment problem that the MLE has. We also mentioned the most popular non-iterative algebraic circle fits, such as Kåsa, Pratt, and Taubin, which can also be considered approximations to the MLE. In this chapter we discuss these fits and their practical implementations in detail. We also briefly discuss the fit we proposed in [4] according to our general analysis (see Chapters 3 and 5); we called this fit the Hyperaccurate fit (or Hyper fit for short). Accordingly, we investigate one of the spectacular features of these fits: we prove that each of the Pratt, Taubin, and Hyperaccurate fits returns a center (â, b̂) and radius R̂ whose first absolute moments are infinite.

This chapter is organized as follows. Section 2.1 reviews the most popular algebraic fits. Section 2.2 states our main theorem regarding the nonexistence of the moments of the center and the radius for the Pratt fit, the Taubin fit, and the Hyper fit. Section 2.3 provides a proof of our theorem. Finally, Section 2.4 shows why these estimates possess this property while the Kåsa fit does not.

2.1. Algebraic Circle Fits

We describe the most popular algebraic circle fits here.

Kåsa Fit. The simplest and fastest method was introduced in the 1970s by Delogne [27] and Kåsa [47], and then rediscovered and published independently


Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

NATIONAL BOARD FOR HIGHER MATHEMATICS. M. A. and M.Sc. Scholarship Test. September 25, Time Allowed: 150 Minutes Maximum Marks: 30

NATIONAL BOARD FOR HIGHER MATHEMATICS. M. A. and M.Sc. Scholarship Test. September 25, Time Allowed: 150 Minutes Maximum Marks: 30 NATIONAL BOARD FOR HIGHER MATHEMATICS M. A. and M.Sc. Scholarship Test September 25, 2010 Time Allowed: 150 Minutes Maximum Marks: 30 Please read, carefully, the instructions on the following page 1 INSTRUCTIONS

More information

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1 Introduction and Problem Formulation In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1.1 Machine Learning under Covariate

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

Introduction to Computer Graphics (Lecture No 07) Ellipse and Other Curves

Introduction to Computer Graphics (Lecture No 07) Ellipse and Other Curves Introduction to Computer Graphics (Lecture No 07) Ellipse and Other Curves 7.1 Ellipse An ellipse is a curve that is the locus of all points in the plane the sum of whose distances r1 and r from two fixed

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Maximum Likelihood Estimation. only training data is available to design a classifier

Maximum Likelihood Estimation. only training data is available to design a classifier Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

13. Nonlinear least squares

13. Nonlinear least squares L. Vandenberghe ECE133A (Fall 2018) 13. Nonlinear least squares definition and examples derivatives and optimality condition Gauss Newton method Levenberg Marquardt method 13.1 Nonlinear least squares

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

ECE 275A Homework 6 Solutions

ECE 275A Homework 6 Solutions ECE 275A Homework 6 Solutions. The notation used in the solutions for the concentration (hyper) ellipsoid problems is defined in the lecture supplement on concentration ellipsoids. Note that θ T Σ θ =

More information

av 1 x 2 + 4y 2 + xy + 4z 2 = 16.

av 1 x 2 + 4y 2 + xy + 4z 2 = 16. 74 85 Eigenanalysis The subject of eigenanalysis seeks to find a coordinate system, in which the solution to an applied problem has a simple expression Therefore, eigenanalysis might be called the method

More information

Linear regression COMS 4771

Linear regression COMS 4771 Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

MATHEMATICS COMPREHENSIVE EXAM: IN-CLASS COMPONENT

MATHEMATICS COMPREHENSIVE EXAM: IN-CLASS COMPONENT MATHEMATICS COMPREHENSIVE EXAM: IN-CLASS COMPONENT The following is the list of questions for the oral exam. At the same time, these questions represent all topics for the written exam. The procedure for

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Math 302 Outcome Statements Winter 2013

Math 302 Outcome Statements Winter 2013 Math 302 Outcome Statements Winter 2013 1 Rectangular Space Coordinates; Vectors in the Three-Dimensional Space (a) Cartesian coordinates of a point (b) sphere (c) symmetry about a point, a line, and a

More information

MA 323 Geometric Modelling Course Notes: Day 07 Parabolic Arcs

MA 323 Geometric Modelling Course Notes: Day 07 Parabolic Arcs MA 323 Geometric Modelling Course Notes: Day 07 Parabolic Arcs David L. Finn December 9th, 2004 We now start considering the basic curve elements to be used throughout this course; polynomial curves and

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Emanuel E. Zelniker and I. Vaughan L. Clarkson

Emanuel E. Zelniker and I. Vaughan L. Clarkson A GENERALISATION OF THE DELOGNE-KÅSA METHOD FOR FITTING HYPERSPHERES Emanuel E Zelniker and I Vaughan L Clarkson School of Information Technology & Electrical Engineering The University of Queensland Queensland,

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

Slope Fields: Graphing Solutions Without the Solutions

Slope Fields: Graphing Solutions Without the Solutions 8 Slope Fields: Graphing Solutions Without the Solutions Up to now, our efforts have been directed mainly towards finding formulas or equations describing solutions to given differential equations. Then,

More information

SKILL BUILDER TEN. Graphs of Linear Equations with Two Variables. If x = 2 then y = = = 7 and (2, 7) is a solution.

SKILL BUILDER TEN. Graphs of Linear Equations with Two Variables. If x = 2 then y = = = 7 and (2, 7) is a solution. SKILL BUILDER TEN Graphs of Linear Equations with Two Variables A first degree equation is called a linear equation, since its graph is a straight line. In a linear equation, each term is a constant or

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR Appendix B The Derivative B.1 The Derivative of f In this chapter, we give a short summary of the derivative. Specifically, we want to compare/contrast how the derivative appears for functions whose domain

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Linear Algebra: Matrix Eigenvalue Problems

Linear Algebra: Matrix Eigenvalue Problems CHAPTER8 Linear Algebra: Matrix Eigenvalue Problems Chapter 8 p1 A matrix eigenvalue problem considers the vector equation (1) Ax = λx. 8.0 Linear Algebra: Matrix Eigenvalue Problems Here A is a given

More information

Incompatibility Paradoxes

Incompatibility Paradoxes Chapter 22 Incompatibility Paradoxes 22.1 Simultaneous Values There is never any difficulty in supposing that a classical mechanical system possesses, at a particular instant of time, precise values of

More information

This pre-publication material is for review purposes only. Any typographical or technical errors will be corrected prior to publication.

This pre-publication material is for review purposes only. Any typographical or technical errors will be corrected prior to publication. This pre-publication material is for review purposes only. Any typographical or technical errors will be corrected prior to publication. Copyright Pearson Canada Inc. All rights reserved. Copyright Pearson

More information

ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE)

ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE) 1 ELEG 5633 Detection and Estimation Minimum Variance Unbiased Estimators (MVUE) Jingxian Wu Department of Electrical Engineering University of Arkansas Outline Minimum Variance Unbiased Estimators (MVUE)

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Algebraic. techniques1

Algebraic. techniques1 techniques Algebraic An electrician, a bank worker, a plumber and so on all have tools of their trade. Without these tools, and a good working knowledge of how to use them, it would be impossible for them

More information

arxiv: v1 [physics.comp-ph] 22 Jul 2010

arxiv: v1 [physics.comp-ph] 22 Jul 2010 Gaussian integration with rescaling of abscissas and weights arxiv:007.38v [physics.comp-ph] 22 Jul 200 A. Odrzywolek M. Smoluchowski Institute of Physics, Jagiellonian University, Cracov, Poland Abstract

More information

CALCULUS: Math 21C, Fall 2010 Final Exam: Solutions. 1. [25 pts] Do the following series converge or diverge? State clearly which test you use.

CALCULUS: Math 21C, Fall 2010 Final Exam: Solutions. 1. [25 pts] Do the following series converge or diverge? State clearly which test you use. CALCULUS: Math 2C, Fall 200 Final Exam: Solutions. [25 pts] Do the following series converge or diverge? State clearly which test you use. (a) (d) n(n + ) ( ) cos n n= n= (e) (b) n= n= [ cos ( ) n n (c)

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

MTH4101 CALCULUS II REVISION NOTES. 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) ax 2 + bx + c = 0. x = b ± b 2 4ac 2a. i = 1.

MTH4101 CALCULUS II REVISION NOTES. 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) ax 2 + bx + c = 0. x = b ± b 2 4ac 2a. i = 1. MTH4101 CALCULUS II REVISION NOTES 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) 1.1 Introduction Types of numbers (natural, integers, rationals, reals) The need to solve quadratic equations:

More information

Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning Basics: Maximum Likelihood Estimation Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

NATIONAL BOARD FOR HIGHER MATHEMATICS. Research Scholarships Screening Test. Saturday, February 2, Time Allowed: Two Hours Maximum Marks: 40

NATIONAL BOARD FOR HIGHER MATHEMATICS. Research Scholarships Screening Test. Saturday, February 2, Time Allowed: Two Hours Maximum Marks: 40 NATIONAL BOARD FOR HIGHER MATHEMATICS Research Scholarships Screening Test Saturday, February 2, 2008 Time Allowed: Two Hours Maximum Marks: 40 Please read, carefully, the instructions on the following

More information

MULTIVARIABLE CALCULUS, LINEAR ALGEBRA, AND DIFFERENTIAL EQUATIONS

MULTIVARIABLE CALCULUS, LINEAR ALGEBRA, AND DIFFERENTIAL EQUATIONS T H I R D E D I T I O N MULTIVARIABLE CALCULUS, LINEAR ALGEBRA, AND DIFFERENTIAL EQUATIONS STANLEY I. GROSSMAN University of Montana and University College London SAUNDERS COLLEGE PUBLISHING HARCOURT BRACE

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information