ROBUST ESTIMATOR FOR MULTIPLE INLIER STRUCTURES
Xiang Yang (1) and Peter Meer (2)
(1) Dept. of Mechanical and Aerospace Engineering
(2) Dept. of Electrical and Computer Engineering
Rutgers University, NJ 08854, USA
Several inlier structures and outliers... each structure with a different scale... the scales also have to be found. How do we solve it?

S. Mittal, S. Anand, P. Meer: Generalized Projection Based M-Estimator. IEEE Trans. Pattern Anal. Machine Intell., 34, 2351-2364, 2012.
Three distinct steps in Mittal et al.:
1. Scale estimation. A scale estimate was obtained with all undetected structures contributing as well.
2. Model estimation using mean shift. The mean shift was more complicated than it should be.
3. Inlier/outlier dichotomy. The stopping criterion was just a heuristic relation.
A different solution: simpler, avoids the above problems. We will also examine the criteria for robustness.
Nonlinear objective functions
The n_1 < n inlier measurements y_i, plus outliers, in R^l. In general a nonlinear objective function

    Ψ(y_i, β) ≃ 0_k,   i = 1, ..., n_1,   Ψ( ) ∈ R^k
    Ψ(y_i, β) ≠ 0_k for outliers, i = (n_1 + 1), ..., n.

e.g. the 3×3 fundamental matrix F in y'_i^T F y_i = 0, k = 1.
In a higher dimensional linear space the vector Ψ can be separated into a matrix of measurements Φ and a corresponding new parameter vector θ

    Ψ(y_i, β) = Φ(y_i) θ(β),   Φ(y) ∈ R^{k×m},   θ(β) ∈ R^m.

The k = ζ relations between the rows of Φ, called carriers x^[c]_i, and θ:

    x^[c]_i^T θ ≈ 0,   c = 1, ..., ζ,   i = 1, ..., n_1.
There is an ambiguity in these equations, x^[c]_i^T θ ≈ 0, which can be eliminated by taking θ^T θ = 1. The estimates found by the generalized projection based M-estimator (gpbM) were refined by a few percent using this relation.

S. Mittal, P. Meer: Conjugate gradient on Grassmann manifolds for robust subspace estimation. Image and Vision Computing, 30, 417-427, 2012.

This procedure can also be used for the new algorithm, but it will not be described in this talk.
Ellipse estimation. ζ = 1.
Input: y = [x y]^T ∈ R^2.
Nonlinear objective function: (y_i − y_c)^T Q (y_i − y_c) − 1 ≃ 0, where the matrix Q is 2×2 symmetric, positive definite, and y_c is the ellipse center. The y_i are measurements!
Carrier: x = [x y x^2 xy y^2]^T ∈ R^5 gives the linear relation

    x_i^T θ − α ≈ 0,   i = 1, ..., n_1,

with the scalar α (intercept) pulled out. The relation between the input parameters and θ:

    θ = [−2(Q y_c)^T   Q_11   2 Q_12   Q_22]^T,   α = 1 − y_c^T Q y_c.
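As a sketch of this parameterization (helper names are ours, not from the talk; the sign convention assumed is α = 1 − y_c^T Q y_c, so that the carrier relation reads x^T θ − α = 0), a point on the ellipse satisfies the linear relation exactly:

```python
import numpy as np

def ellipse_carrier(p):
    """Lift a 2-D input point [x, y] to the carrier vector in R^5."""
    x, y = p
    return np.array([x, y, x * x, x * y, y * y])

def ellipse_theta_alpha(Q, yc):
    """Map ellipse parameters (Q, y_c) to (theta, alpha) so that
    ellipse_carrier(p) @ theta - alpha = 0 for points p on the ellipse.
    Sign convention (our assumption): alpha = 1 - y_c^T Q y_c."""
    theta = np.concatenate([-2.0 * Q @ yc,
                            [Q[0, 0], 2.0 * Q[0, 1], Q[1, 1]]])
    alpha = 1.0 - yc @ Q @ yc
    return theta, alpha

# Sanity check: a point on the ellipse satisfies the linear relation.
Q = np.array([[1.0 / 9.0, 0.0], [0.0, 1.0 / 4.0]])   # semi-axes 3 and 2
yc = np.array([5.0, -1.0])
p = yc + np.array([3.0, 0.0])                        # (p-yc)^T Q (p-yc) = 1
theta, alpha = ellipse_theta_alpha(Q, yc)
residual = ellipse_carrier(p) @ theta - alpha        # ~ 0
```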
Beside θ^T θ = 1, the ellipses also have to satisfy the positivity condition 4 θ_3 θ_5 − θ_4^2 > 0 (which can be normalized to = 1).
The nonlinearity of the ellipses is a difficult problem. A carrier x perturbed with noise having s.d. σ from the true value x_o does not have zero mean expectation,

    E(x − x_o) = [0  0  σ^2  0  σ^2]^T,

since the carrier has x^2 and y^2 terms.
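A quick numerical check of this bias (a sketch; the sample size and seed are arbitrary choices of ours): the squared-coordinate carrier picks up a σ^2 offset.

```python
import numpy as np

# If x = x_o + e with e ~ N(0, sigma^2), then E[x^2] = x_o^2 + sigma^2:
# the x^2 (and y^2) carriers are biased by sigma^2, as stated above.
rng = np.random.default_rng(0)
sigma, x_o = 0.5, 2.0
x = x_o + sigma * rng.standard_normal(200_000)
bias = (x * x).mean() - x_o ** 2     # should be close to sigma^2 = 0.25
```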
Fundamental matrix. ζ = 1.
Input: y = [x y x' y']^T ∈ R^4.
Nonlinear objective function: y'_i^T F y_i = [x'_i y'_i 1] F [x_i y_i 1]^T ≃ 0.
Carrier: x = [x y x' y' xx' xy' x'y yy']^T ∈ R^8 gives the linear relation x_i^T θ − α ≈ 0, i = 1, ..., n_1.
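The linearization can be checked against the bilinear form directly. A minimal sketch (function names ours), assuming the carrier ordering [x, y, x', y', xx', xy', x'y, yy'] and the intercept α = −F[2,2]:

```python
import numpy as np

def fundmat_carrier(p):
    """Carrier in R^8 for a point correspondence [x, y, xp, yp]."""
    x, y, xp, yp = p
    return np.array([x, y, xp, yp, x * xp, x * yp, xp * y, y * yp])

def theta_alpha_from_F(F):
    """Collect the 8 non-constant coefficients of y'^T F y into theta;
    the intercept pulled out is alpha = -F[2, 2]."""
    theta = np.array([F[2, 0], F[2, 1], F[0, 2], F[1, 2],
                      F[0, 0], F[1, 0], F[0, 1], F[1, 1]])
    return theta, -F[2, 2]

# Identity check: x^T theta - alpha equals y'^T F y for any F and point.
rng = np.random.default_rng(1)
F = rng.standard_normal((3, 3))
x, y, xp, yp = rng.standard_normal(4)
theta, alpha = theta_alpha_from_F(F)
lin = fundmat_carrier([x, y, xp, yp]) @ theta - alpha
bil = np.array([xp, yp, 1.0]) @ F @ np.array([x, y, 1.0])
```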
Homography. ζ = 2.
Input: y = [x y x' y']^T ∈ R^4.
Nonlinear objective function: y'_i ~ H y_i, or [x'_hi y'_hi w'_hi]^T = H [x_i y_i 1]^T.
Direct linear transformation (DLT): with θ = h = vec(H^T) and A_i a 2×9 matrix,

    A_i h = [ y_i^T   0_3^T   −x'_i y_i^T
              0_3^T   y_i^T   −y'_i y_i^T ] h ≈ 0_2,   y_i = [x_i y_i 1]^T.

Carriers:

    x^[1] = [x  y  1  0  0  0  −x'x  −x'y  −x']^T
    x^[2] = [0  0  0  x  y  1  −y'x  −y'y  −y']^T

give two linear relations x^[c]_i^T h ≈ 0, c = 1, 2, i = 1, ..., n_1.
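A sketch verifying that the two carriers annihilate h = vec(H^T) for a noise-free correspondence (names ours; note that in NumPy the row-major `H.reshape(-1)` is exactly vec(H^T)):

```python
import numpy as np

def homography_carriers(p):
    """The two carriers x^[1], x^[2] in R^9 for [x, y, xp, yp]."""
    x, y, xp, yp = p
    c1 = np.array([x, y, 1.0, 0, 0, 0, -xp * x, -xp * y, -xp])
    c2 = np.array([0, 0, 0, x, y, 1.0, -yp * x, -yp * y, -yp])
    return c1, c2

# Generate an exact correspondence under a homography H.
rng = np.random.default_rng(2)
H = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
x, y = 0.3, -0.7
u = H @ np.array([x, y, 1.0])
xp, yp = u[0] / u[2], u[1] / u[2]

h = H.reshape(-1)                      # h = vec(H^T), row-major
c1, c2 = homography_carriers([x, y, xp, yp])
residuals = (c1 @ h, c2 @ h)           # both ~ 0
```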
Covariance of the carriers
The inliers at input have the same l×l covariance σ^2 C_y. σ is unknown and has to be estimated; det[C_y] = 1. If no additional information is available, C_y = I_l.
The m×m covariances of the carriers are

    σ^2 C^[c]_i = σ^2 J_{x^[c]_i | y_i} C_y J_{x^[c]_i | y_i}^T,   c = 1, ..., ζ,

with the Jacobian matrix

    J_{x|y} = ∂x(y)/∂y = [ ∂x_1/∂y_1 ... ∂x_1/∂y_l
                           ...
                           ∂x_m/∂y_1 ... ∂x_m/∂y_l ].

A carrier covariance depends on the input point.
Ellipse estimation. ζ = 1. The 5×2 Jacobian gives the 5×5 covariance:

    J_{x_i | y_i} = [ 1     0
                      0     1
                      2x_i  0
                      y_i   x_i
                      0     2y_i ].

Fundamental matrix. ζ = 1. The 8×4 Jacobian matrix gives the 8×8 covariance:

    J_{x_i | y_i}^T = [ 1  0  0  0  x'_i  y'_i  0     0
                        0  1  0  0  0     0     x'_i  y'_i
                        0  0  1  0  x_i   0     y_i   0
                        0  0  0  1  0     x_i   0     y_i ].
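The point-dependent carrier covariance for the ellipse case can be sketched as follows (helper names ours; C_y defaults to the identity, as on the previous slide):

```python
import numpy as np

def ellipse_jacobian(p):
    """5x2 Jacobian of the carrier [x, y, x^2, xy, y^2] w.r.t. [x, y]."""
    x, y = p
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0 * x, 0.0],
                     [y, x],
                     [0.0, 2.0 * y]])

def ellipse_carrier_cov(p, C_y=None):
    """Carrier covariance C_i = J C_y J^T (up to the factor sigma^2)."""
    if C_y is None:
        C_y = np.eye(2)
    J = ellipse_jacobian(p)
    return J @ C_y @ J.T

C = ellipse_carrier_cov([1.0, 2.0])    # symmetric 5x5, depends on the point
```

Note that C_i is only positive semidefinite of rank at most 2: two-dimensional input noise pushed through a 5×2 Jacobian.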
Homography. ζ = 2. The two 9 4 Jacobian matrices give the 9 9 covariances I 2 2 x J i I 2 2 0 2 = 0 0 x [1] 4 4 i y 2 yi i 0 2 0 2 0 J x [2] i y i = I 2 2 y i I 2 2 0 2 0 4 3 0 2 0 4 0 2 0 0 2 yi 12
1. Scale estimation for an inlier structure
M elemental subsets, each containing the minimal number of points defining θ. The intercept α is computed from θ. If ζ > 1 we want to have only a one-dimensional null space R, so we should take into account only one x^[c].
For each elemental subset (θ, α) and each point i, consider only the largest Mahalanobis distance from α:

    d^[c]_i = |x^[c]_i^T θ − α| / sqrt(θ^T C^[c]_i θ),   d_i = max_{c=1,...,ζ} d^[c]_i.

The carrier vector is x_i, its covariance matrix is C_i, and the variance in the null space is

    H_i = θ^T C_i θ = θ^T J_{x_i|y_i} J_{x_i|y_i}^T θ.
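The per-point distances can be computed in a vectorized way. A sketch with names of our choosing:

```python
import numpy as np

def mahalanobis_distances(X, covs, theta, alpha):
    """d_i = |x_i^T theta - alpha| / sqrt(theta^T C_i theta).
    X: n x m carrier matrix; covs: n x m x m carrier covariances."""
    num = np.abs(X @ theta - alpha)
    den = np.sqrt(np.einsum('j,ijk,k->i', theta, covs, theta))
    return num / den

# Tiny example: two carriers with identity covariances.
X = np.array([[1.0, 0.0], [0.0, 2.0]])
covs = np.stack([np.eye(2), np.eye(2)])
d = mahalanobis_distances(X, covs, np.array([1.0, 0.0]), 0.0)
```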
For each elemental subset order the Mahalanobis distances in ascending order, d_[i]. Take the n_r1 ≪ n points from min over the M subsets of Σ_{i=1}^{n_r1} d_[i]. If M is large enough, this minimum is almost surely from an inlier structure. This is the initial set, having the intercept α_m. Taking n_r1 = 0.05 n, or ε = 5%, is enough, as we will see. All structures use n_r1 = 0.05 n, where n is the total number of input points.
The initial set has n_r1 points, with the largest distance from α_m being d_[r1]. Increase the distance: 2 d_[r1] = d_[r2]; find n_r2. Continue...
Assume that b d_[r1] = d_[rb] is the first expansion satisfying

    2 [n_rb − n_r(b−1)] ≤ n_r(b−1),

that is, the region gains fewer than half as many new points. The scale estimate is then

    σ̂ = (b − 1) d_[r1] = ((b − 1)/b) d_[rb],

stepping back one ring because the border of an inlier structure is not precise.
(Figure: input y = [x y]^T ∈ R^2, carrier x ∈ R^5; the search is in R, the null space.)
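A minimal sketch of this expansion (the stopping constant and the step back by one ring are our reading of the slide, so treat the exact formula as an assumption):

```python
import numpy as np

def estimate_scale(d, n_r1):
    """Expand the region d_[r1], 2 d_[r1], 3 d_[r1], ... until fewer
    than half as many new points are gained, then step back one ring.
    d: Mahalanobis distances of all points; n_r1: initial set size."""
    d = np.sort(np.asarray(d, dtype=float))
    d_r1 = d[n_r1 - 1]                 # largest distance in the initial set
    n_prev, b = n_r1, 2
    while b * d_r1 < d[-1]:
        n_b = int(np.searchsorted(d, b * d_r1, side='right'))
        if 2 * (n_b - n_prev) <= n_prev:
            return (b - 1) * d_r1      # sigma_hat, imprecise border excluded
        n_prev, b = n_b, b + 1
    return (b - 1) * d_r1

# Dense inlier distances plus far-away outliers.
d = np.concatenate([np.arange(1, 101, dtype=float),
                    np.arange(1000, 1100, dtype=float)])
sigma_hat = estimate_scale(d, 10)      # stays well inside the inlier block
```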
2. Inlier estimation with mean shift
N = M/10 new elemental subsets from the inlier points found in the previous step. For each elemental subset, do mean shift in one dimension with the profile κ(u) = K(u^2), u ≥ 0,

    [θ̂, α̂] = arg max_{θ, α} (1/(n σ̂)) Σ_{i=1}^n κ( (z − z_i)^T B_i^{-1} (z − z_i) ),

having the variances B_i = σ̂^2 H_i = σ̂^2 θ^T C_i θ and with all the points z_i = x_i^T θ projected into the null space. For each θ, move from z = α to the closest mode ẑ.
The profile of the Epanechnikov kernel is

    κ(u) = 1 − u   if (z − z_i)^T B_i^{-1} (z − z_i) ≤ 1
    κ(u) = 0       if (z − z_i)^T B_i^{-1} (z − z_i) > 1

and g(u) = −κ'(u) = 1 for u ≤ 1, zero for u > 1. An iteration around z = z_old gives the new value

    z_new = [ Σ_{i=1}^n g(u) ]^{-1} [ Σ_{i=1}^n g(u) z_i ],

where only the points inside the Epanechnikov kernel around z_old are taken into account. The largest mode is α̂, and the corresponding elemental subset gives θ̂. The number of inliers is n_in; the TLS estimates are θ̂_tls and α̂_tls. This is the recovered inlier structure.
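For the Epanechnikov profile the iteration above reduces to averaging the points currently inside the window. A one-dimensional sketch (names ours; unweighted, with one variance per point as on the slide):

```python
import numpy as np

def mean_shift_mode(z0, z, B, max_iter=100, tol=1e-8):
    """Move z0 to the closest mode of the KDE over z_i = x_i^T theta.
    With the Epanechnikov profile g(u) = 1 for u <= 1, one iteration
    is the mean of the points with (z - z_i)^2 / B_i <= 1."""
    z_new = z0
    for _ in range(max_iter):
        inside = (z_new - z) ** 2 / B <= 1.0
        if not inside.any():
            break                       # empty window, stay put
        z_old, z_new = z_new, z[inside].mean()
        if abs(z_new - z_old) < tol:
            break
    return z_new

z = np.array([4.8, 4.9, 5.0, 5.1, 5.2, 20.0, 30.0])   # one mode near 5
mode = mean_shift_mode(4.5, z, np.ones_like(z))
```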
3. Strength based merge process
The strength of a structure is defined as

    s = n_in / d̄ = n_in^2 / Σ_{i=1}^{n_in} d_i,   d̄ = (1/n_in) Σ_{i=1}^{n_in} d_i.

The j-th structure has n_j inliers and strength s_j. Before it, l = 1, ..., (j − 1) structures were detected, with n_l inliers and strength s_l. The TLS estimate for j and l together gives strength s_{j,l}, and the two structures are fused if

    s_{j,l} > (n_j s_j + n_l s_l) / (n_j + n_l)

for some l. The merge is done in the linear space.
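The strength and the merge test are straightforward to express (a sketch; function names ours):

```python
import numpy as np

def strength(d):
    """s = n_in / mean(d) = n_in^2 / sum(d) for inlier distances d."""
    d = np.asarray(d, dtype=float)
    return len(d) ** 2 / d.sum()

def should_merge(s_joint, n_j, s_j, n_l, s_l):
    """Fuse structures j and l if the joint TLS fit is stronger than
    the inlier-weighted average of the individual strengths."""
    return s_joint > (n_j * s_j + n_l * s_l) / (n_j + n_l)

s = strength([1.0, 1.0, 1.0, 1.0])       # 4^2 / 4 = 4
merge = should_merge(5.0, 10, 4.0, 10, 4.0)
```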
300 inliers, 300 outliers. Gaussian inlier noise σ_g = 20. M = 500. Three similar inlier structures. Final structure: σ̂ = 46.1 and 329 inliers.
To merge two linear objective functions, a small angle and a small average distance between the two are enough. The input vector and the carrier vector are the same.
The merge of two nonlinear objective functions should be done in the input space and not in the linear space. It was not done here!
For two ellipses the overlap area can be computed. They merge if the shared area is large and the major axes are close to each other.
For fundamental matrices (or homographies), first recover the Euclidean geometry, which also means processing in 3D. At least 3 images are required if there is no additional knowledge. Depth is a strong separator, as we will see in a homography example too.
Final classification
Continue until there are not enough points left to start another initial set. The structures are sorted by their strengths in descending order. The inlier structures will always be at the beginning of the list. The user can specify where to stop and retain only the inliers, which have denser structures.
θ_1 x_i + θ_2 y_i − α ≈ 0,   i = 1, ..., n_in.   M = 300 for all multiple line estimations.
100 points, σ_g = 5; 200 points, σ_g = 10; 300 points, σ_g = 20; 400 unstructured outliers.

    scale:     11.5    16.3    40.1    322.2
    structure: 136     202     306     318
    strength:  11660   9996    7202    832

The correct inliers are recovered 97 times out of 100. Three times the 100-point line was not recovered.

    scale:   12.06 ± 2.94    18.90 ± 4.32    35.56 ± 10.35
    inliers: 121.4 ± 17.8    225.7 ± 32.1    288.3 ± 44.9
θ_1 x_i + θ_2 y_i − α ≈ 0,   i = 1, ..., n_in.
100 points, σ_g = 20; 200 points, σ_g = 10; 300 points, σ_g = 5; 400 unstructured outliers.

    scale:     57.7    17.4    8.9     428.4
    structure: 164     199     293     344
    strength:  28971   10134   2627    719

The correct inliers are recovered 96 times out of 100. Four times the top structure was not recovered.

    scale:   8.63 ± 3.02     17.77 ± 3.73    49.72 ± 15.20
    inliers: 286.2 ± 24.8    202.7 ± 20.5    151.2 ± 25.2
(y_i − y_c)^T Q (y_i − y_c) − 1 ≃ 0,   i = 1, ..., n_in.   M = 2000 for all multiple ellipse estimations.
200 points, σ_g = 3; 300 points, σ_g = 6; 400 points, σ_g = 9; 500 unstructured outliers.

               red     green   blue    cyan     magenta
    scale:     9.16    12.77   13.42   146.17   152.70
    structure: 181     339     353     396      117
    strength:  25486   24785   21188   2349     944

The correct inliers are recovered 98 times out of 100. Two times the 200-point ellipse was not recovered.

    scale:   7.83 ± 2.15     13.99 ± 3.81    17.86 ± 4.37
    inliers: 194.2 ± 27.2    319.0 ± 47.3    430.5 ± 59.5
(y_i − y_c)^T Q (y_i − y_c) − 1 ≃ 0,   i = 1, ..., n_in.
200 points, σ_g = 9; 300 points, σ_g = 6; 400 points, σ_g = 3; 500 unstructured outliers.

               red     green   blue    cyan     magenta
    scale:     7.75    17.39   17.68   188.67   100.13
    structure: 428     333     179     365      79
    strength:  62867   21567   9974    2152     1327

The correct inliers are recovered 95 times out of 100. Five times the 200-point ellipse was not recovered.

    scale:   6.34 ± 1.42     13.31 ± 3.45    19.42 ± 5.72
    inliers: 404.4 ± 24.9    315.7 ± 29.8    197.1 ± 17.0
Ellipse estimation in a real image
Image size 200×150. The EDISON segmentation system based on the mean shift is applied (top right) with the default spatial σ_s = 7 and range σ_r = 6.5 bandwidths. Canny edge detection (bottom left) with the thresholds of 100 and 200. The strongest three ellipses are drawn superimposed over the edges. The ellipses drawn superimposed over the original image (bottom right) are correct.
Fundamental matrix estimation. M = 1000 for all fundamental matrix estimations.
The 546 point pairs were extracted with SIFT with the default distance ratio of 0.8. The first three structures are: 160 pairs with σ̂_1 = 0.42 (red); 147 pairs with σ̂_2 = 0.59 (green); 60 pairs with σ̂_3 = 0.34 (blue).
Fundamental matrix estimation. SIFT finds 173 point correspondences. The first structure is 70 pairs with σ̂ = 0.38.
Homography estimation. M = 1000 for all homography estimations.
SIFT finds 168 point correspondences. The first two structures are: 66 pairs with σ̂_1 = 1.37 (red); 34 pairs with σ̂_2 = 4.3 (green).
Homography estimation. SIFT finds 495 correspondences. The first three structures are: 160 pairs with σ̂_1 = 1.25 (red); 98 pairs with σ̂_2 = 1.59 (green); 121 pairs with σ̂_3 = 1.77 (blue).
Conditions for robustness
For the algorithm to be robust, four parameters interact in a complex manner: M, the number of trials; n_out, the amount of outliers; σ̂, the noise of the inlier structure; ε, the size of the initial set.
200 inliers, σ_g = 9. 400 outliers. Change the parameters, one at a time:

    n_out = 100, 400, 800;   σ_g = 3, 9, 15;   M = 50 ... 4000;   ε = 1 ... 40%.

In each condition do 100 trials and return the average of the counted true inliers over the total number of points classified as inliers.
How important is the number of trials M? Not much, once M exceeds some value depending on the input data. (Plots: accuracy vs. M for varying n_out and σ_g.) Increasing the outliers to n_out = 900 (at σ_g = 9) has a stronger effect than increasing the noise to σ_g = 15 (at n_out = 400).
The initial set should be quasi-correct. M = 2000. ε = 5% is equivalent to n_r1 = 30 points. In one example, 24 points of the initial set are between ±2σ_g = ±18; in another, 29 points are between ±18.
Changing the starting ratio ε. M = 2000. Average real initial set with automatic increase, σ_g = 9. (Plots, as σ_g changes: initial sets and final results.) Taking ε = 5% is enough most of the time.
Robustness of the algorithm
If the input data is preprocessed and part of the outliers are eliminated, stronger inlier noise will be tolerated. The number of trials M does not matter beyond a value depending on the input data. ε = 5% is generally enough as the starting point. If the number of outliers is three or four times the number of inliers, the algorithm usually still delivers. If not all the inlier structures come out because the data was too challenging, the recovered inlier structures are still correct.
Thank You!
Open problems...
Estimating the m×k matrix Θ and a k-dimensional vector α. Is reducing to k independent runs of the algorithm enough?
An image contains both lines and conics, together with outliers. How do you approach it if all inlier structures should be recovered?
You have to process a very large image robustly. Will hierarchical processing, from many small images to the large image, find all the relevant inlier structures?
The covariances of the input points are not equal. Estimate first all the inlier structures with the same σ̂; after that, for each structure separately, do scale estimation. Will this procedure work all the time?