ROBUST ESTIMATOR FOR MULTIPLE INLIER STRUCTURES


ROBUST ESTIMATOR FOR MULTIPLE INLIER STRUCTURES Xiang Yang (1) and Peter Meer (2) (1) Dept. of Mechanical and Aerospace Engineering (2) Dept. of Electrical and Computer Engineering Rutgers University, NJ 08854, USA

Several inlier structures and outliers... each structure with a different scale... the scales also have to be found. How do you solve it? S. Mittal, S. Anand, P. Meer: Generalized Projection Based M-Estimator. IEEE Trans. Pattern Anal. Machine Intell., 34, 2351-2364, 2012. 2

Three distinct steps in Mittal et al.
1. Scale Estimation: a scale estimate was obtained with all undetected structures contributing as well.
2. Model Estimation Using Mean Shift: the mean shift was more complicated than it should be.
3. Inlier/Outlier Dichotomy: the stopping criterion was just a heuristic relation.
A different solution: simpler, and it avoids the above problems. We will also examine the criteria for robustness. 3

Nonlinear objective functions. The n_1 < n inlier measurements y_i, plus the outliers, are in R^l. In general a nonlinear objective function

Ψ(y_i, β) ≈ 0_k,   i = 1, ..., n_1,   Ψ(·) ∈ R^k
Ψ(y_i, β) ≉ 0_k for the outliers,   i = (n_1 + 1), ..., n.

e.g. the 3 × 3 fundamental matrix F in y_i′⊤ F y_i = 0, k = 1. In a higher dimensional linear space the vector Ψ can be separated into a matrix of measurements Φ and a corresponding new parameter vector θ

Ψ(y_i, β) = Φ(y_i) θ(β),   Φ(y) ∈ R^{k×m},   θ(β) ∈ R^m.

The k = ζ relations between the rows of Φ, called the carriers x_i^[c], and θ are

x_i^[c]⊤ θ ≈ 0,   c = 1, ..., ζ,   i = 1, ..., n_1. 4

There is an ambiguity in these equations x_i^[c]⊤ θ ≈ 0, which can be eliminated by imposing θ⊤θ = 1. The estimates found by the generalized projection based M-estimator (gpbM) were refined by a few percent using the above relation. S. Mittal, P. Meer: Conjugate gradient on Grassmann manifolds for robust subspace estimation. Image and Vision Computing, 30, 417-427, 2012. This procedure can also be used for the new algorithm, but it will not be described in this talk. 5

Ellipse estimation. ζ = 1.
Input: [x  y]⊤ ∈ R^2
Nonlinear objective function: (y_i − y_c)⊤ Q (y_i − y_c) − 1 ≈ 0, where the matrix Q is 2 × 2 symmetric, positive definite and y_c is the ellipse center. The y_i are measurements!
Carrier: x = [x  y  x^2  xy  y^2]⊤ ∈ R^5 gives the linear relation x_i⊤θ − α ≈ 0, i = 1, ..., n_1, with the scalar α (intercept) pulled out.
The relation between the input parameters and θ:
θ = [−2 y_c⊤Q   Q_11   2Q_12   Q_22]⊤,   α = 1 − y_c⊤ Q y_c. 6
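
To make the carrier lifting concrete, here is a minimal Python sketch (the helper names are illustrative, not from the authors' code) that builds the ellipse carrier and checks the linear relation x⊤θ − α ≈ 0 for a point on the ellipse, assuming the parameterization above.

```python
import numpy as np

def ellipse_carrier(y):
    """Lift a 2D point [x, y] to the 5D carrier [x, y, x^2, xy, y^2]."""
    x, yy = y
    return np.array([x, yy, x * x, x * yy, yy * yy])

def ellipse_theta_alpha(Q, y_c):
    """Map the ellipse parameters (Q, y_c) to the linear parameters (theta, alpha),
    with theta = [-2 y_c^T Q, Q11, 2 Q12, Q22] and alpha = 1 - y_c^T Q y_c."""
    theta = np.concatenate([-2.0 * (Q @ y_c), [Q[0, 0], 2.0 * Q[0, 1], Q[1, 1]]])
    alpha = 1.0 - y_c @ Q @ y_c
    return theta, alpha

# A point on the ellipse satisfies the linearized relation up to round-off.
Q = np.array([[1 / 9.0, 0.0], [0.0, 1 / 4.0]])   # semi-axes 3 and 2
y_c = np.array([1.0, 2.0])
theta, alpha = ellipse_theta_alpha(Q, y_c)
y = y_c + np.array([3.0, 0.0])                    # lies on the ellipse
print(ellipse_carrier(y) @ theta - alpha)         # ~ 0
```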

Besides θ⊤θ = 1, the ellipses also have to satisfy the positive definiteness condition 4θ_3θ_5 − θ_4^2 > 0 (= 1). The nonlinearity of the ellipses is a difficult problem. A carrier x perturbed with noise of s.d. σ from the true value x_o does not have zero-mean expectation,

E(x − x_o) = [0  0  σ^2  0  σ^2]⊤,

since the carrier has the x^2, y^2 terms. 7

Fundamental matrix. ζ = 1.
Input: [x  y  x′  y′]⊤ ∈ R^4
Nonlinear objective function: y_i′⊤ F y_i = [x_i′  y_i′  1] F [x_i  y_i  1]⊤ ≈ 0
Carrier: x = [x  y  x′  y′  xx′  xy′  x′y  yy′]⊤ ∈ R^8 gives the linear relation x_i⊤θ − α ≈ 0, i = 1, ..., n_1. 8
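
A small numeric check (a sketch, not the authors' code) that the bilinear epipolar constraint becomes linear in this carrier; here θ simply collects eight entries of F and α = −F_33, which follows from expanding y′⊤ F y.

```python
import numpy as np

def fundamental_carrier(p):
    """Lift a correspondence [x, y, x', y'] to the 8D carrier
    [x, y, x', y', xx', xy', x'y, yy']."""
    x, y, xp, yp = p
    return np.array([x, y, xp, yp, x * xp, x * yp, xp * y, y * yp])

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 3))
U, s, Vt = np.linalg.svd(F)
F = U @ np.diag([s[0], s[1], 0.0]) @ Vt            # enforce rank 2
y1 = np.array([0.3, -1.2, 1.0])                    # point in the first image
line = F @ y1                                      # its epipolar line
xp = 0.7
yp = -(line[0] * xp + line[2]) / line[1]           # matching point on the line
p = np.array([y1[0], y1[1], xp, yp])

theta = np.array([F[2, 0], F[2, 1], F[0, 2], F[1, 2],
                  F[0, 0], F[1, 0], F[0, 1], F[1, 1]])
alpha = -F[2, 2]
print(fundamental_carrier(p) @ theta - alpha)      # ~ 0
```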

Homography. ζ = 2.
Input: [x  y  x′  y′]⊤ ∈ R^4
Nonlinear objective function: y_i′ ≃ H y_i, i.e. [x_hi  y_hi  w_hi]⊤ = H [x_i  y_i  1]⊤.
Direct linear transformation (DLT): with θ = h = vec[H⊤] = [h_1⊤  h_2⊤  h_3⊤]⊤ (the rows of H stacked), y_i = [x_i  y_i  1]⊤ and A_i a 2 × 9 matrix,

A_i h = [ y_i⊤   0_3⊤   −x_i′ y_i⊤ ]
        [ 0_3⊤   y_i⊤   −y_i′ y_i⊤ ]  h = 0_2.

Carriers:
x^[1] = [x   y   1   0   0   0   −x′x   −x′y   −x′]⊤
x^[2] = [0   0   0   x   y   1   −y′x   −y′y   −y′]⊤
give the two linear relations x_i^[c]⊤ h ≈ 0, c = 1, 2, i = 1, ..., n_1. 9
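
A sketch of the two homography carriers and a quick orthogonality check (illustrative helpers, assuming the sign convention written above, with the minus signs on the last three entries):

```python
import numpy as np

def homography_carriers(p):
    """The two 9D carriers of a correspondence [x, y, x', y'] for the DLT
    constraint, minus signs on the last three entries."""
    x, y, xp, yp = p
    c1 = np.array([x, y, 1.0, 0, 0, 0, -xp * x, -xp * y, -xp])
    c2 = np.array([0, 0, 0, x, y, 1.0, -yp * x, -yp * y, -yp])
    return c1, c2

# For an exact correspondence y' ~ H y, both carriers are orthogonal to
# h = vec(H^T), i.e. the rows of H stacked into one 9-vector.
H = np.array([[1.1, 0.2, -0.3],
              [0.05, 0.9, 0.4],
              [0.001, -0.002, 1.0]])
y1 = np.array([2.0, -1.0, 1.0])
y2 = H @ y1                                   # homogeneous image of y1
p = np.array([y1[0], y1[1], y2[0] / y2[2], y2[1] / y2[2]])
h = H.reshape(-1)                             # vec(H^T): rows of H stacked
c1, c2 = homography_carriers(p)
print(c1 @ h, c2 @ h)                         # ~ 0, ~ 0
```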

Covariance of the carriers. The inliers at the input have the same l × l covariance σ^2 C_y. σ is unknown and has to be estimated; det[C_y] = 1. If no additional information is available, C_y = I_l. The m × m covariances of the carriers are

σ^2 C_i^[c] = σ^2 J_{x_i^[c]|y_i} C_y J_{x_i^[c]|y_i}⊤,   c = 1, ..., ζ,

with the Jacobian matrix

J_{x|y} = ∂x(y)/∂y = [ ∂x_1/∂y_1 ... ∂x_1/∂y_l ; ... ; ∂x_m/∂y_1 ... ∂x_m/∂y_l ].

A carrier covariance depends on the input point. 10

Ellipse estimation. ζ = 1. The 5 × 2 Jacobian gives the 5 × 5 covariance

J_{x_i|y_i} = [  1      0   ]
              [  0      1   ]
              [ 2x_i    0   ]
              [ y_i    x_i  ]
              [  0    2y_i  ]

Fundamental matrix. ζ = 1. The 8 × 4 Jacobian matrix gives the 8 × 8 covariance

J_{x_i|y_i} = [ 1     0     0     0   ]
              [ 0     1     0     0   ]
              [ 0     0     1     0   ]
              [ 0     0     0     1   ]
              [ x_i′  0     x_i   0   ]
              [ y_i′  0     0     x_i ]
              [ 0     x_i′  y_i   0   ]
              [ 0     y_i′  0     y_i ]  11
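
To make the covariance propagation concrete, here is a minimal sketch for the ellipse carrier with C_y = I_2: the m × l Jacobian is evaluated at the input point and C_i = J C_y J⊤ (the unknown σ^2 factors out). The function names are illustrative, not from the authors' code.

```python
import numpy as np

def ellipse_jacobian(y):
    """5x2 Jacobian d x / d y of the ellipse carrier [x, y, x^2, xy, y^2]."""
    x, yy = y
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0 * x, 0.0],
                     [yy, x],
                     [0.0, 2.0 * yy]])

def carrier_covariance(J, C_y=None):
    """Propagate the input covariance through the carrier map: C = J C_y J^T
    (sigma^2 is left out since it is a common factor)."""
    if C_y is None:
        C_y = np.eye(J.shape[1])
    return J @ C_y @ J.T

y_i = np.array([3.0, -2.0])
C_i = carrier_covariance(ellipse_jacobian(y_i))
print(C_i.shape)   # (5, 5): the covariance depends on the input point
```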

Homography. ζ = 2. The two 9 × 4 Jacobian matrices give the 9 × 9 covariances. With ỹ_i = [x_i  y_i]⊤, in block form

J_{x_i^[1]|y_i} = [   I_2        0_{2×2}     ]
                  [  0_{4×2}     0_{4×2}     ]
                  [ −x_i′ I_2   [−ỹ_i  0_2]  ]
                  [  0_{1×2}     [−1   0]    ]

J_{x_i^[2]|y_i} = [  0_{3×2}     0_{3×2}     ]
                  [   I_2        0_{2×2}     ]
                  [  0_{1×2}     0_{1×2}     ]
                  [ −y_i′ I_2   [0_2  −ỹ_i]  ]
                  [  0_{1×2}     [0   −1]    ]  12

1. Scale estimation for an inlier structure
M elemental subsets, each containing the minimal number of points defining θ. The intercept α is computed from θ. If ζ > 1, we want to have only a one-dimensional null space, R, so only one x^[c] should be taken into account. For each elemental subset (θ, α) and each point i, consider only the largest Mahalanobis distance from α

d_i^[c] = |x_i^[c]⊤ θ − α| / sqrt(θ⊤ C_i^[c] θ),   d_i = d_i^[c_i],   c_i = arg max_{c = 1,...,ζ} d_i^[c].

The carrier vector is x_i, its covariance matrix is C_i, and the variance in the null space is H_i = θ⊤ C_i θ = θ⊤ J_{x_i|y_i} J_{x_i|y_i}⊤ θ. 13
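
A sketch of how the null-space distances of all points could be computed for one hypothesis (θ, α), following the formula above; the array shapes and names are illustrative, not the authors' implementation.

```python
import numpy as np

def null_space_distances(X, covs, theta, alpha):
    """Mahalanobis distances d_i = |x_i^T theta - alpha| / sqrt(theta^T C_i theta).
    X: n x m carrier matrix, covs: n x m x m carrier covariances."""
    residuals = X @ theta - alpha                        # projections onto the null space
    H = np.einsum("j,ijk,k->i", theta, covs, theta)      # theta^T C_i theta per point
    return np.abs(residuals) / np.sqrt(H)
```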

For each elemental subset, order the Mahalanobis distances in ascending order, d_[i]. Take the n_r1 ≪ n points from the subset that minimizes, over the M subsets, the sum Σ_{i=1}^{n_r1} d_[i]. If M is large enough, this minimum is almost surely from an inlier structure. This is the initial set, having the intercept α_m. Taking n_r1 = 0.05 n, or ε = 5%, is enough, as we will see. All structures have n_r1 = 0.05 n, where n is the total number of input points. 14

The initial set has n_r1 points, with the largest distance from α_m being d_[r1]. Increase the distance: 2 d_[r1] = d_[r2]; find n_r2. Continue... Assume that b d_[r1] = d_[rb] is the first distance satisfying

2 [n_rb − n_r(b−1)] ≤ n_r(b−1),   then   σ̂ = ((b − 1)/b) d_[rb] = d_[r(b−1)],

because the border of an inlier structure is not precise.

[Figure: input y ∈ R^2, carriers x ∈ R^5; the search is in R, the null space.] 15
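
The band-expansion step could be sketched as follows, under the reading of the stopping rule given above (2[n_rb − n_r(b−1)] ≤ n_r(b−1)); this is illustrative, not the authors' implementation.

```python
import numpy as np

def expand_scale(distances, n_r1):
    """Expand the band around alpha_m in steps of d_[r1] until the growth of the
    point count stalls; return the scale estimate d_[r(b-1)] and the number of
    points inside it."""
    d = np.sort(np.asarray(distances))
    d_r1 = d[n_r1 - 1]                           # radius containing the initial set
    n_prev, b = n_r1, 2
    while True:
        n_b = int(np.searchsorted(d, b * d_r1, side="right"))
        if 2 * (n_b - n_prev) <= n_prev:         # too few new points: stop
            return (b - 1) * d_r1, n_prev        # sigma_hat, points within it
        n_prev, b = n_b, b + 1
```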

2. Inlier estimation with mean shift
N = M/10 new elemental subsets from the inlier points found in the previous step. For each elemental subset, do a mean shift in one dimension with the profile κ(u) = K(u^2), u ≥ 0,

[θ̂, α̂] = arg max_{θ, α} (1/(n σ̂)) Σ_{i=1}^n κ( (z − z_i) B_i^{−1} (z − z_i) )
        = arg max_θ max_z (1/(n σ̂)) Σ_{i=1}^n κ( (z − z_i) B_i^{−1} (z − z_i) ),

having the variances B_i = σ̂^2 H_i = σ̂^2 θ⊤ C_i θ and with all the points z_i = x_i⊤ θ still present. For each θ, move from z = α to the closest mode ẑ. 16

The profile of the Epanechnikov kernel is

κ(u) = { 1 − u   if u = (z − z_i) B_i^{−1} (z − z_i) ≤ 1
       { 0       if (z − z_i) B_i^{−1} (z − z_i) > 1

and g(u) = −κ′(u) = 1 for u ≤ 1 and zero for u > 1. An iteration around z = z_old gives the new value

z_new = [ Σ_{i=1}^n g(u_i) ]^{−1} [ Σ_{i=1}^n g(u_i) z_i ],

where only the points inside the Epanechnikov kernel around z_old are taken into account. The largest mode is α̂, and the corresponding elemental subset gives θ̂. The number of inliers is n_in; the TLS estimates are θ̂_tls and α̂_tls. This is the recovered inlier structure. 17
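
A one-dimensional mean-shift sketch with the Epanechnikov profile, following the update above (g = 1 inside the window, 0 outside); the names and the toy data are illustrative.

```python
import numpy as np

def mean_shift_mode(z_start, z, B, tol=1e-6, max_iter=100):
    """Move from z_start to the closest mode of the KDE of the projections z_i
    with bandwidths B_i, using the Epanechnikov profile."""
    z_old = z_start
    for _ in range(max_iter):
        inside = (z_old - z) ** 2 / B <= 1.0     # g(u) = 1 inside, 0 outside
        if not inside.any():
            return z_old
        z_new = z[inside].mean()                 # [sum g]^-1 [sum g z_i]
        if abs(z_new - z_old) < tol:
            return z_new
        z_old = z_new
    return z_old

# Toy data: 100 projections near a mode at 2.0 plus 100 uniform outliers.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(2.0, 0.1, 100), rng.uniform(-5.0, 5.0, 100)])
B = np.full(z.size, 0.25)                        # sigma_hat^2 * H_i, here constant
print(mean_shift_mode(1.8, z, B))                # converges to a mode near 2.0
```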

3. Strength based merge process
The strength of a structure s is defined as

d̄ = (1/n_in) Σ_{i=1}^{n_in} d_i,   s = n_in / d̄ = n_in^2 / Σ_{i=1}^{n_in} d_i.

The jth structure has n_j inliers and strength s_j. Before it, l = 1, ..., (j − 1) structures were detected, with n_l inliers and strength s_l. The TLS estimate for j and l taken together gives the strength s_{j,l}, and the two structures will be fused if

s_{j,l} > (n_j s_j + n_l s_l) / (n_j + n_l)

for an l. The merge is done in the linear space. 18
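
A compact sketch of the merge test from the definitions above; s_joint would come from a TLS fit over the union of the two inlier sets (helper names are illustrative).

```python
import numpy as np

def strength(distances):
    """Strength s = n_in^2 / sum(d_i) of a recovered structure."""
    d = np.asarray(distances)
    return d.size ** 2 / d.sum()

def should_merge(s_joint, n_j, s_j, n_l, s_l):
    """Fuse structures j and l if the joint strength exceeds the
    inlier-weighted average of the individual strengths."""
    return s_joint > (n_j * s_j + n_l * s_l) / (n_j + n_l)
```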

300 inliers. 300 outliers. Gaussian inlier noise σ_g = 20. M = 500. Three similar inlier structures. Final structure: σ̂ = 46.1 and 329 inliers. To merge two linear objective functions, a small angle and a small average distance between the two are enough. The input vector and the carrier vector are the same. 19

The merging of two nonlinear objective functions should be done in the input space and not in the linear space. It was not done here! For two ellipses the overlap area can be computed; they merge if the shared area is large and the major axes are close to each other. For fundamental matrices (or homographies), first recover the Euclidean geometry, which also means processing in 3D. At least three images are required if there is no additional knowledge. Depth is a strong separator, as we will see in a homography example too. 20

Final classification. Continue until there are not enough points left to start another initial set. The structures are sorted by their strengths in descending order. The inlier structures will always be at the beginning of the list. The user can specify where to stop and retain only the inliers, which form the denser structures. 21

θ_1 x_i + θ_2 y_i − α ≈ 0,   i = 1, ..., n_in.   M = 300 for all multiple line estimations.
Data: 100 points with σ_g = 5, 200 points with σ_g = 10, 300 points with σ_g = 20, plus 400 unstructured outliers.

scale:      11.5    16.3    40.1    322.2
structure:  136     202     306     318
strength:   11660   9996    7202    832

The correct inliers are recovered 97 times out of 100. Three times the 100-point line was not recovered.

scale:    12.06 ± 2.94    18.90 ± 4.32    35.56 ± 10.35
inliers:  121.4 ± 17.8    225.7 ± 32.1    288.3 ± 44.9   22

θ_1 x_i + θ_2 y_i − α ≈ 0,   i = 1, ..., n_in.
Data: 100 points with σ_g = 20, 200 points with σ_g = 10, 300 points with σ_g = 5, plus 400 unstructured outliers.

scale:      57.7    17.4    8.9     428.4
structure:  164     199     293     344
strength:   28971   10134   2627    719

The correct inliers are recovered 96 times out of 100. Four times the top structure was not recovered.

scale:    8.63 ± 3.02     17.77 ± 3.73    49.72 ± 15.20
inliers:  286.2 ± 24.8    202.7 ± 20.5    151.2 ± 25.2   23

(y_i − y_c)⊤ Q (y_i − y_c) − 1 ≈ 0,   i = 1, ..., n_in.   M = 2000 for all multiple ellipse estimations.
Data: 200 points with σ_g = 3, 300 points with σ_g = 6, 400 points with σ_g = 9, plus 500 unstructured outliers.

            red     green   blue    cyan    magenta
scale:      9.16    12.77   13.42   146.17  152.70
structure:  181     339     353     396     117
strength:   25486   24785   21188   2349    944

The correct inliers are recovered 98 times out of 100. Two times the 200-point ellipse was not recovered.

scale:    7.83 ± 2.15     13.99 ± 3.81    17.86 ± 4.37
inliers:  194.2 ± 27.2    319.0 ± 47.3    430.5 ± 59.5   24

(y_i − y_c)⊤ Q (y_i − y_c) − 1 ≈ 0,   i = 1, ..., n_in.
Data: 200 points with σ_g = 9, 300 points with σ_g = 6, 400 points with σ_g = 3, plus 500 unstructured outliers.

            red     green   blue    cyan    magenta
scale:      7.75    17.39   17.68   188.67  100.13
structure:  428     333     179     365     79
strength:   62867   21567   9974    2152    1327

The correct inliers are recovered 95 times out of 100. Five times the 200-point ellipse was not recovered.

scale:    6.34 ± 1.42     13.31 ± 3.45    19.42 ± 5.72
inliers:  404.4 ± 24.9    315.7 ± 29.8    197.1 ± 17.0   25

Ellipse estimation in a real image. Image size 200 × 150. The EDISON segmentation system, based on the mean shift, is applied (top right) with the default spatial bandwidth σ_s = 7 and range bandwidth σ_r = 6.5. Canny edge detection (bottom left) with thresholds of 100 and 200. The three strongest ellipses are drawn superimposed over the edges. The ellipses drawn superimposed over the original image (bottom right) are correct. 26

Fundamental matrix estimation. M = 1000 for all fundamental matrix estimations. The 546 point pairs were extracted with SIFT using the default distance ratio of 0.8. The first three structures are: 160 pairs with σ̂_1 = 0.42 (red), 147 pairs with σ̂_2 = 0.59 (green), 60 pairs with σ̂_3 = 0.34 (blue). 27

Fundamental matrix estimation. SIFT finds 173 point correspondences. The first structure has 70 pairs with σ̂ = 0.38. 28

Homography estimation. M = 1000 for all homography estimations. SIFT finds 168 point correspondences. The first two structures are: 66 pairs with σ̂_1 = 1.37 (red), 34 pairs with σ̂_2 = 4.3 (green). 29

Homography estimation. SIFT finds 495 correspondences. The first three structures are: 160 pairs with σ̂_1 = 1.25 (red), 98 pairs with σ̂_2 = 1.59 (green), 121 pairs with σ̂_3 = 1.77 (blue). 30

Conditions for robustness. For the algorithm to be robust, four parameters interact in a complex manner: M, the number of trials; n_out, the number of outliers; σ̂, the noise of the inlier structure; and ε, the size of the initial set. 31

200 inliers, σ_g = 9. 400 outliers. Change the parameters one at a time: n_out = 100, 400, 800; σ_g = 3, 9, 15; M = 50 to 4000; ε = 1 to 40%. In each condition do 100 trials and return the average of the counted true inliers over the total number of points classified as inliers. 32

How important is the number of trials M? Not much, once M exceeds some value depending on the input data. Increasing the outliers to n_out = 900 (at σ_g = 9) has a stronger effect than increasing the noise to σ_g = 15 (at n_out = 400). 33

The initial set should be quasi-correct. M = 2000. ε = 5% is equivalent to n_r1 = 30 points. In one example, 24 points of the initial set are within ±2σ_g = ±18; in another, 29 points are within ±18. 34

Changing the starting ratio ε. M = 2000. Average real initial set size with the automatic increase, σ_g = 9. [Plots: initial sets and final results as σ_g changes.] Taking ε = 5% is enough most of the time. 35

Robustness of the algorithm. If the input data is preprocessed and part of the outliers are eliminated, stronger inlier noise can be tolerated. The number of trials M does not matter beyond a value depending on the input data. The starting point ε = 5% is generally enough. If the number of outliers is three or four times the number of inliers, the algorithm usually still delivers. Even when not all the inlier structures come out because the data is too challenging, the recovered inlier structures are correct. 36

Thank You!

Open problems...
Estimating the m × k matrix Θ and a k-dimensional vector α: is reducing the problem to k independent runs of the algorithm enough?
An image contains both lines and conics together with outliers: how do you approach it if all inlier structures should be recovered?
You have to process a very large image robustly: will hierarchical processing, from many small images up to the large image, find all the relevant inlier structures?
The covariances of the input points are not equal: first estimate all the inlier structures with the same σ̂, then do scale estimation for each structure separately. Will this procedure work all the time? 38