Branch-and-Bound Algorithm. Pattern Recognition XI. Michal Haindl. Outline


Pattern Recognition XI
Michal Haindl
Faculty of Information Technology, KTI, Czech Technical University in Prague
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic
Prague, Czech Republic
European Social Fund. Prague & EU: Investing in your future
MI-ROZ /Z
January 16, 2012
© M. Haindl, MI-ROZ

Outline
1. Feature Selection
   Branch-and-Bound Algorithm
2. Feature Extraction
   Karhunen-Loeve Expansion

Branch-and-Bound Algorithm
assumption - can be used if the feature selection criterion satisfies the monotonicity property
monotonicity property - for nested feature sets $X_j$ related by $X_1 \subset X_2 \subset \dots \subset X_l$ the criterion function $J(X_j)$ satisfies $J(X_1) \le J(X_2) \le \dots \le J(X_l)$
top-down search with backtracking:
1. All possible sets of discarded features are represented in a tree; the tree is scanned top-down and right to left. The tree has $l - \ddot{l}$ levels, where $\ddot{l}$ is the number of features to keep.
2. If $J_{node} < J^{*}$, discard the whole branch; if $J_{node} > J^{*}$, continue along the branch. Here $J^{*}$ is the best criterion value found so far.
3. If $J_{terminal\ node} > J^{*}$, update $J^{*}$.

Branch-and-Bound Algorithm 2
finds an optimal feature subset if the monotonicity condition holds
effective search organization
prohibitive for large $l$
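The top-down search with backtracking can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function names, the additive toy criterion and its weights are assumptions made here, and an exhaustive search is included only as a reference for comparison.

```python
from itertools import combinations


def branch_and_bound(n_features, n_keep, criterion):
    """Top-down branch-and-bound search for the best subset of size n_keep.

    `criterion` maps a frozenset of feature indices to a number and is
    ASSUMED to be monotonic: J(A) <= J(B) whenever A is a subset of B.
    Returns (best_subset, best_value, number_of_criterion_evaluations).
    """
    best_subset, best_value = None, float("-inf")
    evaluations = 0

    def search(current, first_removable):
        nonlocal best_subset, best_value, evaluations
        value = criterion(current)
        evaluations += 1
        # Every descendant is a subset of `current`, so by monotonicity it
        # cannot beat `value`; the whole branch is discarded.
        if value <= best_value:
            return
        to_remove = len(current) - n_keep
        if to_remove == 0:
            # Terminal node that improves the bound: update it.
            best_subset, best_value = current, value
            return
        # Discard features in increasing index order so that each set of
        # discarded features is generated once; stop early enough that
        # `to_remove` features are still available.
        last_removable = n_features - to_remove
        for feature in range(first_removable, last_removable + 1):
            search(current - {feature}, feature + 1)

    search(frozenset(range(n_features)), 0)
    return best_subset, best_value, evaluations


def exhaustive_search(n_features, n_keep, criterion):
    """Reference solution: evaluate every subset of size n_keep."""
    subsets = (frozenset(c) for c in combinations(range(n_features), n_keep))
    best = max(subsets, key=criterion)
    return best, criterion(best)


# Hypothetical toy criterion: a sum of non-negative weights is monotonic.
WEIGHTS = [0.9, 0.1, 0.5, 0.7, 0.3]


def toy_criterion(subset):
    return sum(WEIGHTS[f] for f in sorted(subset))


bb_subset, bb_value, bb_evaluations = branch_and_bound(5, 3, toy_criterion)
ex_subset, ex_value = exhaustive_search(5, 3, toy_criterion)
print(sorted(bb_subset), sorted(ex_subset), bb_evaluations)
```

Pruning only pays off when good terminal nodes are reached early, which is why the order in which branches are scanned matters in practice; this sketch uses plain index order for simplicity.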

Branch-and-Bound Algorithm 2
Example: 3 from 5 (figure)

Sequential Forward / Backward Selection
suboptimal; used if branch-and-bound is computationally infeasible
Sequential Forward Selection (SFS) - bottom-up process
suppose $k$ features were selected from $X$ to form $X_k$; the $(k+1)$-th feature is selected from the remaining $X - X_k$ so that
$J(X_{k+1}) = \max_j J(X_k \cup y_j)$, $y_j \in X - X_k$, $X_0 = \emptyset$
not possible to remove a feature that has become superfluous (as a result of including other measurements)

Sequential Forward / Backward Selection 2
Sequential Backward Selection (SBS) - top-down process
suppose $k$ features were removed from the set of measurements $X$ to form $X_{l-k}$; the $(k+1)$-th feature to be eliminated is chosen so that
$J(X_{l-k-1}) = \max_j J(X_{l-k} - y_j)$, $y_j \in X_{l-k}$, $X_l = X$
once a measurement is discarded, no revision is possible
computationally more complex than SFS (the criterion is evaluated on sets of $l, l-1, \dots, \ddot{l}$ features, where $\ddot{l}$ is the number of features to keep)
continuous monitoring of the amount of information loss
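Both greedy procedures can be sketched in a few lines. The example below is an illustration under assumptions made here: the function names and the tabulated criterion `TOY_J` over three hypothetical features `a`, `b`, `c` are not from the lecture. The table is monotonic but chosen so that the individually best feature is not part of the best pair.

```python
def sequential_forward_selection(features, n_select, criterion):
    """SFS: start from the empty set and repeatedly add the feature y_j
    that maximises J(X_k united with y_j)."""
    selected = []
    remaining = list(features)
    while len(selected) < n_select:
        best = max(remaining, key=lambda y: criterion(frozenset(selected) | {y}))
        selected.append(best)
        remaining.remove(best)
    return selected


def sequential_backward_selection(features, n_select, criterion):
    """SBS: start from the full set and repeatedly discard the feature y_j
    whose removal leaves the largest J(X_{l-k} without y_j)."""
    current = list(features)
    while len(current) > n_select:
        worst = max(current, key=lambda y: criterion(frozenset(current) - {y}))
        current.remove(worst)
    return current


# Hypothetical criterion values (monotonic with respect to set inclusion).
TOY_J = {
    frozenset("a"): 0.60, frozenset("b"): 0.50, frozenset("c"): 0.40,
    frozenset("ab"): 0.70, frozenset("ac"): 0.65, frozenset("bc"): 0.90,
    frozenset("abc"): 1.00,
}


def toy_j(subset):
    return TOY_J[frozenset(subset)]


sfs_pair = sequential_forward_selection("abc", 2, toy_j)
sbs_pair = sequential_backward_selection("abc", 2, toy_j)
sbs_single = sequential_backward_selection("abc", 1, toy_j)
print(sfs_pair)    # ['a', 'b']  (J = 0.70, the best pair {b, c} has J = 0.90)
print(sbs_pair)    # ['b', 'c']
print(sbs_single)  # ['b']       (J = 0.50, the best single feature a has J = 0.60)
```

With this table SFS cannot drop `a` once it has been chosen, and SBS cannot bring `a` back once it has been discarded, which is the lack of revision noted on the slides.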

Feature Extraction
information compression by mapping
nonlinear - e.g. $X$ generated by AR → parameter space; computationally infeasible, analytically intractable
linear - $\ddot{X} = A^T X$

Monte Carlo Methods
Simulated annealing - $\min J(X)$ (stochastic hill-climbing)
1. select an annealing schedule $(T_i)$, select an initial solution $X_i$
2. $X_i \to X_{i+1}$, $\Delta J = J(X_{i+1}) - J(X_i)$
3. accept the change with the probability $P = \exp\{-\Delta J / T_i\}$

Genetic Algorithms
parallel test-and-go - a predefined number of solutions (binary strings) is modified and tested simultaneously
1. select an initial population $\{{}^{k}X_i,\ i = 1,\dots,n\}$
2. apply reproduction, crossover and mutation; evaluate the criterion $J({}^{k}X_i)$
3. only the best ones survive to the next generation $k+1$
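The three simulated annealing steps can be sketched as below. Everything specific in this example is an assumption made for illustration: the geometric schedule, the bit-flip neighbourhood, and the toy cost (Hamming distance to a hypothetical target mask) stand in for a real selection criterion. Changes with $\Delta J \le 0$ are always accepted, since the acceptance probability would be at least 1.

```python
import math
import random


def simulated_annealing(cost, initial, neighbour, schedule, rng):
    """Minimise `cost`, visiting one candidate per temperature in `schedule`.

    Returns the best solution seen and its cost.
    """
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for temperature in schedule:
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Improvements are always accepted; a worse candidate is accepted
        # with probability exp(-delta / T_i).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical toy problem: feature masks are tuples of 0/1.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def hamming_cost(mask):
    return sum(a != b for a, b in zip(mask, TARGET))


def flip_one_bit(mask, rng):
    i = rng.randrange(len(mask))
    return mask[:i] + (1 - mask[i],) + mask[i + 1:]


# Assumed geometric annealing schedule T_i = 2.0 * 0.95**i.
temperatures = [2.0 * 0.95 ** i for i in range(200)]
start = (0,) * len(TARGET)
best_mask, best_cost = simulated_annealing(
    hamming_cost, start, flip_one_bit, temperatures, random.Random(0)
)
print(best_mask, best_cost)
```

Tracking the best solution seen, rather than returning the final one, is a common safeguard because the chain may still accept a worse candidate late in the schedule.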

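Karhunen-Loeve Expansion
a projection of $X$ onto the K-L coordinate system and subsequent approximation of $X$ with $\ddot{X}$, $\ddot{l} < l$
$u_j$, $j = 1,\dots,l$ - the complete set of orthonormal basis vectors, i.e. $u_j^T u_i = \delta_{ij}$
$X = \sum_{j=1}^{l} x_j u_j$, $\ddot{X} = \sum_{j=1}^{\ddot{l}} x_j u_j$
$\{u_1,\dots,u_{\ddot{l}}\}$: $\epsilon_{\min} = E\{(X - \ddot{X})^T (X - \ddot{X})\}$
$\epsilon = E\{\sum_{j=\ddot{l}+1}^{l} x_j^2\} = \sum_{j=\ddot{l}+1}^{l} u_j^T E\{X X^T\} u_j = \sum_{j=\ddot{l}+1}^{l} u_j^T \Phi u_j$
$\Phi$ symmetric, PD: only $l$ independent solutions, i.e.

Karhunen-Loeve Expansion 2
$\ddot{X} = \sum_{j=1}^{\ddot{l}} x_j u_j = A^T X$
$u_j, \lambda_j$ are eigenvectors, eigenvalues of $\Phi$
$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{\ddot{l}} \ge \dots \ge \lambda_l$
$\epsilon = \sum_{j=\ddot{l}+1}^{l} u_j^T \lambda_j u_j = \sum_{j=\ddot{l}+1}^{l} \lambda_j$
$\epsilon_{\min}$: $\lambda_{\ddot{l}+1},\dots,\lambda_l$ minimal
$\Sigma = \mathrm{diag}[\lambda_1,\dots,\lambda_{\ddot{l}}]$
Mahalanobis distance $(\ddot{X}, \mu_i)$: $J_M = \sum_{j=1}^{\ddot{l}} \frac{(\ddot{x}_j - \mu_{i,j})^2}{\lambda_j}$

Parametric Measures
PDM & linear transformation
Chernoff, $s \in \langle 0,1 \rangle$
$\ddot{\mu}_i = A^T \mu_i$, $\ddot{\Sigma}_i = A^T \Sigma_i A$
$J_C(A) = \frac{1}{2} s(1-s)(\mu_2 - \mu_1)^T A [(1-s) A^T \Sigma_1 A + s A^T \Sigma_2 A]^{-1} A^T (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|(1-s) A^T \Sigma_1 A + s A^T \Sigma_2 A|}{|A^T \Sigma_1 A|^{1-s} \, |A^T \Sigma_2 A|^{s}}$
optimal solution - numerical search in the gradient direction of $J_C(A)$
if $\Sigma_1 = \Sigma_2 = \Sigma$: $A = \Sigma^{-1}(\mu_2 - \mu_1)$
if $\mu_2 = \mu_1$: $A$ is the matrix of ranked eigenvectors of $\Sigma_2^{-1} \Sigma_1$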
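The step from the truncation error $\epsilon$ expressed through $\Phi$ to the sum of discarded eigenvalues can be sketched with Lagrange multipliers. The multipliers $\beta_j$ and the Lagrangian $L$ are notation introduced here for illustration, $\ddot{l}$ denotes the number of retained basis vectors, and only the normalisation constraint $u_j^T u_j = 1$ is imposed explicitly.

```latex
\begin{align*}
\epsilon &= \sum_{j=\ddot{l}+1}^{l} u_j^T \Phi u_j,
  \qquad \text{subject to } u_j^T u_j = 1, \\
L &= \sum_{j=\ddot{l}+1}^{l} \left[ u_j^T \Phi u_j
  - \beta_j \left( u_j^T u_j - 1 \right) \right], \\
\frac{\partial L}{\partial u_j} &= 2 \Phi u_j - 2 \beta_j u_j = 0
  \;\Longrightarrow\; \Phi u_j = \beta_j u_j, \\
\epsilon &= \sum_{j=\ddot{l}+1}^{l} u_j^T \beta_j u_j
  = \sum_{j=\ddot{l}+1}^{l} \beta_j .
\end{align*}
```

The stationarity condition is the eigenvalue equation of $\Phi$, so each $\beta_j$ is an eigenvalue $\lambda_j$ and each $u_j$ the corresponding eigenvector. The error is then the sum of the eigenvalues of the discarded directions, which is smallest when those are the $l - \ddot{l}$ smallest eigenvalues.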

Feature Selection. Pattern Recognition X. Michal Haindl.

Feature Selection
Outline
motivation
technical recognition problem
dimensionality reduction ↓
class separability increase ↑
data compression (e.g. required communication channel capacity)