Feature Selection. Pattern Recognition X. Michal Haindl.


Feature Selection
Pattern Recognition X
Michal Haindl
Faculty of Information Technology, KTI, Czech Technical University in Prague
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague, Czech Republic
European Social Fund, Prague & EU: We invest in your future
MI-ROZ 2011-2012/Z, January 16, 2012

Outline
motivation
- technical: recognition problem dimensionality reduction ց, class separability increase ր; data compression (e.g. required communication channel capacity); for a given amount of data, number of features ց ⇒ performance estimate accuracy ր
- physical: physical measurement (e.g. R soil moisture, vegetation cover) data enhancement
sections: Feature Selection, Probabilistic Dependence Measures

Feature Selection / Extraction

sensor → feature selector / extractor → classifier

feature selection: some information is discarded; the selected feature set $\ddot X$ maximizes the criterion, $J(\ddot X) = \max_{\tilde X} J(\tilde X)$ over candidate subsets $\tilde X$
feature extraction: $\ddot X = \Phi(X)$, $\Phi : \mathcal{R}^{l} \rightarrow \mathcal{R}^{\tilde l}$, $\tilde l < l$; all information is used, compression by mapping
effective mathematical theory only for linear transformations and Gaussian data
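To make the selection/extraction contrast concrete, the following sketch (illustrative Python, not part of the lecture) picks the best subset of features under a toy separability criterion and, as the extraction counterpart, projects onto the leading principal directions; the names `criterion_J`, `select_features` and `extract_features` are assumptions for this example.

```python
import numpy as np
from itertools import combinations

def criterion_J(X, y):
    """Toy class-separability criterion: squared distance of the two class
    means divided by the pooled per-feature variance, summed over features."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    v = X[y == 0].var(0) + X[y == 1].var(0) + 1e-12
    return float(np.sum((m0 - m1) ** 2 / v))

def select_features(X, y, l_tilde):
    """Feature selection: keep the subset of l_tilde original features that
    maximizes J; information carried only by the dropped features is lost."""
    best = max(combinations(range(X.shape[1]), l_tilde),
               key=lambda idx: criterion_J(X[:, list(idx)], y))
    return list(best)

def extract_features(X, l_tilde):
    """Feature extraction: linear mapping onto the l_tilde leading principal
    directions, so every original feature contributes to the new ones."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Phi = Vt[:l_tilde].T                    # l x l_tilde projection matrix
    return Xc @ Phi, Phi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) < 0.5).astype(int)
    X[y == 1, 0] += 2.0                     # only feature 0 separates classes
    print("selected subset:", select_features(X, y, 2))
    Z, _ = extract_features(X, 2)
    print("extracted shape:", Z.shape)
```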

Feature Selection / Extraction (cont.)
- selection / extraction: performance optimization, measurement cost reduction
- selection / extraction criterion: no direct relation with the classification error
- FE: no physical feature interpretation

Feature Selection
specification of:
- the feature evaluation criterion J(X)
- the dimensionality of the feature space $\tilde l$
- the optimization procedure
- for FE, the form of the mapping $\Phi(X)$ (extractor)
J(X) is defined in terms of unknown model characteristics $P(\omega_i)$, $p(X|\omega_i)$ ⇒ estimates
error sources:
- suboptimal criterion functions
- suboptimal search strategies
- pdf estimation errors (small sample size)
- numerical errors
- fitting errors

Feature Selection Approaches
- feature-set search algorithms
- Monte Carlo techniques (simulated annealing, genetic algorithms); see the sketch below
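As a sketch of the Monte Carlo techniques listed above, the following simulated-annealing subset search (illustrative Python; `anneal_select` and the toy criterion are assumed names, not from the lecture) swaps one selected and one unselected feature per step and accepts worse moves with a temperature-dependent probability.

```python
import numpy as np

def anneal_select(J, l, l_tilde, iters=2000, T0=1.0, seed=0):
    """Simulated-annealing search over feature subsets of fixed size l_tilde.
    J(mask) -> float is the criterion to maximize; the proposal move swaps
    one selected feature for one unselected feature."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(l, bool)
    mask[rng.choice(l, l_tilde, replace=False)] = True
    cur_val = J(mask)
    best_mask, best_val = mask.copy(), cur_val
    for t in range(iters):
        T = T0 * (1.0 - t / iters) + 1e-9            # linear cooling schedule
        cand = mask.copy()
        i = rng.choice(np.flatnonzero(mask))          # drop one selected feature
        j = rng.choice(np.flatnonzero(~mask))         # add one unselected feature
        cand[i], cand[j] = False, True
        val = J(cand)
        # always accept improvements, sometimes accept worse moves
        if val > cur_val or rng.random() < np.exp((val - cur_val) / T):
            mask, cur_val = cand, val
            if cur_val > best_val:
                best_mask, best_val = mask.copy(), cur_val
    return np.flatnonzero(best_mask), best_val

if __name__ == "__main__":
    good = {0, 3, 7}                                  # toy "informative" features
    J = lambda m: float(len(good & set(np.flatnonzero(m))))
    print(anneal_select(J, l=20, l_tilde=3))
```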

Probabilistic Distance Measures

K = 2:
$$P(\mathrm{error}) = \frac{1}{2}\left[1 - \int \left|P(\omega_1)p(X|\omega_1) - P(\omega_2)p(X|\omega_2)\right| dX\right]$$
P(error) is maximal if the $p(X|\omega_i)$ completely overlap

similarly, any measure between two pdfs
$$J(\ddot X) = \int f\big(P(\omega_i), p(X|\omega_i),\ i = 1,2\big)\, dX$$
satisfying
- $J \geq 0$
- $J = 0$ for overlapping $p(X|\omega_i)$
- $J = \max$ for non-overlapping $p(X|\omega_i)$
can be used for feature selection

if J can be expressed in the form of an averaged divergence, i.e.
$$J_F = \int f\!\left(\frac{P(\omega_1|X)}{P(\omega_2|X)}\right) P(\omega_2|X)\, p(X)\, dX$$
with $f(s)$ a convex function and $f_\infty = \lim_{s\to\infty} f(s)/s$, then
$$P(\mathrm{error}) < \frac{f(0)\,P(\omega_2) + f_\infty\,P(\omega_1) - J_F}{f(0) + f_\infty - f(1)}$$
e.g. the averaged divergence, the averaged Matusita distance $J_T$:
$$f_T(s) = (\sqrt{s}-1)^2, \quad J_F = J_T^2, \quad f(0) = 1,\ f(1) = 0,\ f_\infty = 1$$
$$P(\mathrm{error}) \leq \tfrac{1}{2}\,(1 - J_T^2)$$
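A quick numerical check of the two-class error formula and the Matusita bound for two one-dimensional Gaussian class densities (illustrative Python; the priors, means and variances are arbitrary choices, not from the lecture):

```python
import numpy as np

# Two 1-D Gaussian class densities; priors, means and variances are arbitrary.
P1, P2 = 0.4, 0.6
mu1, mu2, s1, s2 = 0.0, 1.5, 1.0, 1.2

x = np.linspace(-10.0, 12.0, 200001)
dx = x[1] - x[0]
gauss = lambda m, s: np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
p1, p2 = gauss(mu1, s1), gauss(mu2, s2)
p = P1 * p1 + P2 * p2                                # mixture density p(X)

# P(error) = 1/2 [ 1 - int |P1 p1 - P2 p2| dX ]
bayes_err = 0.5 * (1.0 - np.sum(np.abs(P1 * p1 - P2 * p2)) * dx)

# Averaged f-divergence with f(s) = (sqrt(s) - 1)^2 equals J_T^2
post1, post2 = P1 * p1 / p, P2 * p2 / p              # posteriors P(w_i | X)
JT2 = np.sum((np.sqrt(post1 / post2) - 1.0) ** 2 * post2 * p) * dx

print(f"Bayes error     : {bayes_err:.4f}")
print(f"(1 - J_T^2) / 2 : {0.5 * (1.0 - JT2):.4f}  (should be >= Bayes error)")
```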

Entropy Measures

observe X and compute $P(\omega_i|X)$ to determine an information gain
if $P(\omega_i|X) = P(\omega_j|X)\ \forall j \neq i$, then minimal information gain and maximal entropy (uncertainty)
average generalized entropy of degree α:
$$J_E^{\alpha} = \int \left(2^{1-\alpha}-1\right)^{-1}\left[\sum_{i=1}^{K} P^{\alpha}(\omega_i|X) - 1\right] p(X)\, dX$$
Shannon, α = 1:
$$J = -\int \sum_{i=1}^{K} P(\omega_i|X)\,\log_2\!\left[P(\omega_i|X)\right] p(X)\, dX$$
the best feature set minimizes the entropy criterion, $J(\ddot X) = \min_{\tilde X} J(\tilde X)$

Gaussian Density

Chernoff, $s \in \langle 0,1 \rangle$:
$$J_C = \frac{1}{2}\,s(1-s)\,(\mu_2-\mu_1)^T \big[(1-s)\Sigma_1 + s\Sigma_2\big]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|(1-s)\Sigma_1 + s\Sigma_2\right|}{|\Sigma_1|^{1-s}\,|\Sigma_2|^{s}}$$
Bhattacharyya (a numeric sketch of $J_C$ and $J_B$ follows below):
$$J_B = \frac{1}{4}\,(\mu_2-\mu_1)^T \big[\Sigma_1 + \Sigma_2\big]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}(\Sigma_1+\Sigma_2)\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}$$

Probabilistic Dependence Measures

if $p(X|\omega_i) = p(X)$ then X and $\omega_i$ are independent, no learning about $\omega_i$ from X
the dependence between the random variable X and a realization of $\omega_i$ is measured by the distance between $p(X)$ and $p(X|\omega_i)$; J is maximal if one of the $p(X|\omega_i)$ is completely separated from $p(X)$
all probabilistic distance measures suit; overall dependence, e.g. Patrick-Fisher:
$$J_R = \sum_{i=1}^{K} P(\omega_i) \left\{\int \big(p(X|\omega_i) - p(X)\big)^2\, dX \right\}^{\frac{1}{2}}$$
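A small helper evaluating the Chernoff and Bhattacharyya formulas above for given $(\mu_i, \Sigma_i)$; a minimal sketch in Python with assumed names and arbitrary test values, where $s = 0.5$ recovers $J_B$ from $J_C$.

```python
import numpy as np

def chernoff(mu1, S1, mu2, S2, s=0.5):
    """Chernoff distance J_C between N(mu1, S1) and N(mu2, S2), s in (0, 1).
    For s = 0.5 the value equals the Bhattacharyya distance J_B."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    S1, S2 = np.atleast_2d(S1).astype(float), np.atleast_2d(S2).astype(float)
    S = (1.0 - s) * S1 + s * S2
    d = mu2 - mu1
    quad = 0.5 * s * (1.0 - s) * d @ np.linalg.solve(S, d)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    return quad + 0.5 * (logdet_S - (1.0 - s) * logdet_1 - s * logdet_2)

if __name__ == "__main__":
    mu1, S1 = [0.0, 0.0], np.eye(2)
    mu2, S2 = [1.0, 1.0], 2.0 * np.eye(2)
    print("J_B =", chernoff(mu1, S1, mu2, S2, s=0.5))
    print("J_C(s=0.3) =", chernoff(mu1, S1, mu2, S2, s=0.3))
```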

Feature-Set Search Algorithms

for a given $\tilde l < l$, direct search means evaluating the effectiveness of
$$\binom{l}{\tilde l} = \frac{l!}{(l-\tilde l)!\ \tilde l!}$$
feature subsets
e.g. NASA Earth Observer 1 Hyperion, 242 spectral channels: $\binom{242}{10} \approx 1.5 \times 10^{17}$
the combinatorial problem is excessive even for moderate $l$, $\tilde l$
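The combinatorial count quoted above can be checked directly (illustrative Python):

```python
import math

# Number of subsets the direct (exhaustive) search must evaluate: C(l, l_tilde)
l, l_tilde = 242, 10
print(f"C({l}, {l_tilde}) = {float(math.comb(l, l_tilde)):.3e}")  # ~1.5e17 subsets

# even modest problems grow quickly
for n in (20, 50, 100):
    print(n, math.comb(n, 10))
```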