Feature Selection. Pattern Recognition X. Michal Haindl.


Feature Selection
Pattern Recognition X
Michal Haindl
Faculty of Information Technology, KTI, Czech Technical University in Prague
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague, Czech Republic
European Social Fund, Prague & EU: We invest in your future
MI-ROZ 2011-2012/Z, January 16, 2012

Outline
motivation
- technical: recognition problem dimensionality reduction ց, class separability increase ր; data compression (e.g. required communication channel capacity); for a given amount of data, number of features ց ⇒ performance estimate accuracy ր
- physical: physical measurement (e.g. R soil moisture, vegetation cover) data enhancement
sections: Feature Selection, Probabilistic Dependence Measures

Feature Selection / Extraction

sensor → feature selector / extractor → classifier

feature selection: some information is discarded; the selected feature set $\ddot X$ maximizes the criterion, $J(\ddot X) = \max_{\tilde X} J(\tilde X)$ over candidate subsets $\tilde X$
feature extraction: $\ddot X = \Phi(X)$, $\Phi : \mathcal{R}^{l} \rightarrow \mathcal{R}^{\tilde l}$, $\tilde l < l$; all information is used, compression by mapping
effective mathematical theory only for linear transformations and Gaussian data
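To make the selection/extraction contrast concrete, the following sketch (illustrative Python, not part of the lecture) picks the best subset of features under a toy separability criterion and, as the extraction counterpart, projects onto the leading principal directions; the names `criterion_J`, `select_features` and `extract_features` are assumptions for this example.

```python
import numpy as np
from itertools import combinations

def criterion_J(X, y):
    """Toy class-separability criterion: squared distance of the two class
    means divided by the pooled per-feature variance, summed over features."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    v = X[y == 0].var(0) + X[y == 1].var(0) + 1e-12
    return float(np.sum((m0 - m1) ** 2 / v))

def select_features(X, y, l_tilde):
    """Feature selection: keep the subset of l_tilde original features that
    maximizes J; information carried only by the dropped features is lost."""
    best = max(combinations(range(X.shape[1]), l_tilde),
               key=lambda idx: criterion_J(X[:, list(idx)], y))
    return list(best)

def extract_features(X, l_tilde):
    """Feature extraction: linear mapping onto the l_tilde leading principal
    directions, so every original feature contributes to the new ones."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Phi = Vt[:l_tilde].T                    # l x l_tilde projection matrix
    return Xc @ Phi, Phi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) < 0.5).astype(int)
    X[y == 1, 0] += 2.0                     # only feature 0 separates classes
    print("selected subset:", select_features(X, y, 2))
    Z, _ = extract_features(X, 2)
    print("extracted shape:", Z.shape)
```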

Feature Selection / Extraction (cont.)
- selection / extraction: performance optimization, measurement cost reduction
- selection / extraction criterion: no direct relation with the classification error
- FE: no physical feature interpretation

Feature Selection
specification of:
- the feature evaluation criterion J(X)
- the dimensionality of the feature space $\tilde l$
- the optimization procedure
- for FE, the form of the mapping $\Phi(X)$ (extractor)
J(X) is defined in terms of unknown model characteristics $P(\omega_i)$, $p(X|\omega_i)$ ⇒ estimates
error sources:
- suboptimal criterion functions
- suboptimal search strategies
- pdf estimation errors (small sample size)
- numerical errors
- fitting errors

Feature Selection Approaches
- feature-set search algorithms
- Monte Carlo techniques (simulated annealing, genetic algorithms); see the sketch below
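As a sketch of the Monte Carlo techniques listed above, the following simulated-annealing subset search (illustrative Python; `anneal_select` and the toy criterion are assumed names, not from the lecture) swaps one selected and one unselected feature per step and accepts worse moves with a temperature-dependent probability.

```python
import numpy as np

def anneal_select(J, l, l_tilde, iters=2000, T0=1.0, seed=0):
    """Simulated-annealing search over feature subsets of fixed size l_tilde.
    J(mask) -> float is the criterion to maximize; the proposal move swaps
    one selected feature for one unselected feature."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(l, bool)
    mask[rng.choice(l, l_tilde, replace=False)] = True
    cur_val = J(mask)
    best_mask, best_val = mask.copy(), cur_val
    for t in range(iters):
        T = T0 * (1.0 - t / iters) + 1e-9            # linear cooling schedule
        cand = mask.copy()
        i = rng.choice(np.flatnonzero(mask))          # drop one selected feature
        j = rng.choice(np.flatnonzero(~mask))         # add one unselected feature
        cand[i], cand[j] = False, True
        val = J(cand)
        # always accept improvements, sometimes accept worse moves
        if val > cur_val or rng.random() < np.exp((val - cur_val) / T):
            mask, cur_val = cand, val
            if cur_val > best_val:
                best_mask, best_val = mask.copy(), cur_val
    return np.flatnonzero(best_mask), best_val

if __name__ == "__main__":
    good = {0, 3, 7}                                  # toy "informative" features
    J = lambda m: float(len(good & set(np.flatnonzero(m))))
    print(anneal_select(J, l=20, l_tilde=3))
```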

Probabilistic Distance Measures

K = 2:
$$P(\mathrm{error}) = \frac{1}{2}\left[1 - \int \left|P(\omega_1)p(X|\omega_1) - P(\omega_2)p(X|\omega_2)\right| dX\right]$$
P(error) is maximal if the $p(X|\omega_i)$ completely overlap

similarly, any measure between two pdfs
$$J(\ddot X) = \int f\big(P(\omega_i), p(X|\omega_i),\ i = 1,2\big)\, dX$$
satisfying
- $J \geq 0$
- $J = 0$ for overlapping $p(X|\omega_i)$
- $J = \max$ for non-overlapping $p(X|\omega_i)$
can be used for feature selection

if J can be expressed in the form of an averaged divergence, i.e.
$$J_F = \int f\!\left(\frac{P(\omega_1|X)}{P(\omega_2|X)}\right) P(\omega_2|X)\, p(X)\, dX$$
with $f(s)$ a convex function and $f_\infty = \lim_{s\to\infty} f(s)/s$, then
$$P(\mathrm{error}) < \frac{f(0)\,P(\omega_2) + f_\infty\,P(\omega_1) - J_F}{f(0) + f_\infty - f(1)}$$
e.g. the averaged divergence, the averaged Matusita distance $J_T$:
$$f_T(s) = (\sqrt{s}-1)^2, \quad J_F = J_T^2, \quad f(0) = 1,\ f(1) = 0,\ f_\infty = 1$$
$$P(\mathrm{error}) \leq \tfrac{1}{2}\,(1 - J_T^2)$$
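A quick numerical check of the two-class error formula and the Matusita bound for two one-dimensional Gaussian class densities (illustrative Python; the priors, means and variances are arbitrary choices, not from the lecture):

```python
import numpy as np

# Two 1-D Gaussian class densities; priors, means and variances are arbitrary.
P1, P2 = 0.4, 0.6
mu1, mu2, s1, s2 = 0.0, 1.5, 1.0, 1.2

x = np.linspace(-10.0, 12.0, 200001)
dx = x[1] - x[0]
gauss = lambda m, s: np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
p1, p2 = gauss(mu1, s1), gauss(mu2, s2)
p = P1 * p1 + P2 * p2                                # mixture density p(X)

# P(error) = 1/2 [ 1 - int |P1 p1 - P2 p2| dX ]
bayes_err = 0.5 * (1.0 - np.sum(np.abs(P1 * p1 - P2 * p2)) * dx)

# Averaged f-divergence with f(s) = (sqrt(s) - 1)^2 equals J_T^2
post1, post2 = P1 * p1 / p, P2 * p2 / p              # posteriors P(w_i | X)
JT2 = np.sum((np.sqrt(post1 / post2) - 1.0) ** 2 * post2 * p) * dx

print(f"Bayes error     : {bayes_err:.4f}")
print(f"(1 - J_T^2) / 2 : {0.5 * (1.0 - JT2):.4f}  (should be >= Bayes error)")
```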

Entropy Measures

observe X and compute $P(\omega_i|X)$ to determine an information gain
if $P(\omega_i|X) = P(\omega_j|X)\ \forall j \neq i$, then minimal information gain and maximal entropy (uncertainty)
average generalized entropy of degree α:
$$J_E^{\alpha} = \int \left(2^{1-\alpha}-1\right)^{-1}\left[\sum_{i=1}^{K} P^{\alpha}(\omega_i|X) - 1\right] p(X)\, dX$$
Shannon, α = 1:
$$J = -\int \sum_{i=1}^{K} P(\omega_i|X)\,\log_2\!\left[P(\omega_i|X)\right] p(X)\, dX$$
the best feature set minimizes the entropy criterion, $J(\ddot X) = \min_{\tilde X} J(\tilde X)$

Gaussian Density

Chernoff, $s \in \langle 0,1 \rangle$:
$$J_C = \frac{1}{2}\,s(1-s)\,(\mu_2-\mu_1)^T \big[(1-s)\Sigma_1 + s\Sigma_2\big]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|(1-s)\Sigma_1 + s\Sigma_2\right|}{|\Sigma_1|^{1-s}\,|\Sigma_2|^{s}}$$
Bhattacharyya (a numeric sketch of $J_C$ and $J_B$ follows below):
$$J_B = \frac{1}{4}\,(\mu_2-\mu_1)^T \big[\Sigma_1 + \Sigma_2\big]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}(\Sigma_1+\Sigma_2)\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}$$

Probabilistic Dependence Measures

if $p(X|\omega_i) = p(X)$ then X and $\omega_i$ are independent, no learning about $\omega_i$ from X
the dependence between the random variable X and a realization of $\omega_i$ is measured by the distance between $p(X)$ and $p(X|\omega_i)$; J is maximal if one of the $p(X|\omega_i)$ is completely separated from $p(X)$
all probabilistic distance measures suit; overall dependence, e.g. Patrick-Fisher:
$$J_R = \sum_{i=1}^{K} P(\omega_i) \left\{\int \big(p(X|\omega_i) - p(X)\big)^2\, dX \right\}^{\frac{1}{2}}$$
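A small helper evaluating the Chernoff and Bhattacharyya formulas above for given $(\mu_i, \Sigma_i)$; a minimal sketch in Python with assumed names and arbitrary test values, where $s = 0.5$ recovers $J_B$ from $J_C$.

```python
import numpy as np

def chernoff(mu1, S1, mu2, S2, s=0.5):
    """Chernoff distance J_C between N(mu1, S1) and N(mu2, S2), s in (0, 1).
    For s = 0.5 the value equals the Bhattacharyya distance J_B."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    S1, S2 = np.atleast_2d(S1).astype(float), np.atleast_2d(S2).astype(float)
    S = (1.0 - s) * S1 + s * S2
    d = mu2 - mu1
    quad = 0.5 * s * (1.0 - s) * d @ np.linalg.solve(S, d)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    return quad + 0.5 * (logdet_S - (1.0 - s) * logdet_1 - s * logdet_2)

if __name__ == "__main__":
    mu1, S1 = [0.0, 0.0], np.eye(2)
    mu2, S2 = [1.0, 1.0], 2.0 * np.eye(2)
    print("J_B =", chernoff(mu1, S1, mu2, S2, s=0.5))
    print("J_C(s=0.3) =", chernoff(mu1, S1, mu2, S2, s=0.3))
```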

Feature-Set Search Algorithms

for a given $\tilde l < l$, direct search means evaluating the effectiveness of
$$\binom{l}{\tilde l} = \frac{l!}{(l-\tilde l)!\ \tilde l!}$$
feature subsets
e.g. NASA Earth Observer 1 Hyperion, 242 spectral channels: $\binom{242}{10} \approx 1.5 \times 10^{17}$
the combinatorial problem is excessive even for moderate $l$, $\tilde l$
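The combinatorial count quoted above can be checked directly (illustrative Python):

```python
import math

# Number of subsets the direct (exhaustive) search must evaluate: C(l, l_tilde)
l, l_tilde = 242, 10
print(f"C({l}, {l_tilde}) = {float(math.comb(l, l_tilde)):.3e}")  # ~1.5e17 subsets

# even modest problems grow quickly
for n in (20, 50, 100):
    print(n, math.comb(n, 10))
```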