INF 4300 Classification III - Anne Solberg


INF 4300 Classification III
Anne Solberg, 29.10.14

The agenda today:
- More on estimating classifier accuracy
- Curse of dimensionality
- kNN classification
- K-means clustering

Notation recap:
- x_i: feature vector for pixel i
- ω_i: the class label for pixel i
- K: the number of classes given in the training data
- Multiband image with n spectral channels or features, and a mask with training pixels
- Gaussian class-conditional density:
  p_s(x) = 1 / ((2π)^{n/2} |Σ_s|^{1/2}) · exp( -(1/2) (x - μ_s)^T Σ_s^{-1} (x - μ_s) )

Confusion matrices
A matrix with the true class labels versus the estimated class labels for each class:

                        Estimated class label
True class label   Class 1   Class 2   Class 3   Total # of samples
Class 1               80        15        5           100
Class 2                5       140        5           150
Class 3               25        50      125           200
Total                110       205      135           450

True / False positives / negatives (e.g., testing for cancer)
- True positive (TP): the patient has cancer and the test result is positive.
- True negative (TN): a healthy patient and a negative test result.
- False positive (FP): a healthy patient who gets a positive test result.
- False negative (FN): a cancer patient who gets a negative test result.
Good to have: TP & TN. Bad to have: FP (but this will probably be detected). Worst to have: FN (may go undetected).

             Test negative   Test positive
No cancer    TN              FP
Cancer       FN              TP
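As a small illustration (not from the slides themselves), the numbers in the confusion matrix above give the overall and per-class classification accuracies directly. A minimal numpy sketch:

```python
import numpy as np

# Confusion matrix from the slide: rows = true class, columns = estimated class.
conf = np.array([[ 80,  15,   5],
                 [  5, 140,   5],
                 [ 25,  50, 125]])

overall_accuracy = np.trace(conf) / conf.sum()          # (80 + 140 + 125) / 450 ≈ 0.767
per_class_accuracy = np.diag(conf) / conf.sum(axis=1)   # correct / true samples: 0.80, 0.93, 0.625
average_accuracy = per_class_accuracy.mean()            # mean of the per-class rates

print(overall_accuracy, per_class_accuracy, average_accuracy)
```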

Sensitivity and specificity
- Sensitivity: the portion of the data set that tested positive out of all the positive patients tested: Sensitivity = TP/(TP+FN). The probability that the test is positive given that the patient is sick. Higher sensitivity means that fewer disease cases go undetected.
- Specificity: the portion of the data set that tested negative out of all the negative patients tested: Specificity = TN/(TN+FP). The probability that a test is negative given that the patient is not sick. Higher specificity means that fewer healthy patients are labeled as sick.

Bayes classification with loss function
In cases where different classes have different importance (e.g. sick/healthy), we can incorporate this into a Bayesian classifier if we consider the loss.
Let λ(α_i | ω_j) be the loss if we decide class ω_i when the true class is ω_j. The risk of deciding class ω_i is then:
  R(α_i | x) = Σ_{j=1}^{c} λ(α_i | ω_j) P(ω_j | x)
To minimize the overall risk, compute R(α_i | x) for i = 1, ..., c and choose the class for which R(α_i | x) is minimum.

Outliers and doubt
In a classification problem, we might want to identify outlier and doubt samples. We might want an ideal classifier to report:
- "this sample is from class l" (usual case)
- "this sample is not from any of the classes" (outlier)
- "this sample is too hard for me" (doubt/reject)
The two last cases should lead to a rejection of the sample!

Outliers
- Heuristically defined as samples which did not come from the assumed population of samples.
- Outliers can result from some breakdown in preprocessing. Outliers can also come from pixels from other classes than the classes in the training data set. Example: K tree species classes, but a few road pixels divide the forest region.
- One way to deal with outliers is to model them as a separate class, e.g. a Gaussian with very large variance, and estimate the prior probability from the training data.
- Another approach is to decide on some threshold on the a posteriori probability, and if a sample falls below this threshold for all classes, declare it an outlier.
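A minimal sketch of the minimum-risk rule from the "Bayes classification with loss function" slide above. The loss matrix and posterior values below are made-up illustration numbers, not from the lecture:

```python
import numpy as np

# lam[i, j] = loss of deciding class i when the true class is j (hypothetical values:
# missing a sick patient is penalized ten times harder than a false alarm).
lam = np.array([[0.0, 10.0],    # decide "healthy": no loss if healthy, large loss if sick
                [1.0,  0.0]])   # decide "sick":    small loss if healthy, no loss if sick

posteriors = np.array([0.7, 0.3])   # P(healthy | x), P(sick | x) for some sample x (made up)

risk = lam @ posteriors             # R(alpha_i | x) = sum_j lam[i, j] * P(omega_j | x)
decision = np.argmin(risk)          # choose the class with minimum conditional risk

print(risk, decision)               # [3.0, 0.7] -> decide "sick" despite its lower posterior
```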

Doubt samples
- Doubt samples are samples for which the class with the highest probability is not significantly more probable than some of the other classes (e.g. two classes have essentially equal probability).
- Doubt pixels typically occur on the border between two classes ("mixels"). Close to the decision boundary the probabilities will be almost equal.
- Classification software can allow the user to specify thresholds for doubt.

The training / test set dilemma
- Ideally we want to maximize the size of both the training and test data sets.
- Obviously there is a fixed amount of available data with known labels.
- A very simple approach is to separate the data set into two random subsets.
- For small sample sizes we may have to use another strategy: cross-validation. This is a good strategy when we have very few ground-truth samples. It is common in medicine, where we might have a small number of patients with a certain type of cancer, and the cost of obtaining more ground-truth data might be so high that we have to make do with a small number of samples.

Cross-validation / Leave-n-Out
A very simple (but computationally complex) idea that allows us to "fake" a large test set (a small code sketch follows after these slides):
- Train the classifier on a set of N-n samples.
- Test the classifier on the n remaining samples.
- Repeat N/n times (depending on the subsampling).
- Report the average performance over the repeated experiments as the test-set error.
An example with leave-1-out and 30 samples: select one sample to leave out, train on the remaining 29 samples, classify the one sample and store its class label, repeat this 30 times, and count the number of misclassifications among the 30 experiments.
Leave-n-Out estimation generally overestimates the classification accuracy. Feature selection should be performed within the loop, not in advance! Using a training set and a test set of approximately the same size is better.

The covariance matrix and dimensionality
- Assume we have S classes and an n-dimensional feature vector. With a fully multivariate Gaussian model, we must estimate S different mean vectors and S different covariance matrices from training samples: the estimate of μ_s has n elements, and the estimate of Σ_s has n(n+1)/2 distinct elements (n variances and n(n-1)/2 covariances).
- Assume that we have M training samples from each class. Given M, there is a maximum achievable classification performance for a certain value of n; increasing n beyond this limit will lead to worse performance. Adding more features is not always a good idea!
- A rule of thumb for the number of training samples: M > 10·n for each of the S classes.
- If we have limited training data, we can use diagonal covariance matrices or regularization.
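Returning to the Cross-validation / Leave-n-Out slide above, here is a leave-one-out sketch using a simple nearest-class-mean classifier. The classifier choice and the toy data are assumptions made for illustration; any classifier could be plugged in, and feature selection would have to go inside the loop:

```python
import numpy as np

def leave_one_out_error(X, y):
    """Leave-one-out error estimate for a nearest-class-mean classifier.

    X: (N, n) feature matrix, y: (N,) integer class labels.
    """
    N = X.shape[0]
    errors = 0
    for i in range(N):
        mask = np.arange(N) != i                     # train on all samples except sample i
        X_tr, y_tr = X[mask], y[mask]
        classes = np.unique(y_tr)
        means = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
        # classify the held-out sample to the nearest class mean
        pred = classes[np.argmin(np.linalg.norm(means - X[i], axis=1))]
        errors += (pred != y[i])
    return errors / N                                # average error over the N experiments

# Toy data (30 samples, 2 classes) just to show the call:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(2, 1, (15, 2))])
y = np.array([0] * 15 + [1] * 15)
print(leave_one_out_error(X, y))
```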

The curse of dimensionality
- In practice, the curse means that, for a given sample size, there is a maximum number of features one can add before the classifier starts to degrade.
- For a finite training sample size, the correct classification rate initially increases when adding new features, attains a maximum, and then begins to decrease.
- For a high dimensionality, we will need lots of training data to get the best performance: roughly 10 samples per feature per class.
[Figure: correct classification rate as a function of feature dimensionality, for different amounts of training data. Equal prior probabilities of the two classes are assumed.]

Use few, but good features
- To avoid the curse of dimensionality we must take care in finding a set of relatively few features.
- A good feature has high within-class homogeneity, and should ideally have large between-class separation. In practice, one feature is not enough to separate all classes, but a good feature should separate some of the classes well, or isolate one class from the others.
- If two features look very similar (or have high correlation), they are often redundant and we should use only one of them.
- Class separation can be studied by visual inspection of the feature image overlaid on the training mask, or by scatter plots of feature combinations.
- Evaluating features the way training does is difficult to do automatically, so manual interaction is normally required.

How do we beat the curse of dimensionality?
- Use regularized estimates for the Gaussian case: use diagonal covariance matrices, or apply regularized covariance estimation (INF 5300).
- Generate few, but informative features: careful feature design given the application.
- Reduce the dimensionality: feature selection (more in INF 5300) or feature transforms (INF 5300).

Exhaustive feature selection
If for some reason you know that you will use d out of D available features, an exhaustive search will involve a number of combinations to test:
  n = D! / ((D-d)! d!)
If we want to perform an exhaustive search through D features for the optimal subset of at most m features, the number of combinations to test is
  n = Σ_{d=1}^{m} D! / ((D-d)! d!)
Impractical even for a moderate number of features! d ≤ 5, D = 100 => n = 79,374,995.
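To get a feel for the numbers in the exhaustive-search formulas above, the combination counts can be evaluated directly. A small sketch, using the values quoted on the slide (D = 100, d ≤ 5):

```python
from math import comb

D, m = 100, 5

# Number of subsets with exactly d features: n = D! / ((D - d)! d!)
print(comb(D, 5))                                   # 75,287,520 subsets of exactly 5 features

# Number of subsets to test when searching over all subsets of at most m features
print(sum(comb(D, d) for d in range(1, m + 1)))     # tens of millions of combinations
```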

Suboptimal feature selection
- Select the best single features based on some quality criterion, e.g. estimated correct classification rate. A combination of the best single features will often imply correlated features and will therefore be suboptimal.
- More in INF 5300: sequential forward selection implies that when a feature is selected or removed, this decision is final. Stepwise forward-backward selection overcomes this; it is a special case of the "add a, remove r" algorithm, improved into floating search by making the number of forward and backward search steps data dependent. See also adaptive floating search and oscillating search.

Example of feature selection from INF 5300 - Method 1: Individual feature selection
- Each feature is treated individually (no correlation/covariance between features is considered).
- Select a criterion, e.g. a distance measure. Rank the features according to the criterion value C(k), and select the set of features with the best individual criterion values.
- Multiclass situation: average class separability, or C(k) = min distance(i, j), the worst case, is often used.
- Advantage of individual selection: computation time. Disadvantage: no correlation is utilized.

Method 2 - Sequential backward selection
- Select l features out of m. Example: 4 features x1, x2, x3, x4.
- Choose a criterion C and compute it for the vector [x1, x2, x3, x4]^T.
- Eliminate one feature at a time by computing [x1, x2, x3]^T, [x1, x2, x4]^T, [x1, x3, x4]^T and [x2, x3, x4]^T. Select the best combination, say [x1, x2, x3]^T.
- From the selected 3-dimensional feature vector, eliminate one more feature and evaluate the criterion for [x1, x2]^T, [x1, x3]^T, [x2, x3]^T, and select the one with the best value.
- Number of combinations searched: 1 + 1/2 ((m+1)m - l(l+1)).

Method 3 - Sequential forward selection
- Compute the criterion value for each feature. Select the feature with the best value, say x1.
- Form all possible combinations of x1 (the winner at the previous step) and a new feature, e.g. [x1, x2]^T, [x1, x3]^T, [x1, x4]^T, etc. Compute the criterion and select the best one, say [x1, x3]^T.
- Continue adding a new feature in the same way.
- Number of combinations searched: l·m - l(l-1)/2. Backward selection is faster if l is closer to m than to 1.
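A minimal sequential forward selection sketch matching Method 3 above. The criterion shown (a Fisher-style between-class to within-class scatter ratio) is just an assumed example; any separability measure or estimated classification rate could be used instead:

```python
import numpy as np

def forward_selection(X, y, criterion, n_select):
    """Greedy sequential forward selection.

    criterion(X_subset, y) should return a score where higher is better.
    """
    remaining = list(range(X.shape[1]))
    selected = []
    while len(selected) < n_select:
        # try adding each remaining feature to the current subset
        scores = [criterion(X[:, selected + [f]], y) for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)            # once a feature is added, the decision is final
        remaining.remove(best)
    return selected

def fisher_criterion(Xs, y):
    """Assumed example criterion: between-class scatter divided by within-class scatter."""
    classes = np.unique(y)
    overall_mean = Xs.mean(axis=0)
    between = sum((Xs[y == c].mean(axis=0) - overall_mean) ** 2 for c in classes).sum()
    within = sum(Xs[y == c].var(axis=0).sum() for c in classes)
    return between / (within + 1e-12)
```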

Hyperspectral image example
- A hyperspectral image from France: 81 features/spectral bands, 6 classes (tree species).
- μ_s has 81 parameters to compute for each class; Σ_s has 81·80/2 = 3240 covariance parameters for each class.
- 1000 training samples for each class. Test set: 1000-2000 samples for each class.
[Figure: 3 of the 81 bands shown as an RGB image; plot of the 81 mean values for each class.]

Hyperspectral example: classification accuracy vs. number of features on the test set
[Figure: classification accuracy as a function of the number of features used in a Gaussian classifier. Each curve shows a different dimensionality reduction method.]
Note that as we include more features, the classification accuracy first increases, then it starts to decrease. Curse of dimensionality!

Exploratory data analysis
For a small number of features, manual data analysis to study the features is recommended. Choose intelligent features. Evaluate e.g.:
- Error rate for single-feature classification
- Scatter plots of feature combinations

k-nearest-neighbor classification
A very simple classifier. Classification of a new sample x_i is done as follows:
- Out of N training vectors, identify the k nearest neighbors (measured by Euclidean distance) in the training set, irrespective of the class labels.
- Out of these k samples, identify the number of vectors k_i that belong to class ω_i, i = 1, 2, ..., M (if we have M classes).
- Assign x_i to the class ω_i with the maximum number of k_i samples.
k should be odd, and must be selected a priori.
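A numpy-only sketch of the three kNN steps above; the training data in the usage lines is made up for illustration:

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=5):
    """Classify one sample x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training vector, irrespective of class label
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]                    # indices of the k nearest neighbours
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # class with the largest count k_i

# Usage with made-up training data (k should be odd and chosen a priori, e.g. 3 <= k <= 9):
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
print(knn_classify(np.array([2.5, 2.5]), X_train, y_train, k=5))   # most likely class 1
```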

kNN example
[Figure: a test sample whose assigned class depends on whether k = 1 or k = 5 is used.]

About kNN classification
- If k = 1 (1NN classification), each sample is assigned to the same class as the closest sample in the training data set. If the number of training samples is very high, this can be a good rule.
- If k → ∞, this is theoretically a very good classifier.
- This classifier involves no training time, but the time needed to classify one pattern x_i will depend on the number of training samples, as the distance to all points in the training set must be computed.
- Practical values for k: 3 <= k <= 9.

Supervised or unsupervised classification
- Supervised classification: classify each object or pixel into a set of k known classes. Class parameters are estimated using a set of training samples from each class. The object is classified to the class which has the highest posterior probability.
- Unsupervised classification: partition the feature space into a set of k clusters, where k is not known and must be estimated (difficult). The clusters we get are not necessarily the classes we want.
- In both cases, classification is based on the values of the set of n features x_1, ..., x_n.

Unsupervised classification / clustering
- Divide the data into clusters based on similarity (or dissimilarity).
- Similarity or dissimilarity is based on distance measures (sometimes called proximity measures): Euclidean distance, Mahalanobis distance, etc.
- Two main approaches to clustering: hierarchical (divisive or agglomerative) and non-hierarchical (sequential). Non-hierarchical methods are often used in image analysis.

K-means clustering
Note: "K-means algorithm" normally means ISODATA, but different definitions are found in different books. K is assumed to be known.
1. Start by assigning K cluster centers: K random data points, or the first K points, or K equally spaced points. For k = 1:K, set μ_k equal to the feature vector x_k for these points.
2. Assign each object/pixel x_i in the image to the closest cluster center using Euclidean distance. Compute for each sample the distance r_k^2 to each cluster center:
   r_k^2 = ||x_i - μ_k||^2 = (x_i - μ_k)^T (x_i - μ_k)
   Assign x_i to the closest cluster (with minimum r_k^2 value).
3. Recompute the cluster centers based on the new labels.
4. Repeat from 2 until #changes < limit.
ISODATA K-means: splitting and merging of clusters are included in the algorithm. (A code sketch of the basic algorithm follows after the worked example below.)

k-means example
[Figure: a 2-D point set with three cluster centres μ_1, μ_2, μ_3 and a highlighted data point X6.]
Step 1: Choose k cluster centres, μ_k^(0), randomly from the available data points. Here: k = 3.
Step 2: Assign each of the objects in x to the nearest cluster centre μ_k^(i):
   x_n ∈ c^(i), where c = argmin_{k=1..K} || x_n - μ_k^(i) ||

Step 3: Recalculate the cluster centres μ_k^(i+1) based on the clustering in iteration i:
   μ_k^(i+1) = (1/N_k) Σ_{x_n ∈ c_k^(i)} x_n
Step 4: If the clusters don't change, μ_k^(i+1) = μ_k^(i) (or a prespecified number of iterations is reached), terminate; else reassign: increase iteration i and go to step 2.
Step 3 in the next iteration: recalculate the cluster centres.

k-means variations
- The generic algorithm has many improvements.
- ISODATA allows for merging and splitting of clusters; among other things, this seeks to improve an initial bad choice of k.
- k-medians is another variation.
- k-means optimizes a probabilistic model.
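A basic sketch of steps 1-4 of the K-means algorithm above, using random data points as initial centres (ISODATA-style splitting and merging is not included). The example data at the end is made up:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Basic K-means: assign each sample to the closest centre, recompute centres, repeat."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise the K cluster centres at K random data points
    centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: assign each sample to the closest cluster centre (Euclidean distance)
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # Step 4: no changes, terminate
        labels = new_labels
        # Step 3: recompute each cluster centre as the mean of its assigned samples
        for k in range(K):
            if np.any(labels == k):                  # guard against empty clusters
                centres[k] = X[labels == k].mean(axis=0)
    return labels, centres

# Usage on made-up 2-D data with three well-separated groups:
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.5, (30, 2)) for m in (0, 3, 6)])
labels, centres = kmeans(X, K=3)
```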

How do we determine k?
- The number of natural clusters in the data rarely corresponds to the number of information classes of interest.
- Cluster validity indices can give an indication of how many clusters there are.
- Use cluster merging or splitting tailored to the application.
- Rule of thumb for practical image clustering: start with approximately twice as many clusters as expected information classes, determine which clusters correspond to the information classes, then split and merge clusters to improve.

Example: K-means clustering
[Figure panels: original image; K-means with K=5; supervised classification with 4 classes; K-means with K=10.]

Learning goals for this lecture
- Understand how different measures of classification accuracy work: confusion matrix, sensitivity/specificity, average classification accuracy.
- Be familiar with the curse of dimensionality and the importance of selecting few, but good features.
- Understand kNN classification.
- Understand the difference between supervised and unsupervised classification.
- Understand the K-means algorithm.