Fuzzy Set Theory in Computer Vision: Example 6
1 Fuzzy Set Theory in Computer Vision: Example 6. Derek T. Anderson and James M. Keller, FUZZ-IEEE, July 2017
2-12 Background
13-22 Kernels: Kernel Crash Course...
- Supervised pattern recognition or machine learning
- MKL for both classification and clustering
- These tools enable computer vision
- Most well known w.r.t. support vector machines (SVMs)
- Observation i (e.g., an image ROI) has features x_{i,k} ∈ R^{d_k}
  - e.g., x_{i,1} is HOG, x_{i,2} is LBP, etc.
- Kernel: φ : x → φ(x) ∈ R^D, with κ(x_{i,k}, x_{j,k}) = φ(x_{i,k})^T φ(x_{j,k})
- The kernel function κ can take many forms; the polynomial kernel κ(x_{i,k}, x_{j,k}) = (x_{i,k}^T x_{j,k} + 1)^p and the radial basis function (RBF) kernel κ(x_{i,k}, x_{j,k}) = exp(−σ‖x_{i,k} − x_{j,k}‖²) are well known
- Kernel matrix (n objects): [K_k]_{ij} = κ(x_{i,k}, x_{j,k}), an n × n matrix
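The two kernels named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's code; the bandwidth σ and degree p values are arbitrary choices.

```python
import numpy as np

def polynomial_kernel(X, Y, p=2):
    # kappa(x_i, x_j) = (x_i^T x_j + 1)^p
    return (X @ Y.T + 1.0) ** p

def rbf_kernel(X, Y, sigma=1.0):
    # kappa(x_i, x_j) = exp(-sigma * ||x_i - x_j||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sigma * sq)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))        # 5 observations, 3 features
K_poly = polynomial_kernel(X, X)       # 5 x 5 Gram matrix
K_rbf = rbf_kernel(X, X)               # symmetric PSD, ones on the diagonal
```

Any valid (Mercer) kernel produces a symmetric positive semi-definite Gram matrix, which is what the SVM dual and the MKL methods below operate on.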
23-29 Kernels: Multiple Kernel Learning (MKL)
- Mercer kept all the good secrets for himself... what is the correct kernel?
- MK can be applied in different ways: low/mid-level CV (SISO/FIFO) and mid/high-level CV (FIFO/DIDO)
  - Low = exploit data correlations
  - High = ensemble-like
- Configuration: search for f(K_1, ..., K_M) (building blocks)
- Global problem: search the configuration space...
30-35 Kernels: MKL flavors
- Fixed rule, e.g., uniform weights
- Heuristic, e.g., derive weights from the kernel matrices
  - S. R. Price, B. Murray, L. Hu, D. T. Anderson, T. Havens, R. Luke, J. M. Keller, "Multiple kernel based feature and decision level fusion of ieco individuals for explosive hazard detection in FLIR imagery," SPIE Defense, Security, and Sensing, 2016
- Optimization (more on next slide), e.g., solve relative to the SVM objective
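The simplest flavor, the fixed rule with uniform weights, can be sketched as a linear convex sum of base Gram matrices. The helper name and the two-bandwidth RBF example are illustrative, not from the tutorial.

```python
import numpy as np

def fixed_rule_mkl(kernels, weights=None):
    """Linear convex sum K = sum_k w_k K_k of base Gram matrices.
    Fixed rule: uniform weights w_k = 1/m when none are given."""
    m = len(kernels)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)  # convex combination
    return sum(wk * Kk for wk, Kk in zip(w, kernels))

# Two base RBF kernels at different bandwidths over the same data
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K1, K2 = np.exp(-0.1 * sq), np.exp(-2.0 * sq)
K = fixed_rule_mkl([K1, K2])           # uniform fixed-rule fusion
```

Because the weights are non-negative, the fused matrix is again PSD, so it is itself a valid kernel; the heuristic and optimization flavors differ only in how the w_k are chosen.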
36-39 Kernels: Some noteworthy approaches
- Linear convex sum (LCS) based, SISO/FIFO
  - Xu et al.: MKL by group lasso (MKLGL)
  - Varma and Babu: generalized MKL (Gaussians)
  - Cortes et al.: polynomial kernels
  - Us: FI and genetic algorithm (FIGA)
  - Us: GA MKL p-norm (GAMKLp)
- DIDO based on the FI
  - Us: decision-level FI MKL p-norm (DeFIMKLp)
  - Us: decision-level least squares MKL (DeLSMKL)
40-50 DeFIMKL: the DeFIMKL algorithm
- f_k(x_i) is the decision on x_i by the kth classifier
  - η_k(x) = Σ_{i=1}^n α_{ik} y_i κ_k(x_i, x) + b_k
  - f_k(x) = η_k(x) / √(1 + η_k²(x))
- Fuzzy integral: f_μ(x_i) = Σ_{k=1}^m f_{π(k)}(x_i) [μ(A_k) − μ(A_{k−1})]
- Sum of squared error (SSE)
  - E² = Σ_{i=1}^n (f_μ(x_i) − y_i)²
  - E² = Σ_{i=1}^n (H_{x_i}^T u − y_i)²
  - E² = Σ_{i=1}^n (u^T H_{x_i} H_{x_i}^T u − 2 y_i H_{x_i}^T u + y_i²)
  - E² = u^T D u + f^T u + Σ_{i=1}^n y_i², where D = Σ_{i=1}^n H_{x_i} H_{x_i}^T and f = −Σ_{i=1}^n 2 y_i H_{x_i}
- QP subject to monotonicity constraints
  - min_u 0.5 u^T D̂ u + f^T u + λ‖u‖_p, s.t. Cu ≤ 0 and 0 ≤ u ≤ 1
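The fuzzy (Choquet) integral at the heart of DeFIMKL can be sketched directly from the formula above: sort the classifier decisions in descending order and weight the differences of the measure over the induced chain of sets. This is a minimal sketch; the measure values in the example are illustrative (any monotone measure with μ(∅)=0 and μ(X)=1 works), and in DeFIMKL they would come from the QP, not be hand-set.

```python
import numpy as np

def choquet_integral(f, mu):
    """Choquet integral of decisions f (length m) w.r.t. fuzzy measure mu,
    a dict frozenset -> [0,1] with mu(emptyset)=0 and mu(full set)=1."""
    order = np.argsort(f)[::-1]          # pi: sort decisions descending
    total, prev, A = 0.0, 0.0, set()
    for k in order:
        A.add(k)                         # A_k = {pi(1), ..., pi(k)}
        cur = mu[frozenset(A)]
        total += f[k] * (cur - prev)     # f_pi(k) * [mu(A_k) - mu(A_{k-1})]
        prev = cur
    return total

# Example: 3 classifiers; a measure that trusts classifier 0 heavily
mu = {frozenset(): 0.0,
      frozenset({0}): 0.7, frozenset({1}): 0.2, frozenset({2}): 0.2,
      frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.8, frozenset({1, 2}): 0.4,
      frozenset({0, 1, 2}): 1.0}
score = choquet_integral(np.array([0.9, -0.2, 0.4]), mu)  # ≈ 0.63
```

Note the integral is idempotent: if all m classifiers agree on a value c, the fused output is exactly c, since the measure differences telescope to μ(X) = 1.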
51-56 Big Data: Nyström approximation and linearization
- MKL can be difficult-to-impossible to apply to large data; full MKL for m matrices is O(mn²)
- Gram matrix K ∈ R^{n×n} is approximated by K̂ = K_z K_zz^† K_z^T
  - z are the indices of the sampled columns of K
  - K_zz^† is the Moore–Penrose pseudoinverse of K_zz
- Now aggregate m matrices of size n × z, so O(mnz): K̄_z = Σ_{k=1}^m (w_k K_k)_z is positive semi-definite (PSD)
- Can linearize via the eigendecomposition of the fused K_zz = U_z Λ_z U_z^T; the linearized model X̂ becomes X̂ = K_z U_z Λ_z^{−1/2}
- Put into a linear SVM vs. a kernel SVM (faster!)
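The approximation and linearization steps above can be sketched in NumPy. This is an illustrative single-kernel version (uniform column sampling, an arbitrary RBF bandwidth); in the fused setting K would be the weighted sum of sampled base kernels.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-0.5 * sq)                      # full n x n RBF Gram matrix

z = 40                                     # number of sampled columns, z << n
idx = rng.choice(200, z, replace=False)
K_z = K[:, idx]                            # n x z block
K_zz = K[np.ix_(idx, idx)]                 # z x z block

# Pseudoinverse and linearization from one eigendecomposition of K_zz
lam, U = np.linalg.eigh(K_zz)
keep = lam > 1e-10                         # drop numerically null directions
K_zz_pinv = U[:, keep] @ np.diag(1.0 / lam[keep]) @ U[:, keep].T
K_hat = K_z @ K_zz_pinv @ K_z.T            # Nystrom: K ~ K_z K_zz^+ K_z^T

X_lin = K_z @ U[:, keep] @ np.diag(lam[keep] ** -0.5)   # linearized features
# Inner products of the linearized features reproduce the Nystrom approximation
assert np.allclose(X_lin @ X_lin.T, K_hat)
```

The rows of X_lin can now be fed to a linear SVM, which trains far faster than a kernel SVM on the full n × n Gram matrix.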
57-60 Examples: fusion of learned ieco features on IR
- [Diagram: a candidate chip is passed to ieco feature learning]
  - C1-C5: population 1 (HOG)
  - C6-C10: population 2 (EHD)
  - C11-C15: population 3 (SD)
61-63 Examples: results on learned ieco IR features
- Translation: did fixed, heuristic, and optimization approaches
- Translation: DeFIMKLp was the best optimization approach
64-66 Examples: results on learned ieco IR features
- Translation: overfitting, picking one feature group
- Translation: spreads the wealth, more generalizable
67-68 Examples: results on ground penetrating radar and kernel compression
- Translation: LCS (GAMKLp) beat DeFIMKLp
- Translation: SMALL data size and fast!
69-72 Unsolved challenges
- Computational and storage efficiency: millions of training samples and many base kernels
- Non-linear SISO/FIFO MKL: n! possibilities, each a feature space
  - K_ij = ⟨φ_σ(x_i), φ_σ(x_j)⟩ = Σ_{k=1}^m σ_k (K_k)_ij = [√σ_1 φ_1(x_i); ...; √σ_m φ_m(x_i)]^T [√σ_1 φ_1(x_j); ...; √σ_m φ_m(x_j)]
- Heterogeneous kernels and normalization
- What E(D, Θ)...
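The feature-space identity above, that a weighted sum of Gram matrices equals the inner product of √σ_k-scaled concatenated feature maps, can be verified numerically. The random linear feature maps here are purely illustrative stand-ins for real φ_k.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, 4))
# Toy explicit feature maps phi_1..phi_3 of different output dimensions
maps = [rng.standard_normal((4, d)) for d in (2, 3, 5)]
phis = [X @ W for W in maps]               # rows are phi_k(x_i)
sigma = np.array([0.5, 0.3, 0.2])          # kernel weights

# Concatenated map: phi_sigma(x) = [sqrt(s_1) phi_1(x); ...; sqrt(s_m) phi_m(x)]
phi_sigma = np.hstack([np.sqrt(s) * P for s, P in zip(sigma, phis)])
K_concat = phi_sigma @ phi_sigma.T

# Equivalent weighted sum of the base Gram matrices
K_sum = sum(s * (P @ P.T) for s, P in zip(sigma, phis))
assert np.allclose(K_concat, K_sum)
```

This equivalence is what makes linear-sum MKL tractable: the combined feature space never has to be built explicitly, only the weighted Gram matrices.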
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationMachine Learning : Support Vector Machines
Machine Learning Support Vector Machines 05/01/2014 Machine Learning : Support Vector Machines Linear Classifiers (recap) A building block for almost all a mapping, a partitioning of the input space into
More informationWhat is semi-supervised learning?
What is semi-supervised learning? In many practical learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate text processing, video-indexing,
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationLecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
More informationChapter 9. Support Vector Machine. Yongdai Kim Seoul National University
Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved
More informationBack to the future: Radial Basis Function networks revisited
Back to the future: Radial Basis Function networks revisited Qichao Que, Mikhail Belkin Department of Computer Science and Engineering Ohio State University Columbus, OH 4310 que, mbelkin@cse.ohio-state.edu
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationEach new feature uses a pair of the original features. Problem: Mapping usually leads to the number of features blow up!
Feature Mapping Consider the following mapping φ for an example x = {x 1,...,x D } φ : x {x1,x 2 2,...,x 2 D,,x 2 1 x 2,x 1 x 2,...,x 1 x D,...,x D 1 x D } It s an example of a quadratic mapping Each new
More informationSupport Vector Machines: Kernels
Support Vector Machines: Kernels CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 14.1, 14.2, 14.4 Schoelkopf/Smola Chapter 7.4, 7.6, 7.8 Non-Linear Problems
More informationFUZZY C-MEANS CLUSTERING USING TRANSFORMATIONS INTO HIGH DIMENSIONAL SPACES
FUZZY C-MEANS CLUSTERING USING TRANSFORMATIONS INTO HIGH DIMENSIONAL SPACES Sadaaki Miyamoto Institute of Engineering Mechanics and Systems University of Tsukuba Ibaraki 305-8573, Japan Daisuke Suizu Graduate
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More information10-701/ Recitation : Kernels
10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationConvex Optimization in Classification Problems
New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1
More informationAdvanced Machine Learning & Perception
Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 6 Standard Kernels Unusual Input Spaces for Kernels String Kernels Probabilistic Kernels Fisher Kernels Probability Product Kernels
More informationA Kernel on Persistence Diagrams for Machine Learning
A Kernel on Persistence Diagrams for Machine Learning Jan Reininghaus 1 Stefan Huber 1 Roland Kwitt 2 Ulrich Bauer 1 1 Institute of Science and Technology Austria 2 FB Computer Science Universität Salzburg,
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationRegularized Least Squares
Regularized Least Squares Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Summary In RLS, the Tikhonov minimization problem boils down to solving a linear system (and this
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationSupport Vector Machines
Support Vector Machines INFO-4604, Applied Machine Learning University of Colorado Boulder September 28, 2017 Prof. Michael Paul Today Two important concepts: Margins Kernels Large Margin Classification
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More information
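The kernel crash course above defines a kernel via a feature map, κ(x_i, x_j) = φ(x_i) · φ(x_j). As a minimal sketch of that identity (using a hypothetical degree-2 polynomial kernel on R^2, chosen for illustration and not taken from the slides), the kernel value computed directly in input space matches the inner product of the explicitly mapped features:

```python
import numpy as np

def phi(x):
    """Hypothetical explicit feature map for the degree-2 polynomial
    kernel on R^2: phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

def kappa(xi, xj):
    """Same kernel computed directly in input space: (xi . xj)^2."""
    return float(np.dot(xi, xj)) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])

# kappa(xi, xj) equals phi(xi) . phi(xj), so the higher-dimensional
# map never has to be formed explicitly -- the "kernel trick".
print(kappa(xi, xj))                      # 16.0
print(float(np.dot(phi(xi), phi(xj))))    # 16.0
```

This is why an SVM (or a multiple-kernel method built on several features x_i,k, such as HOG and LBP) can work with κ alone, even when the image of φ lives in a much higher-dimensional space R^D.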