Biomedical signal processing application of optimization methods for machine learning problems
1 Biomedical signal processing: application of optimization methods for machine learning problems. Fabian J. Theis, Computational Modeling in Biology, Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München. Grenoble, 16-Sep-2008
2 Data mining cocktail-party problem
6 Data mining: mixture model x(t) = f(s(t)); estimate the mixing process f and the sources s(t); often linear: f = A. [diagram: s(t) → mixing A → x(t) → neural network W → ŝ(t)]
7 Outline 1 Supervised methods Motivation 1: classification Motivation 2: image segmentation Statistical decision theory 2 Clustering k-means Partitional clustering 3 Independent component analysis Sparse component analysis Nonlinear sparse component analysis 4
9 Motivation 1: classification. data analysis: classification; decide between (two or multiple) classes, s(t) ∈ {0, 1}; learn by example?
10 Neural networks
11 Classification: example. observations: immunological data set, 3 cell parameters of 37 children with pulmonary diseases. goal: interpretation using supervised and unsupervised analysis; disease classification into chronic bronchitis (CB) or interstitial lung disease (ILD)? cooperation with D. Hartl, Pediatric Immunology, Munich
13 Data visualization & dimension reduction. parameter interpretation?
14 Data visualization & dimension reduction. [figure: self-organizing map of the patient data; map units labelled CB(n)/ILD(n) with per-unit counts, plus a k-means cluster overlay] visualization by self-organizing map network; topology-preserving nonlinear dimension reduction/scaling; detect new parameter dependencies
15 Disease classification. dimension-reducing network z(i) = B_supervised A_unsup. x(i). results: down-scaling to 5 hidden neurons suffices; classification rate of > 90% [Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003.]
17 Motivation 2: image segmentation. classification application in image processing; object classification?
18 Motivation 2: image segmentation. Problem: how many labelled cells lie in this section image?
19 Biological background: neurogenesis. adult neurogenesis: new neurons emerge even in the adult human brain; the level depends on external stimuli. Are there neural ancestral cells? goal: automated quantification of neurogenesis in adult mice. cooperation with Z. Kohl, Department of Neurology, University of Regensburg
20 Automated cell counting
21 Automated cell counting. directional neural network: train a cell-patch classifier ζ using a directional neural network; scan the image using ζ to get cell positions; speed-up via hierarchical and multiscale methods
23 Results. counting comparison with 2 experts (variability ±5%) yields 90% ± 4% accuracy; application: considerable cell proliferation in the hippocampus of epileptic mice [Theis, Kohl, Guggenberger, Kuhn, Lang. ZANE: an algorithm for counting labelled cells in section images. Proc. MEDSIP 2004]
24 Statistical decision theory: setup. input: random vector X : Ω → R^p; output: random variable Y : Ω → R, or a categorical output, possibly Y ∈ {0, 1}; the input-output relation is measured by the joint density P(X, Y); realization by samples (training data) (x_i, y_i) for i = 1, ..., N, often collected in an (N × p)-matrix X and a vector y ∈ R^N
25 Goal: prediction. goal: learn a classifier from the training data; predict y for a new sample x
26 Linear model. ŷ = β̂_0 + Σ_{j=1}^p x_j β̂_j; set x_0 := 1, then ŷ = xᵀβ̂. least squares: minimize RSS(β) = Σ_{i=1}^N (y_i − x_iᵀβ)² = (y − Xβ)ᵀ(y − Xβ), so β̂ = (XᵀX)⁻¹Xᵀy
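As a sanity check, the closed-form least-squares estimate β̂ = (XᵀX)⁻¹Xᵀy can be verified numerically. The sketch below (plain Python, hypothetical toy data, one feature plus intercept so that XᵀX is a 2×2 matrix) solves the normal equations directly:

```python
# Minimal sketch of the least-squares estimate beta_hat = (X^T X)^{-1} X^T y
# for one feature plus intercept, on hypothetical toy data; the 2x2 normal
# equations are solved in closed form (pure Python).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x

n = len(xs)
sx = sum(xs); sy = sum(ys)
sxx = sum(x * x for x in xs); sxy = sum(x * y for x, y in zip(xs, ys))

# normal equations: [[n, sx], [sx, sxx]] @ [b0, b1] = [sy, sxy]
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det    # intercept beta_0
b1 = (n * sxy - sx * sy) / det      # slope beta_1
print(b0, b1)  # -> 1.0 2.0
```

On noise-free data the estimate recovers the generating coefficients exactly.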
30 Linear model. decision boundary {x | xᵀβ̂ = 1/2}
31 Linear model. nice, but what about more complex data? (r = 2 and r = 10 Gaussians per class, σ = 0.2, with the r means sampled from N((1, 0), I) and N((0, 1), I), respectively)
32 Linear model. hm? the global, linear model is too rigid
33 Nearest-neighbor method. ŷ(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i, where N_k(x) is the set of the k closest points x_i to x; a local model; needs a metric (here Euclidean). how to determine k? smaller k: higher learning accuracy; larger k: smoother, higher generalizability; least-squares learning would yield k = 1
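A minimal sketch of the nearest-neighbor rule above (plain Python, Euclidean metric; the toy points and the helper name `knn_predict` are illustrative, not from the slides):

```python
# k-nearest-neighbour prediction: y_hat(x) = (1/k) * sum of y_i over the
# k closest training points (pure Python, squared Euclidean distance).
train = [((0.0, 0.0), 0.0), ((0.1, 0.0), 0.0),
         ((1.0, 1.0), 1.0), ((0.9, 1.1), 1.0)]

def knn_predict(x, k):
    # sort training points by squared Euclidean distance to x
    by_dist = sorted(train,
                     key=lambda p: (p[0][0] - x[0])**2 + (p[0][1] - x[1])**2)
    neighbours = by_dist[:k]
    return sum(y for _, y in neighbours) / k

print(knn_predict((0.05, 0.05), 2))  # -> 0.0  (both neighbours from class 0)
print(knn_predict((0.95, 1.0), 2))   # -> 1.0
print(knn_predict((0.5, 0.5), 4))    # -> 0.5  (averages over both classes)
```

Thresholding the average at 1/2 turns this regression estimate into the k-NN classifier used on the next slides.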
35 Nearest-neighbor method, k = . decision boundary {x | ŷ(x) = 1/2}
36 Motivation 1: classification Motivation 2: image segmentation Statistical decision theory Nearest-neighbor method, k = 1, 2,
37 Statistical decisions. probabilistic view: P(X, Y) = P(Y|X) P(X); find a function f(X) predicting Y as well as possible w.r.t. the squared error loss L(Y, f(X)) = (Y − f(X))². expected prediction error: EPE(f) = E(Y − f(X))² = ∫ (y − f(x))² P(dx, dy) = E_X E_{Y|X}((Y − f(X))² | X); pointwise minimization suffices: f(x) = argmin_c E_{Y|X}((Y − c)² | X = x), solved by the conditional expectation (regression function) f(x) = E(Y | X = x)
39 Statistical decisions. f(x) = E(Y | X = x) can be estimated by f̂(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i: approximate the expectation via sample averages, and point conditioning via local conditioning. note: f̂(x) → E(Y | X = x) for N, k → ∞, k/N → 0. but: (very) finite samples; curse of dimensionality: the fraction r of the unit cube in p dimensions is covered by a cube of edge length e_p(r) = r^{1/p}; e_2(0.01) = 0.1, e_2(0.1) ≈ 0.32, e_10(0.01) ≈ 0.63, e_10(0.1) ≈ 0.8
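The edge-length numbers on this slide can be reproduced directly from e_p(r) = r^{1/p}; a quick check in plain Python:

```python
# Curse-of-dimensionality check: e_p(r) = r**(1/p) is the edge length of a
# sub-cube covering a fraction r of the unit cube in p dimensions.
def edge(p, r):
    return r ** (1.0 / p)

print(round(edge(2, 0.01), 2))   # -> 0.1
print(round(edge(2, 0.1), 2))    # -> 0.32
print(round(edge(10, 0.01), 2))  # -> 0.63
print(round(edge(10, 0.1), 2))   # -> 0.79
```

In 10 dimensions a "local" neighbourhood covering just 1% of the data already spans 63% of each coordinate's range, which is why local averaging degrades so quickly with dimension.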
42 Statistical decisions. if instead, for approximating f(x) = E(Y | X = x), we assume the linear model f(x) = xᵀβ, we get β = E(XXᵀ)⁻¹ E(XY): no conditioning, a global approximation
44 Statistical decisions for discrete Y. if Y ∈ {0, 1}, consider the loss function L(Y, f(X)) = 0 if f(X) = Y, 1 otherwise. then EPE = E_X Σ_{y ∈ {0,1}} L(y, f(X)) P(y|X), and hence Ŷ(x) = argmin_{y' ∈ {0,1}} Σ_{y ∈ {0,1}} L(y, y') P(y | X = x) = argmin_{y' ∈ {0,1}} (1 − P(y' | X = x)), which yields the Bayes classifier Ŷ(x) = argmax_y P(y | X = x). question: how to model P(Y | X)?
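A minimal sketch of the Bayes rule Ŷ(x) = argmax_y P(y | X = x), assuming for illustration that the posterior P(y = 1 | X = x) is known on a small discrete domain (all values hypothetical; in practice this posterior is exactly what must be modelled):

```python
# Bayes classifier on a toy discrete domain x in {0, 1, 2}:
# pick the class with the larger posterior probability.
posterior = {            # hypothetical P(y = 1 | X = x)
    0: 0.2,
    1: 0.5,
    2: 0.9,
}

def bayes_classify(x):
    p1 = posterior[x]
    # argmax over {0, 1}: class 1 wins iff P(1|x) > P(0|x) = 1 - P(1|x)
    return 1 if p1 > 1.0 - p1 else 0

print([bayes_classify(x) for x in (0, 1, 2)])  # -> [0, 0, 1]
```

The 0-1 loss makes this the optimal rule; any other classifier has at least this expected error.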
47 Bayes classifier: results
48 Method combinations. nonlinear models, e.g. f(x) = Σ_{j=1}^p f_j(x_j), or basis expansions f(x) = Σ_j h_j(x) β_j with polynomial, Fourier or sigmoidal bases (→ neural networks); prediction/function approximation by maximum-likelihood estimation of the parameters; enhance generalizability by adding a regularization term +λ J(f) to RSS(f), for f from some function class; generalize inner-product methods to nonlinear situations by a high-dimensional embedding x ↦ Φ(x) and kernels k(x, x') = Φ(x)ᵀΦ(x')
49 Clustering k-means Partitional clustering Outline 1 Supervised methods Motivation 1: classification Motivation 2: image segmentation Statistical decision theory 2 Clustering k-means Partitional clustering 3 Independent component analysis Sparse component analysis Nonlinear sparse component analysis 4
50 Clustering: explanation by example. goal: differentiate hand-written digits 2 and 4: given a set of unlabelled gray-scale images of 2s and 4s, find the subset of 2s versus the subset of 4s. unsupervised learning by example (like a baby)
53 Example data set. here: machine learning, i.e. a statistical approach, which needs many test cases; here 28x28 images; interpret each 28x28 image as an element of R^784; dimension reduction via PCA to only 2 dimensions
58 k-means. clustering: data vectors (samples) x(1), x(2), ..., x(T) ∈ R^n; a distance measure d(x, y) between samples. algorithm: k-means. given the number k of clusters; initialize the centroids randomly; update rules: batch or sequential (online). cost function: minimize E(c_i, C_i) := Σ_{i=1}^k (1/|C_i|) Σ_{x ∈ C_i} d(x, c_i)². batch k-means: [figures: partition the samples, assign them to the centroids, update the centroids]. sequential k-means: [figures: pick an arbitrary sample, find the nearest centroid, update it]. [Theis, Gruber. Grassmann clustering. Proc. EUSIPCO 2006]
68 Batch k-means. [figures: k-means after 1-7 iterations; done: error 4.5%]
75 Partitional clustering. goal: given a set A of points in a metric space (M, d), find a partition of A into sets B_i with ∪_i B_i = A, and centroids c_i ∈ M, minimizing E(B_1, c_1, ..., B_k, c_k) := Σ_{i=1}^k Σ_{a ∈ B_i} d(a, c_i)². (1) with A = {a_1, ..., a_T}, this is a constrained non-linear optimization problem: minimize E(W, C) := Σ_{i=1}^k Σ_{t=1}^T w_{it} d(a_t, c_i)² (2) subject to w_{it} ∈ {0, 1} and Σ_{i=1}^k w_{it} = 1, for 1 ≤ i ≤ k, 1 ≤ t ≤ T. (3) here C := {c_1, ..., c_k} are the centroid locations and W := (w_{it}) is the partition matrix
77 Minimize this! common approach: partial optimization for W and C: alternately minimize over W and over C while keeping the other one fixed. batch k-means algorithm: initial random choice of the centroids c_1, ..., c_k; iterate until convergence: cluster assignment: for each a_t determine an index i(t) such that i(t) = argmin_i d(a_t, c_i); cluster update: within each cluster B_i := {a_t | i(t) = i} determine the centroid by minimizing, c_i := argmin_c Σ_{a ∈ B_i} d(a, c)². convergence to a local minimum (??)
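The alternating scheme above can be sketched in a few lines. Plain Python, 1-D toy data, k = 2, with fixed (deliberately poor) initial centroids standing in for the random choice:

```python
# Batch k-means with Euclidean distance: alternate cluster assignment
# (nearest centroid) and cluster update (Euclidean centroid = cluster mean).
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
centroids = [0.0, 1.0]                    # deliberately poor initialization

for _ in range(10):                       # a few sweeps suffice here
    # cluster assignment: i(t) = argmin_i d(a_t, c_i)
    clusters = [[] for _ in centroids]
    for a in data:
        i = min(range(len(centroids)), key=lambda j: (a - centroids[j]) ** 2)
        clusters[i].append(a)
    # cluster update: in the Euclidean case the centroid is the cluster mean
    centroids = [sum(c) / len(c) if c else m
                 for c, m in zip(clusters, centroids)]

print(centroids)  # -> [1.0, 11.0]
```

Each sweep can only decrease the cost (2), which is why the iteration converges, though possibly only to a local minimum.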
80 Euclidean case. special case: M := R^n with the Euclidean distance d(x, y) := ‖x − y‖. the centroids can be calculated in closed form: the centroid is given by the cluster mean, c_i := (1/|B_i|) Σ_{a ∈ B_i} a. this follows directly from Σ_{a ∈ B_i} ‖a − c_i‖² = Σ_{a ∈ B_i} Σ_{j=1}^n (a_j − c_{ij})² = Σ_{j=1}^n Σ_{a ∈ B_i} (a_j² − 2 a_j c_{ij} + c_{ij}²)
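The final step of this argument, left implicit on the slide, is a one-line minimization per coordinate (a sketch):

```latex
% Each coordinate c_{ij} can be optimized separately; setting the derivative
% of the inner sum to zero,
\frac{\partial}{\partial c_{ij}} \sum_{a \in B_i} \bigl( a_j^2 - 2 a_j c_{ij} + c_{ij}^2 \bigr)
  = \sum_{a \in B_i} \bigl( -2 a_j + 2 c_{ij} \bigr) = 0,
% which is solved by the coordinate-wise mean:
c_{ij} = \frac{1}{|B_i|} \sum_{a \in B_i} a_j .
```

Since the objective is a convex quadratic in c_{ij}, this stationary point is the global minimizer, giving the cluster mean.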
83 Extensions. c_i := argmin_c Σ_{a ∈ B_i} d(a, c)^p. more difficult optimization problems: non-Euclidean spaces, e.g. RP^n or Grassmann manifolds; extensions from p = 2 to e.g. p = 1 or p < ; p = 1 corresponds to finding the spatial median of B_i
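For intuition, the p = 1 case can be checked numerically in one dimension, where the spatial median reduces to the ordinary median. A brute-force grid search over candidate centroids (toy data, illustrative only):

```python
# For p = 1 the optimal centroid argmin_c sum_a |a - c|^p is the median;
# for p = 2 it is the mean. Brute-force check on a 1-D toy cluster.
cluster = [1.0, 2.0, 7.0]

def cost(c, p):
    return sum(abs(a - c) ** p for a in cluster)

# scan candidate centroid positions on a fine grid over [0, 10]
candidates = [i / 100.0 for i in range(0, 1001)]
best_p1 = min(candidates, key=lambda c: cost(c, 1))
best_p2 = min(candidates, key=lambda c: cost(c, 2))

print(best_p1)  # -> 2.0   (the median of the cluster)
print(best_p2)  # -> 3.33  (the mean 10/3, up to grid resolution)
```

Note how the p = 1 centroid ignores the outlier at 7.0, which is exactly why the median variant is more robust.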
85 Independent component analysis Sparse component analysis Nonlinear sparse component analysis Outline 1 Supervised methods Motivation 1: classification Motivation 2: image segmentation Statistical decision theory 2 Clustering k-means Partitional clustering 3 Independent component analysis Sparse component analysis Nonlinear sparse component analysis 4
86 Independent component analysis. example: the cocktail-party problem of the brain. [figure: auditory cortex (left/right), word detection, decision] [Keck, Theis, Gruber, Lang, Specht, Puntonet. 3D spatial analysis of fMRI data on a word perception task. LNCS 3195]
87 BSS model. blind source separation (BSS) problem: x(t) = A s(t) + ε(t), where x(t) is the observed m-dimensional random vector, A an (unknown) full-rank m × n matrix, s(t) the (unknown) n-dimensional source signals (here: n ≤ m), and ε(t) (unknown) white noise. goal: given x, recover A and s! additional assumptions necessary: stochastically independent s(t), p_s(s_1, ..., s_n) = p_{s_1}(s_1) ⋯ p_{s_n}(s_n) → independent component analysis (ICA); sparse source signals s_i(t) → sparse component analysis (SCA); nonnegative s and A → nonnegative matrix factorization (NMF)
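A minimal sketch of the noise-free linear model x(t) = A s(t) with m = n = 2 (plain Python; the mixing matrix and sources are hypothetical). Here the true inverse W = A⁻¹ is used for demixing, whereas a real ICA/SCA algorithm must estimate W blindly from x alone:

```python
# Linear BSS model: mix two toy sources with a known 2x2 matrix A,
# then recover them with W = A^{-1} (the "oracle" demixing matrix).
import random

random.seed(0)
s = [(random.uniform(-1, 1), random.choice([-1.0, 1.0])) for _ in range(5)]
A = [[1.0, 0.5],
     [0.3, 1.0]]                      # hypothetical mixing matrix

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

x = [matvec(A, st) for st in s]       # observed mixtures x(t) = A s(t)

det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
W = [[ A[1][1]/det, -A[0][1]/det],    # W = A^{-1}, closed form for 2x2
     [-A[1][0]/det,  A[0][0]/det]]

s_hat = [matvec(W, xt) for xt in x]   # demixed estimates
ok = all(abs(a - b) < 1e-12 and abs(c - d) < 1e-12
         for (a, c), (b, d) in zip(s, s_hat))
print(ok)  # -> True
```

The scaling/permutation indeterminacies discussed next mean a blind algorithm can at best recover W up to W = LPA⁻¹.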
89 Identifiability. important questions in data analysis: the model? (restrictions on A and s); the indeterminacies of the model? algorithmic identification given x? obvious indeterminacies: scaling L and permutation P. Theorem: let the independent random vector s ∈ L² contain at most one Gaussian component. Given two ICA solutions As = A's', then A' = ALP. Note: the theorem does not hold for Gaussian sources s. [Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]
93 ICA algorithms. basic scheme of ICA algorithms (case m = n): search for an invertible demixing matrix W that minimizes some dependence measure of Wx. some contrasts: minimize the mutual information I(Wx) (?); maximize the neural network output entropy H(f(Wx)) (?); extend PCA by performing nonlinear decorrelation (?); maximize the non-gaussianity of the output components (Wx)_i (?); minimize the off-diagonal error of the Hessian H_{ln p_Wx}; minimize the median deviation of Wx. [Theis et al. Linear geometric ICA: fundamentals and algorithms. Neural Computation, 2003] [Theis, Lang, Puntonet. A geometric algorithm for overcomplete linear ICA. Neurocomputing, 2004]
95 Optimization problem
minimize a cost function f(W) on Gl(n) or O(n)
often gradient descent: W <- W - eta * grad f(W)
in high dimensions: simulated annealing or genetic algorithms
use the non-Euclidean structure of Gl(n): the Euclidean gradient is not compatible with the group structure of Gl(n)
define the natural gradient grad_nat f(W) = grad_euc f(W) W^T W
considerable performance increase
[Stadlthanner, Theis, Puntonet, Lang. Extended sparse nonnegative matrix factorization. LNCS 3512, 2005]
[Squartini, Theis. New Riemannian metrics for speeding up the convergence of over- and underdetermined ICA. In preparation]
[Theis. Gradients on matrix manifolds and their chain rule. Submitted to NIPS LR, 2005]
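The natural-gradient update can be sketched on a small ICA example. This is a minimal illustration (the Laplacian toy sources, tanh score, step size and iteration count are my assumptions, not the talk's); note how multiplying the Euclidean gradient by W^T W cancels the matrix inversion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy super-Gaussian sources and a random 2x2 mixture x = A s.
S = rng.laplace(size=(2, 5000))
A = rng.standard_normal((2, 2))
X = A @ S

def euclidean_grad(W, X):
    # Euclidean gradient of the Infomax log-likelihood: needs the inverse W^{-T}.
    Y = W @ X
    return np.linalg.inv(W).T - np.tanh(Y) @ X.T / X.shape[1]

def natural_grad(W, X):
    # Natural gradient = Euclidean gradient times W^T W,
    # which cancels the inversion: (I - E[tanh(y) y^T]) W.
    Y = W @ X
    return (np.eye(W.shape[0]) - np.tanh(Y) @ Y.T / X.shape[1]) @ W

W = np.eye(2)
for _ in range(3000):
    W = W + 0.05 * natural_grad(W, X)

# After convergence W A should be close to a scaled permutation matrix.
print(np.round(W @ A, 2))
```

The natural-gradient flow is also equivariant: the trajectory of W A depends only on the initial W A, not on the conditioning of A itself.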
97 fMRI analysis (spatial-only BSS)
functional magnetic resonance imaging: noninvasive brain-imaging technique
gives information on brain activation patterns; activation maps help identify task-related brain regions
BSS techniques are applicable to fMRI, see (?).
99 Experimental setup
experiment: block-design protocol, 50 time instants of visual stimulation alternating with 50 instants of rest, 100 scans taking 3 s each
data set: well-known design with expected activity in the visual cortex; here only a single horizontal slice is used
preprocessing: motion correction, smoothing
data acquired by D. Auer, MPI of Psychiatry, Munich
100 Results
(figure: (a) spatial sources s_S and (b) temporal sources t_S, each with its crosscorrelation cc to the stimulus)
component 2 partially represents the frontal eye fields
component 4 is the stimulus component, cc = 0.9 with the stimulus
[Theis, Gruber, Keck, Lang. Functional MRI analysis by a novel spatiotemporal ICA algorithm. LNCS 3696, 2005]
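The crosscorrelation values quoted here are plain Pearson correlations between a component time course and the block-design stimulus regressor. A minimal sketch (the block lengths, noise level and the helper name `stimulus_correlation` are illustrative assumptions, not the actual protocol):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical block-design regressor: alternating 10-scan blocks of
# stimulation (1) and rest (0), 100 scans in total.
block = 10
stimulus = np.tile(np.r_[np.ones(block), np.zeros(block)], 5)

def stimulus_correlation(timecourse, stimulus):
    """Pearson crosscorrelation of a component time course with the stimulus."""
    t = (timecourse - timecourse.mean()) / timecourse.std()
    s = (stimulus - stimulus.mean()) / stimulus.std()
    return float(np.mean(t * s))

# A stimulus-locked component with noise scores high; pure noise does not.
component = stimulus + 0.3 * rng.standard_normal(stimulus.size)
noise = rng.standard_normal(stimulus.size)
print(round(stimulus_correlation(component, stimulus), 2))
print(round(stimulus_correlation(noise, stimulus), 2))
```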
102 Why extend ICA?
identifiability of ICA only holds if the data follow the generative model with independent sources
simulation: apply ICA to data not fulfilling the ICA model; here the sources consist of a 2-d and a 1-d irreducible component
(figure: boxplot of the Amari crosstalking error over 100 runs for FastICA, JADE and Extended Infomax)
result: no recovery of the mixing matrix
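The crosstalking (Amari) error used in this comparison measures how far P = W A is from a scaled permutation matrix. A small sketch, using one common normalization (the exact constant varies between papers):

```python
import numpy as np

def amari_error(P):
    """Crosstalking (Amari) error of P = W A: zero iff P is a scaled
    permutation matrix; normalized here to lie in [0, 1]."""
    P = np.abs(np.asarray(P, dtype=float))
    n = P.shape[0]
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float((rows.sum() + cols.sum()) / (2.0 * n * (n - 1)))

# A scaled permutation gives error 0; the all-ones matrix gives the maximum 1.
print(amari_error(np.array([[0.0, 2.0], [-3.0, 0.0]])))  # -> 0.0
print(amari_error(np.ones((2, 2))))                      # -> 1.0
```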
104 Independent subspace analysis
require stochastic independence only between groups of source components
an nk-dimensional S is k-independent if the k-tuples
(S_1, ..., S_k), (S_{k+1}, ..., S_{2k}), ..., (S_{nk-k+1}, ..., S_{nk})
are mutually independent: independent subspace analysis (ISA)
recent result: extension to arbitrary group size
major advantage: a general independent subspace analysis (ISA) always exists
[Theis. Uniqueness of complex and multidimensional independent component analysis. Signal Processing, 2004]
106 (figures: the decomposition X = AS under PCA, ICA, ISA with fixed groupsize, and general ISA; L and P mark the scaling and permutation indeterminacies of the source matrix)
110 ISA framework
Definition: Y is an independent component of X if X = A(Y, Z) such that Y and Z are stochastically independent.
Definition (general ISA): S is irreducible if it contains no lower-dimensional independent component. W in Gl(n) is an independent subspace analysis of X if WX = (S_1, ..., S_k) with pairwise independent, irreducible S_i.
Theorem: Given a random vector X with existing covariance, an ISA of X exists and is unique except for scaling and permutation.
113 Algebraic ISA algorithms
main idea: source condition matrices C_i(S) are block-diagonal
subspace JADE: after whitening, assume an orthogonal A; group-independence of S implies that the contracted quadricovariance matrices C_ij(S) = A^T C_ij(X) A are block-diagonal
perform joint block diagonalization of {C_ij(X)} to get A
for general ISA, estimate the block structure after diagonalization
[Theis. Towards a general independent subspace analysis. NIPS 2006, accepted]
115 Joint block diagonalization with unknown block sizes
Joint Block Diagonalization (JBD): given n x n matrices C_1, ..., C_K and a partition m_1 + ... + m_r = n,
find an orthogonal A such that A^T C_k A is m-block-diagonal for all k
minimize (e.g. by applying iterative Givens rotations)
f^m(Â) := sum_{k=1}^K || Â^T C_k Â - diag_m(Â^T C_k Â) ||_F^2
unknown block size m: general JBD then searches for the maximal-length block structure, i.e. (A, m) = argmax_{m, A : f^m(A) = 0} |m|
result (JBD by JD): any block-optimal JBD, i.e. any zero of f^m, is a local minimum of ordinary joint diagonalization.
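The JBD cost f^m above can be written down directly: for each C_k, transform by the candidate Â and sum the squared entries outside the diagonal blocks. A minimal sketch (matrix sizes, block partition and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def block_offdiag_norm2(M, m):
    """Squared Frobenius norm of M outside the diagonal blocks of sizes m."""
    mask = np.zeros_like(M, dtype=bool)
    start = 0
    for size in m:
        mask[start:start + size, start:start + size] = True
        start += size
    return float((M[~mask] ** 2).sum())

def jbd_cost(A_hat, Cs, m):
    """f^m(A_hat): summed off-block-diagonal error of A_hat^T C_k A_hat."""
    return sum(block_offdiag_norm2(A_hat.T @ C @ A_hat, m) for C in Cs)

# Build K matrices that are m-block-diagonal in a hidden orthogonal basis A.
m = (1, 2, 1)
n = sum(m)
A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal A
Cs = []
for _ in range(10):
    D = np.zeros((n, n))
    start = 0
    for size in m:
        B = rng.standard_normal((size, size))
        D[start:start + size, start:start + size] = B + B.T
        start += size
    Cs.append(A @ D @ A.T)

print(round(jbd_cost(A, Cs, m), 12))    # the true A zeroes the cost
print(jbd_cost(np.eye(n), Cs, m) > 0)   # the identity generally does not
```

An actual JBD algorithm would now minimize `jbd_cost` over orthogonal matrices, e.g. by iterative Givens rotations as on the slide.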
119 Example
(figure: one of the matrices C_k, the product Â^T A without permutation recovery, and P Â^T A after recovery)
performance of the proposed general JBD: unknown block partition of n = 4, additive noise with an SNR of 5 dB, K = 100 matrices
result: the estimate Â equals A after permutation recovery
120 Extraction of fetal electrocardiograms
separate fetal ECG (FECG) recordings from the mother's ECG (MECG)
apply Hessian-based MICA with k = 2 and 5 Hessians
121 (figure: (a) ECG recordings, (b) extracted sources, (c) MECG part, (d) FECG part)
122 Sparse component analysis
[Theis, Puntonet, Lang. Median-based clustering for underdetermined blind signal processing. IEEE SPL, 2005]
123 Model
Sparse Component Analysis (SCA) problem: x(t) = As(t)
- observed mixtures x(t) in R^m
- A: (unknown) real matrix with linearly independent columns
- s(t): (unknown) (m-1)-sparse sources in R^n, i.e. s(t) has at most (m-1) non-zero entries
goal: recover the unknown A and s(t) given only x(t)
Theorem: If s(t) is (m-1)-sparse and A and s(t) are in general position, both A and s(t) are identifiable (except for scaling and permutation).
[Georgiev, Theis, Cichocki. Sparse component analysis and blind source separation of underdetermined mixtures. IEEE TNN, 2005]
125 SCA algorithm
matrix identification by multiple hyperplane detection, e.g. using a Hough transform; robust against outliers and noise
source recovery using sparsity and the known matrix
[Theis, Georgiev, Cichocki. Robust sparse component analysis based on a generalized Hough transform. Signal Processing, 2006]
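The source-recovery step can be sketched for the simplest case m = 2, where (m-1)-sparse means one active source per sample and every x(t) lies on a line spanned by one column of A. This is a minimal noiseless illustration with a hand-picked mixing matrix (my assumption, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(4)

# m = 2 mixtures of n = 3 sources that are (m-1) = 1-sparse: at each t only
# one source is active, so x(t) lies on the line spanned by one column of A.
n, T = 3, 300
A = np.array([[1.0, 0.0, 0.7],
              [0.0, 1.0, 0.7]])
A /= np.linalg.norm(A, axis=0)          # unit-norm columns
S = np.zeros((n, T))
active = rng.integers(0, n, size=T)
S[active, np.arange(T)] = rng.uniform(1.0, 2.0, size=T)
X = A @ S

# Source recovery once A is known: assign each sample to the closest column
# direction (the hyperplane/line clustering step), then project onto it.
assign = np.abs(A.T @ X).argmax(axis=0)
coeff = np.einsum('it,it->t', A[:, assign], X)
S_hat = np.zeros_like(S)
S_hat[assign, np.arange(T)] = coeff

print(bool(np.allclose(S_hat, S)))      # exact in the noiseless 1-sparse case
```

With noise or for m > 2, the assignment step becomes the hyperplane detection (Hough transform or k-means hyperplane clustering) mentioned above.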
126 SCA of surface electromyograms
electromyogram (EMG): electric signal generated by a contracting muscle
surface EMG: non-invasive, but the sources overlap
cooperation with G. García, Bioinformatic Engineering, Osaka
127 Results
source and SCA recovery within 8 artificial, dependent mixtures
results on toy data: sparseness works as a separation criterion
real data: relative sEMG enhancement of 24.6 ± 21.4% (mean over 9 subjects); beats standard signal processing and ICA
[Theis, García. On the use of sparse signal decomposition in the analysis of multi-channel surface EMGs. Signal Processing, 2006]
129 SCA of functional MRI data
(figure: component maps (S) and time courses (A), each with its crosscorrelation cc to the stimulus)
complete SCA was performed using k-means hyperplane clustering
components 2 and 3 represent the inner ventricles, component 1 contains the frontal eye fields
component 5 is the desired visual stimulus component, active in the visual cortex (crosscorrelation with the stimulus cc = 0.88; FastICA yields a similar cc = 0.9)
131 Postnonlinear SCA
given an m-dimensional random vector x, find a representation x = f(As) with
- unknown n-dimensional random vector s (sources)
- m x n matrix A (mixing matrix)
- diagonal invertible f = f_1 x ... x f_m (postnonlinearities)
postnonlinear ICA: s independent (see (?))
here: SCA model, s is (m-1)-sparse
132 Overcomplete postnonlinear cocktail-party problem
135 Postnonlinearity identification lemma
Given an invertible 2 x 2 matrix A, define an "L" as L := A([0, ε) x {0} ∪ {0} x [0, ε)).
Lemma: If a diagonal analytic diffeomorphism h := h_1 x h_2 maps an L (in general position) again onto an L, then h is a linear scaling.
136 Identifiability
due to linear identifiability it is enough to show that if f(As) = f̂(Âŝ), then h = f̂^{-1} ∘ f is a linear scaling
case m = 2: the images of As and Âŝ are finite unions of L's, so this follows from the previous lemma
139 Identifiability: proof
(figure: proof sketch with the mixing maps R^3 -> R^2 given by A and the postnonlinearity f)
140 Algorithm
multistage separation algorithm:
- find separating nonlinearities g
- estimate the mixing matrix Â of the linearized model g(x)
- estimate the sources given Â and g(x)
how can g be found algorithmically?
142 Postnonlinearity detection
for simplicity assume m = 2
geometrical preprocessing: determine two 1-dimensional submanifolds in the image of x, i.e. find curves y(t) and z(t) in R^2 which are mapped onto an L by g
simple method: choose arbitrary starting points y(t_1) and z(t_1) among the samples of x; then iteratively pick the closest sample to the previous y(t_{i-1}) resp. z(t_{i-1}) with smaller modulus
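The simple curve-tracing method above can be sketched directly; a minimal illustration on synthetic data (the test curve, noise level and the function name `trace_curve` are my assumptions):

```python
import numpy as np

def trace_curve(samples, start_idx):
    """Greedy tracing of a 1-d submanifold: starting from one sample,
    repeatedly move to the nearest not-yet-used sample with smaller
    modulus (distance to the origin), as in the slide's simple method."""
    radii = np.linalg.norm(samples, axis=1)
    used = {start_idx}
    path = [start_idx]
    while True:
        cur = samples[path[-1]]
        cand = [i for i in range(len(samples))
                if i not in used and radii[i] < radii[path[-1]]]
        if not cand:
            break
        nxt = min(cand, key=lambda i: float(np.linalg.norm(samples[i] - cur)))
        used.add(nxt)
        path.append(nxt)
    return samples[path]

# Noisy samples along one branch of an "L"; start tracing at the outermost one.
rng = np.random.default_rng(5)
t = rng.uniform(0.0, 1.0, 200)
samples = np.c_[t, 0.3 * t] + 0.005 * rng.standard_normal((200, 2))
start = int(np.argmax(np.linalg.norm(samples, axis=1)))
path = trace_curve(samples, start)
moduli = np.linalg.norm(path, axis=1)
print(len(path), bool(np.all(np.diff(moduli) < 0)))
```

By construction the moduli along the traced path decrease strictly toward the origin, which is exactly the ordering the reparametrization on the next slides relies on.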
145 Postnonlinearity detection
(figure: sources mapped through A and f to the mixture density, and the two curves obtained by geometrical preprocessing)
148 Postnonlinearity detection
reparametrization (ȳ := y ∘ y_1^{-1}) of the curves gives y_1 = z_1 = id
hence g ∘ y = (g_1, a g_1) and g ∘ z = (g_1, b g_1), so g_2 ∘ y_2 = a g_1 = (a/b) g_2 ∘ z_2
analytical geometrical postnonlinearity detection: find an analytic 1-d diffeomorphism g with g ∘ y = c g ∘ z for c != 0, ±1 and given curves y, z : (-1, 1) -> R with y(0) = z(0) = 0
note c = y'(0) / z'(0)
151 Postnonlinearity detection
the equation g ∘ y = c g ∘ z can be solved in different ways:
- calculate composite derivatives using Faà di Bruno's formula: the derivatives of y and z lead to estimates of the derivatives of g
- least-squares polynomial fit of g using the energy function E = (1/2T) sum_{i=1}^T (g(y(t_i)) - c g(z(t_i)))^2
- MLP approximation of g using the same E
fix g(0) = 0 and g'(0) = 1.
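The least-squares polynomial fit becomes a linear problem once g is written with the normalization g(0) = 0, g'(0) = 1 built in. A minimal sketch on synthetic curves (f = tanh, the two rays a·t and b·t, and the degree-5 odd ansatz are my illustrative assumptions; the exact solution here would be g = artanh):

```python
import numpy as np

# Synthetic curves: two rays a*t and b*t pushed through a postnonlinearity
# f = tanh. We seek g with g(y(t)) = c * g(z(t)), c = y'(0)/z'(0) = a/b.
a, b = 1.0, 0.5
t = np.linspace(-0.5, 0.5, 201)
y, z = np.tanh(a * t), np.tanh(b * t)
c = a / b

# Odd polynomial ansatz g(u) = u + a3 u^3 + a5 u^5 enforces g(0) = 0 and
# g'(0) = 1; minimizing E = sum (g(y_i) - c g(z_i))^2 is then ordinary
# linear least squares in the coefficients (a3, a5).
M = np.column_stack([y**3 - c * z**3, y**5 - c * z**5])
rhs = -(y - c * z)
(a3, a5), *_ = np.linalg.lstsq(M, rhs, rcond=None)

def g(u):
    return u + a3 * u**3 + a5 * u**5

print(float(np.abs(g(y) - c * g(z)).max()) < 1e-2)       # functional equation holds
print(float(np.abs(g(y) - np.arctanh(y)).max()) < 0.05)  # g is close to artanh
```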
154 Artificial mixtures
artificial example: postnonlinear mixture of n = 3 uniform sources (10^5 samples) to m = 2 observations
postnonlinear mixing model x = f_1 x f_2 (As) with a 2 x 3 mixing matrix A
postnonlinearities f_1(x) = tanh(x) + 0.1x and f_2(x) = x
algorithm: MLP-based postnonlinearity detection with natural gradient-descent learning
parameters: 9 hidden neurons, learning rate eta = 0.1 and 10^5 iterations
156 PNL detection
(figure: mixing postnonlinearities f_1, f_2, estimated separating postnonlinearities g_1, g_2, and the density of the recovered sources)
SIRs: 26, 71 and 46 dB
159 analyze statistical patterns in data sets x(t)
method: factorization model x(t) = f(s(t))
supervised: training of f, nearest neighbor (local), regression (global)
unsupervised: identification (often linear), clustering (local model), blind source separation (linear model)
applications: biomedical data analysis, signal processing, financial markets etc.
160 Current application (with T. Schröder, HMGU)
- unsupervised clustering of subtrees
- supervised learning of cell shapes
- parameter estimation of a dynamical system for cell fate decision
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationIndependent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood
More informationReal and Complex Independent Subspace Analysis by Generalized Variance
Real and Complex Independent Subspace Analysis by Generalized Variance Neural Information Processing Group, Department of Information Systems, Eötvös Loránd University, Budapest, Hungary ICA Research Network
More informationData Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationRobustness of Principal Components
PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.
More informationRobust extraction of specific signals with temporal structure
Robust extraction of specific signals with temporal structure Zhi-Lin Zhang, Zhang Yi Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationLinear and Non-Linear Dimensionality Reduction
Linear and Non-Linear Dimensionality Reduction Alexander Schulz aschulz(at)techfak.uni-bielefeld.de University of Pisa, Pisa 4.5.215 and 7.5.215 Overview Dimensionality Reduction Motivation Linear Projections
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationSTK Statistical Learning: Advanced Regression and Classification
STK4030 - Statistical Learning: Advanced Regression and Classification Riccardo De Bin debin@math.uio.no STK4030: lecture 1 1/ 42 Outline of the lecture Introduction Overview of supervised learning Variable
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationUnsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA with Hiroshi Morioka Dept of Computer Science University of Helsinki, Finland Facebook AI Summit, 13th June 2016 Abstract
More informationLecture 10: Dimension Reduction Techniques
Lecture 10: Dimension Reduction Techniques Radu Balan Department of Mathematics, AMSC, CSCAMM and NWC University of Maryland, College Park, MD April 17, 2018 Input Data It is assumed that there is a set
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationDIMENSIONALITY REDUCTION METHODS IN INDEPENDENT SUBSPACE ANALYSIS FOR SIGNAL DETECTION. Mijail Guillemard, Armin Iske, Sara Krause-Solberg
DIMENSIONALIY EDUCION MEHODS IN INDEPENDEN SUBSPACE ANALYSIS FO SIGNAL DEECION Mijail Guillemard, Armin Iske, Sara Krause-Solberg Department of Mathematics, University of Hamburg, {guillemard, iske, krause-solberg}@math.uni-hamburg.de
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationAn Introduction to Independent Components Analysis (ICA)
An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationMachine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Machine Learning Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1395 1 / 47 Table of contents 1 Introduction
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationWhy is Deep Learning so effective?
Ma191b Winter 2017 Geometry of Neuroscience The unreasonable effectiveness of deep learning This lecture is based entirely on the paper: Reference: Henry W. Lin and Max Tegmark, Why does deep and cheap
More informationLINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input
More informationArtificial Intelligence Module 2. Feature Selection. Andrea Torsello
Artificial Intelligence Module 2 Feature Selection Andrea Torsello We have seen that high dimensional data is hard to classify (curse of dimensionality) Often however, the data does not fill all the space
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationSemi-Blind approaches to source separation: introduction to the special session
Semi-Blind approaches to source separation: introduction to the special session Massoud BABAIE-ZADEH 1 Christian JUTTEN 2 1- Sharif University of Technology, Tehran, IRAN 2- Laboratory of Images and Signals
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2016
CPSC 340: Machine Learning and Data Mining More PCA Fall 2016 A2/Midterm: Admin Grades/solutions posted. Midterms can be viewed during office hours. Assignment 4: Due Monday. Extra office hours: Thursdays
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationwhere A 2 IR m n is the mixing matrix, s(t) is the n-dimensional source vector (n» m), and v(t) is additive white noise that is statistically independ
BLIND SEPARATION OF NONSTATIONARY AND TEMPORALLY CORRELATED SOURCES FROM NOISY MIXTURES Seungjin CHOI x and Andrzej CICHOCKI y x Department of Electrical Engineering Chungbuk National University, KOREA
More informationA two-layer ICA-like model estimated by Score Matching
A two-layer ICA-like model estimated by Score Matching Urs Köster and Aapo Hyvärinen University of Helsinki and Helsinki Institute for Information Technology Abstract. Capturing regularities in high-dimensional
More informationTo appear in Proceedings of the ICA'99, Aussois, France, A 2 R mn is an unknown mixture matrix of full rank, v(t) is the vector of noises. The
To appear in Proceedings of the ICA'99, Aussois, France, 1999 1 NATURAL GRADIENT APPROACH TO BLIND SEPARATION OF OVER- AND UNDER-COMPLETE MIXTURES L.-Q. Zhang, S. Amari and A. Cichocki Brain-style Information
More information