1 Machine Learning for Signal Processing (11-755/18-797): Independent Component Analysis. Class of 8 Nov 2012. Instructor: Bhiksha Raj
2 A brief review of basic probability. Uncorrelated: two random variables X and Y are uncorrelated iff the average value of their product equals the product of their individual averages: E[XY] = E[X]E[Y]. Setup: each draw produces one instance of X and one instance of Y, i.e. one instance of (X, Y).
3 Uncorrelatedness. [scatter plots omitted] Which of the above represent uncorrelated RVs?
4 A brief review of basic probability. Independence: two random variables X and Y are independent iff their joint probability equals the product of their individual probabilities: P(X,Y) = P(X)P(Y). The average value of X is then the same regardless of the value of Y: E[X|Y] = E[X].
5 A brief review of basic probability. Independence: two random variables X and Y are independent iff the average value of any function of X is the same regardless of the value of Y. More generally, E[f(X)g(Y)] = E[f(X)]E[g(Y)] for all f(), g().
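A quick numerical illustration of the gap between the two definitions above (a toy distribution chosen for this note, not from the lecture): X uniform on {-1, 0, 1} together with Y = X^2 is uncorrelated with Y, yet fails the product rule for f(x) = x^2, g(y) = y.

```python
import numpy as np

# Sketch (toy discrete distribution, not from the lecture): X uniform on
# {-1, 0, 1} and Y = X^2 are uncorrelated but not independent.
xs = np.array([-1.0, 0.0, 1.0])       # equally likely values of X
ys = xs ** 2                          # Y is a deterministic function of X
E = lambda v: v.mean()                # expectation under the uniform PMF

# Uncorrelated: E[XY] = E[X]E[Y]
print(E(xs * ys), E(xs) * E(ys))                 # 0.0 0.0

# Not independent: with f(x) = x^2 and g(y) = y the product rule fails
print(E(xs ** 2 * ys), E(xs ** 2) * E(ys))       # 0.666... vs 0.444...
```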
6 Independence. [scatter plots omitted] Which of the above represent independent RVs? Which represent uncorrelated RVs?
7 A brief review of basic probability. The expected value of an odd function of an RV is 0 if the RV is zero mean and the PDF p(x) of the RV is symmetric around 0: E[f(X)] = 0 if f(x) is odd symmetric.
8 A brief review of basic info. theory. Let X take values in {T(all), M(ed), S(hort)} and Y in {M, F}. Entropy: H(X) = sum_X P(X)[-log P(X)], the minimum average number of bits to transmit to convey a symbol X. Joint entropy: H(X,Y) = sum_{X,Y} P(X,Y)[-log P(X,Y)], the minimum average number of bits to convey sets (pairs, here) of symbols.
9 A brief review of basic info. theory. Conditional entropy: H(X|Y) = sum_Y P(Y) sum_X P(X|Y)[-log P(X|Y)] = sum_{X,Y} P(X,Y)[-log P(X|Y)], the minimum average number of bits to transmit to convey a symbol X after symbol Y has already been conveyed, averaged over all values of X and Y.
10 A brief review of basic info. theory. If X is independent of Y, then P(X|Y) = P(X), so H(X|Y) = sum_{X,Y} P(X,Y)[-log P(X)] = sum_X P(X)[-log P(X)] = H(X): the conditional entropy of X equals H(X). Likewise, P(X,Y) = P(X)P(Y) gives H(X,Y) = sum_{X,Y} P(X,Y)[-log P(X) - log P(Y)] = H(X) + H(Y): the joint entropy of X and Y is the sum of their entropies if they are independent.
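The identities above are easy to verify numerically. A minimal sketch with an assumed joint PMF (the marginals below are illustrative, not from the lecture):

```python
import numpy as np

# Sketch (toy marginals, assumed for illustration): entropies from a joint PMF
# over X in {T, M, S} and Y in {M, F}, built as an independent joint.
Px = np.array([0.2, 0.5, 0.3])              # marginal P(X)
Py = np.array([0.4, 0.6])                   # marginal P(Y)
P = np.outer(Px, Py)                        # P(X,Y) = P(X)P(Y): independent

H = lambda p: -(p[p > 0] * np.log2(p[p > 0])).sum()   # entropy in bits

H_X, H_Y, H_XY = H(Px), H(Py), H(P.ravel())
H_X_given_Y = H_XY - H_Y                    # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(np.isclose(H_XY, H_X + H_Y))          # True: joint entropy adds up
print(np.isclose(H_X_given_Y, H_X))         # True: conditioning on Y costs nothing
```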
11 Onward..
12 Projection: multiple notes. M = [spectrogram], W = [notes]. P = W (W^T W)^-1 W^T. Projected spectrogram = P M.
13 We're actually computing a score: M ~ W H, with the transcription H = pinv(W) M.
14 How about the other way? M ~ W H. Given the transcription H, the notes are W = M pinv(H).
15 So what are we doing here? M ~ W H is an approximation. Given W, estimate H to minimize the error: H = argmin_H ||M - WH||_F^2 = argmin_H sum_i sum_j (M_ij - (WH)_ij)^2. Must ideally find the transcription of the given notes.
16 Going the other way: given H, estimate W to minimize the error: W = argmin_W ||M - WH||_F^2 = argmin_W sum_i sum_j (M_ij - (WH)_ij)^2. Must ideally find the notes corresponding to the transcription.
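Both one-sided estimates reduce to a pseudoinverse. A minimal numpy sketch with synthetic data (the dimensions and variable names are assumptions):

```python
import numpy as np

# Sketch (synthetic data; dimensions assumed): least-squares recovery of H
# given W, and of W given H, for the approximation M ~ W H.
rng = np.random.default_rng(0)
W_true = rng.random((20, 4))          # 4 "notes" over 20 frequency bins
H_true = rng.random((4, 50))          # their transcription over 50 frames
M = W_true @ H_true                   # noiseless composition

H_est = np.linalg.pinv(W_true) @ M    # argmin_H ||M - W H||_F^2
W_est = M @ np.linalg.pinv(H_true)    # argmin_W ||M - W H||_F^2

print(np.allclose(H_est, H_true), np.allclose(W_est, W_true))   # True True
```

With full-rank factors and noiseless data, both least-squares solutions recover the true factor exactly.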
17 When both parameters are unknown: M ~ W H = approx(M). Must estimate both W and H to best approximate M. Ideally, must learn both the notes and their transcription!
18 A least squares solution. Unconstrained: W, H = argmin_{W,H} ||M - WH||_F^2. For any (W, H) that minimizes the error, W' = WA, H' = A^-1 H also minimizes the error, for any invertible A. For our problem, let's consider the truth: when one note occurs, the other does not, so h_i h_j = 0 for all i != j. The rows of H are uncorrelated.
19 A least squares solution. Assume H H^T = I: normalize all rows of H to length 1 and make them orthogonal. Then pinv(H) = H^T, and projecting M onto H gives W = M pinv(H) = M H^T. So W, H = argmin_{W,H} ||M - WH||_F^2 becomes H = argmin_H ||M - M H^T H||_F^2. Constraint: rank(H) = 4.
20 Finding the notes. H = argmin_H ||M - M H^T H||_F^2. Note H^T H != I; only H H^T = I. The objective can be rewritten as argmin_H trace(M (I - H^T H) M^T) = argmin_H trace(M^T M (I - H^T H)) = argmin_H trace(Correlation(M) (I - H^T H)) = argmax_H trace(Correlation(M) H^T H), with Correlation(M) = M^T M.
21 Finding the notes. Constraint: every row of H has length 1. H = argmax_H trace(H Correlation(M) H^T). Differentiating and equating to 0: Correlation(M) H^T = H^T Lambda. Simply requiring the rows of H to be orthonormal gives us that H is the set of eigenvectors of the data in M.
22 Equivalences. H = argmax_H trace(H Correlation(M) H^T) is identical to W, H = argmin_{W,H} ||M - WH||_F^2 subject to sum_j h_ij^2 = 1 for every i and h_i^T h_j = 0 for i != j: minimize the least squares error with the constraint that the rows of H are length 1 and orthogonal to one another.
23 So how does that work? There are 12 notes in the segment, hence we try to estimate 12 notes..
24 So how does that work? The first three notes and their contributions. The spectrograms of the notes are statistically uncorrelated.
25 Finding the notes. Can find W instead of H: W = argmin_W ||M - W W^T M||_F^2. Solving the above with the constraint that the columns of W are orthonormal gives you the eigenvectors of the data in M: W = argmax_W trace(W^T Correlation(M) W), with Correlation(M) = M M^T here.
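The eigenvector characterization above can be sanity-checked numerically. A sketch with synthetic data (dimensions assumed): the top eigenvectors of M M^T capture at least as much correlation energy as any other orthonormal basis of the same size.

```python
import numpy as np

# Sketch (synthetic M): among orthonormal bases W, trace(W^T Corr(M) W) is
# maximized by the top eigenvectors of Corr(M) = M M^T.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 200))
corr = M @ M.T

eigvals, eigvecs = np.linalg.eigh(corr)     # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :4]                 # top-4 eigenvectors as columns

# Any other orthonormal 4-column basis captures no more correlation energy:
Q, _ = np.linalg.qr(rng.standard_normal((20, 4)))
print(np.trace(W.T @ corr @ W) >= np.trace(Q.T @ corr @ Q))   # True
```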
26 So how does that work? There are 12 notes in the segment, hence we try to estimate 12 notes..
27 Our notes are not orthogonal: overlapping frequencies; notes occur concurrently; the harmonica continues to resonate after the previous note. More generally, simple orthogonality will not give us the desired solution.
28 What else can we look for? Assume: the transcription of one note does not depend on what else is playing. Or, in a multi-instrument piece, the instruments are playing independently of one another. Not strictly true, but still..
29 Formulating it with independence: W, H = argmin_{W,H} ||M - WH||_F^2 such that the rows of H are independent. Impose statistical independence constraints on the decomposition.
30 Changing problems for a bit: m_1(t) = w_11 h_1(t) + w_12 h_2(t); m_2(t) = w_21 h_1(t) + w_22 h_2(t). Two people speak simultaneously, recorded by two microphones. Each recorded signal is a mixture of both signals.
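The mixing model above is easy to simulate. ICA is needed precisely because W is unknown, but with a known W the model can be verified directly (a toy sketch; the mixing weights are assumptions):

```python
import numpy as np

# Sketch of the two-mic mixing model, m(t) = W h(t), with an assumed 2x2
# mixing matrix W. When W is known, unmixing is simply A = W^-1.
rng = np.random.default_rng(2)
h = rng.laplace(size=(2, 1000))            # two source signals
W = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # mixing weights (illustrative)
M = W @ h                                  # each mic records a mix of both

A = np.linalg.inv(W)                       # unmixing matrix
print(np.allclose(A @ M, h))               # True: sources recovered exactly
```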
31 Imposing statistical constraints. M = W H, with W = [w_11 w_12; w_21 w_22]. M = mixed signal (rows: signal at mic 1, signal at mic 2), W = notes, H = transcription (rows: signal from speaker 1, signal from speaker 2). Given only M, estimate H. Ensure that the components of the vectors in the estimated H are statistically independent. Multiple approaches..
32 Imposing statistical constraints. Given only M, estimate H = W^-1 M = A M. Estimate A such that the components of A M are statistically independent. A is the unmixing matrix. Multiple approaches..
33 Statistical independence. M = W H, H = A M. Emulating independence: compute W (or A) and H such that H has statistical characteristics that are observed in statistically independent variables. Enforcing independence: compute W and H such that the components of H = A M are independent.
34 Emulating independence. The rows of H are uncorrelated: E[h_i h_j] = E[h_i]E[h_j], where h_i and h_j are the i-th and j-th components of any vector in H. The fourth order moments factor: E[h_i h_j h_k h_l] = E[h_i]E[h_j]E[h_k]E[h_l]; E[h_i^2 h_j h_k] = E[h_i^2]E[h_j]E[h_k]; E[h_i^2 h_j^2] = E[h_i^2]E[h_j^2]; etc.
35 Zero mean. It is usual to assume zero mean processes; otherwise some of the math doesn't work well. M = W H, H = A M. If mean(M) = 0 then mean(H) = 0: E[H] = A E[M] = A 0 = 0. First step of ICA: set the mean of M to 0. mu = (1/cols(M)) sum_i m_i; m_i <- m_i - mu, where the m_i are the columns of M.
36 Emulating independence: independence implies uncorrelatedness. Estimate a C such that X = CM is uncorrelated. With H = A M and A = B C, H = B C M. E[x_i x_j] = E[x_i]E[x_j] = 0 for i != j [since M is now centered]: X X^T = I. In reality we only want this to be a diagonal matrix, but we'll make it identity.
37 Decorrelating. H = A M, A = B C, H = B C M, X = C M, X X^T = I. Eigen decomposition: M M^T = E S E^T. Let C = S^(-1/2) E^T. Then C M M^T C^T = S^(-1/2) E^T (E S E^T) E S^(-1/2) = I.
38 Decorrelating. X is called the whitened version of M. The process of decorrelating M is called whitening, and C is the whitening matrix.
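The whitening step follows directly from the eigen decomposition. A sketch with synthetic data (the mixing matrix is an assumption):

```python
import numpy as np

# Sketch (synthetic data; mixing matrix assumed): center M, then whiten it
# with C = S^(-1/2) E^T, where M M^T = E S E^T.
rng = np.random.default_rng(3)
A_mix = np.array([[2.0, 0.5, 0.1],
                  [0.3, 1.5, 0.4],
                  [0.2, 0.1, 1.0]])
M = A_mix @ rng.standard_normal((3, 2000))     # rows are correlated

M = M - M.mean(axis=1, keepdims=True)          # first step of ICA: zero mean
S, E = np.linalg.eigh(M @ M.T)                 # eigen decomposition of M M^T
C = np.diag(S ** -0.5) @ E.T                   # whitening matrix
X = C @ M                                      # whitened version of M

print(np.allclose(X @ X.T, np.eye(3)))         # True: C M M^T C^T = I
```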
39 Uncorrelated != independent. Whitening merely ensures that the resulting signals are uncorrelated, i.e. E[x_i x_j] = 0 if i != j. It does not ensure that higher order moments are also decoupled, e.g. that E[x_i^2 x_j^2] = E[x_i^2]E[x_j^2], which is one of the signatures of independent RVs. Let's explicitly decouple the fourth order moments.
40 Decorrelating. H = A M, A = B C, H = B X, X = C M, X X^T = I. Will multiplying X by B re-correlate the components? Not if B is unitary: B B^T = B^T B = I. So we want to find a unitary matrix B, since the rows of H are uncorrelated (because they are independent).
41 ICA: freeing fourth moments. We have E[x_i x_j] = 0 if i != j: X has already been decorrelated. A = B C, H = B C M, X = C M, H = B X. The fourth moments of H have the form E[h_i h_j h_k h_l]. If the rows of H were independent, E[h_i h_j h_k h_l] = E[h_i]E[h_j]E[h_k]E[h_l]. Solution: compute B such that the fourth moments of H = B X are decoupled, while ensuring that B is unitary.
42 ICA: freeing fourth moments. Create a matrix of fourth moment terms that would be diagonal were the rows of H independent, and diagonalize it. A good candidate (good because it incorporates the energy in all rows of H): D with entries d_ij = E[(sum_k h_k^2) h_i h_j], i.e. D = E[h^T h h h^T], where the h are the columns of H. (Assuming h is real; else replace transposition with Hermitian transposition.)
43 ICA: the D matrix. d_ij = E[(sum_k h_k^2) h_i h_j] = (1/cols(H)) sum_m (sum_k h_mk^2) h_mi h_mj, where m indexes the columns of H: sum_k h_k^2 is the sum of squares of all components of a column, h_i h_j is the product of its i-th and j-th components, and the term (sum_k h_k^2) h_i h_j is averaged across all columns of H.
44 ICA: the D matrix. If the h_i terms were independent, then for i != j: E[(sum_k h_k^2) h_i h_j] = sum_{k != i,j} E[h_k^2]E[h_i]E[h_j] + E[h_i^3]E[h_j] + E[h_i]E[h_j^3]. Centered: E[h_i] = 0, so E[(sum_k h_k^2) h_i h_j] = 0 for i != j. For i = j the term E[(sum_k h_k^2) h_i^2] is not zero in general. Thus, if the h_i were independent, d_ij = 0 for i != j, i.e. D would be a diagonal matrix. Let us diagonalize D.
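The D matrix is a plain sample average. A sketch with toy data (three independent zero-mean Laplacian rows, chosen for illustration): for independent rows, the off-diagonal entries vanish up to sampling noise.

```python
import numpy as np

# Sketch: build D with d_ij = (1/cols(H)) sum_m (sum_k h_mk^2) h_mi h_mj and
# check that it is (near-)diagonal when the rows of H are independent.
def fourth_moment_matrix(H):
    sq = (H ** 2).sum(axis=0)              # sum_k h_k^2 for each column
    return (H * sq) @ H.T / H.shape[1]     # average of (sum_k h_k^2) h_i h_j

rng = np.random.default_rng(4)
H_indep = rng.laplace(size=(3, 200_000))   # independent zero-mean rows
D = fourth_moment_matrix(H_indep)

off_diag = D - np.diag(np.diag(D))
print(np.abs(off_diag).max() < 0.1 * np.diag(D).min())   # True: near-diagonal
```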
45 Diagonalizing D. Compose the fourth order moment matrix from X. Recall X = C M and H = B X = B C M; B is what we're trying to learn to make H independent. Compose D = E[x^T x x x^T]. Diagonalize D via eigen decomposition: D = U Lambda U^T. Set B = U^T. That's it!
46 B frees the fourth moment. D = U Lambda U^T; B = U^T. U is a unitary matrix: U^T U = U U^T = I. H = B X = U^T X, h = U^T x. The fourth moment matrix of H is E[h^T h h h^T] = E[x^T U U^T x U^T x x^T U] = E[x^T x U^T x x^T U] = U^T E[x^T x x x^T] U = U^T D U = U^T U Lambda U^T U = Lambda. The fourth moment matrix of H = U^T X is diagonal!
47 Overall solution: H = A M = B C M = B X, with A = B C = U^T C.
48 Independent Component Analysis. Goal: derive a matrix A such that the rows of A M are independent. Procedure:
1. Center M.
2. Compute the autocorrelation matrix R_MM of M.
3. Compute the whitening matrix C via eigen decomposition: R_MM = E S E^T, C = S^(-1/2) E^T.
4. Compute X = C M.
5. Compute the fourth moment matrix D = E[x^T x x x^T].
6. Diagonalize D via eigen decomposition: D = U Lambda U^T.
7. Compute A = U^T C.
The fourth moment matrix of H = A M is diagonal. Note that the autocorrelation matrix of H will also be diagonal.
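The whole procedure fits in a few lines. One caveat worth flagging: this single-matrix fourth-moment method (essentially Cardoso's FOBI) can only separate sources whose kurtoses differ, so the sketch below (synthetic data; mixing matrix assumed) mixes a Laplacian with a uniform source:

```python
import numpy as np

# Sketch of the 7-step procedure above (fourth-moment ICA on synthetic data).
def ica_fourth_moment(M):
    M = M - M.mean(axis=1, keepdims=True)          # 1. center M
    S, E = np.linalg.eigh(M @ M.T / M.shape[1])    # 2-3. R_MM = E S E^T
    C = np.diag(S ** -0.5) @ E.T                   #      whitening matrix
    X = C @ M                                      # 4. whitened signal
    sq = (X ** 2).sum(axis=0)
    D = (X * sq) @ X.T / X.shape[1]                # 5. D = E[x^T x x x^T]
    lam, U = np.linalg.eigh(D)                     # 6. D = U Lambda U^T
    return U.T @ C                                 # 7. A = U^T C

rng = np.random.default_rng(5)
N = 100_000
H_true = np.vstack([rng.laplace(size=N),           # super-Gaussian source
                    rng.uniform(-1, 1, size=N)])   # sub-Gaussian source
M = np.array([[1.0, 0.5], [0.3, 1.0]]) @ H_true

A = ica_fourth_moment(M)
H_est = A @ M

# Up to permutation, sign and scale, each row of H_est matches a true source:
corr = np.corrcoef(np.vstack([H_est, H_true]))[:2, 2:]
print(np.abs(corr).max(axis=1))                    # both values close to 1
```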
49 ICA by diagonalizing moment matrices. The procedure just outlined, while fully functional, has shortcomings: only a subset of the fourth order moments is considered. There are many other ways of constructing fourth order moment matrices that would ideally be diagonal, and diagonalizing the particular matrix we have chosen is not guaranteed to diagonalize every other fourth order moment matrix. JADE (Joint Approximate Diagonalization of Eigenmatrices, J.-F. Cardoso) jointly diagonalizes several fourth order moment matrices. It is more effective than the procedure shown, but more computationally expensive.
50 Enforcing independence. Specifically ensure that the components of H = A M are independent. Contrast function: a non-linear function that has a minimum value when the output components are independent. Define and minimize a contrast function F(A M). Contrast functions are often only approximations too..
51 A note on pre-whitening. The mixed signal is usually pre-whitened: normalize the variance along all directions and eliminate second order dependence. X = C M, with E[x_i x_j] = delta_ij for centered signals. Eigen decomposition: M M^T = E S E^T, C = S^(-1/2) E^T. Can use only the first K columns of E if only K independent sources are expected, e.g. in a microphone array setup with K < M sources.
52 The contrast function. A contrast function is a non-linear function that has a minimum value when the output components are independent. An explicit contrast function is the mutual information I(h) = sum_i H(h_i) - H(h), with the constraint H = B X, where X is whitened M.
53 Linear functions. h = B x, where h and x are individual columns of the H and X matrices: x is the mixed signal, B is the unmixing matrix. P_h(h) = P_x(B^-1 h) |B|^-1. With H(x) = -integral P(x) log P(x) dx, this gives H(h) = H(x) + log |B|.
54 The contrast function. I(h) = sum_i H(h_i) - H(h) = sum_i H(h_i) - H(x) - log |B|. Ignoring H(x) (a constant), J(h) = sum_i H(h_i) - log |B|. Minimize the above to obtain B.
55 An alternate approach. Recall PCA: M = W H with orthonormal columns of W leads to min_W ||M - W W^T M||^2, an error minimization framework to estimate W. Can we arrive at an error minimization framework for ICA? Define an error objective that represents independence.
56 An alternate approach. Definition of independence: if x and y are independent, then E[f(x)g(y)] = E[f(x)]E[g(y)]. This must hold for every f() and g()!!
57 An alternate approach. Define g(H) = g(B X), applied component-wise: the matrix with entries g(h_11), g(h_12), ..., g(h_21), g(h_22), ... Similarly define f(H) = f(B X), with entries f(h_11), f(h_12), ...
58 An alternate approach. P = g(H) f(H)^T = g(B X) f(B X)^T is a square matrix with entries P_ij = sum_k g(h_ik) f(h_jk). It must ideally equal Q, with Q_ij = (sum_k g(h_ik))(sum_l f(h_jl)) for i != j and Q_ii = sum_k g(h_ik) f(h_ik). Error = ||P - Q||_F^2.
59 An alternate approach. Ideal value for Q: Q_ij = (sum_k g(h_ik))(sum_l f(h_jl)) for i != j, Q_ii = sum_k g(h_ik) f(h_ik). If g() and f() are odd symmetric functions, then sum_j g(h_ij) = 0 for all i, since sum_j h_ij = 0 (H is centered). Q is then a diagonal matrix!
60 An alternate approach. Minimize the error ||P - Q||_F^2 with P = g(B X) f(B X)^T and Q diagonal. This leads to a simple Widrow-Hoff type iterative rule: B <- B + eta E B, with E = Diag(g(H) f(H)^T) - g(H) f(H)^T.
61 Update rules. Multiple solutions exist, under different assumptions for g() and f(). H = B X, B <- B + Delta B. Jutten-Herault (online update): Delta B_ij = f(h_i) g(h_j); actually assumed a recursive neural network. Bell-Sejnowski: Delta B = ([B^T]^-1 - g(H) X^T).
62 Update rules. Natural gradient (f() = identity): Delta B = (I - g(H) H^T) B. Cichocki-Unbehauen: Delta B = (I - g(H) f(H)^T) B.
63 What are g() and f()? They must be odd symmetric functions. Multiple functions have been proposed: g(x) = x + tanh(x) if x is super-Gaussian, g(x) = x - tanh(x) if x is sub-Gaussian. Audio signals are in general super-Gaussian. With K = +/-1 accordingly: Delta B = (I - H H^T - K tanh(H) H^T) B, or simply Delta B = (I - K tanh(H) H^T) B.
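The natural gradient rule with the super-Gaussian nonlinearity (K = 1) can be sketched as a batch iteration. This is a toy demo: the step size, iteration count and mixing matrix are assumptions, and the data is pre-whitened first, as recommended above.

```python
import numpy as np

# Sketch (toy batch demo; step size, iterations and mixing assumed):
# natural gradient ICA with the super-Gaussian rule
#   Delta B = (I - H H^T - tanh(H) H^T) B   (K = 1).
rng = np.random.default_rng(6)
N = 20_000
H_true = rng.laplace(size=(2, N))                # super-Gaussian sources
M = np.array([[1.0, 0.7], [0.5, 1.0]]) @ H_true

M = M - M.mean(axis=1, keepdims=True)            # center, then pre-whiten
S, E = np.linalg.eigh(M @ M.T / N)
X = np.diag(S ** -0.5) @ E.T @ M

B = np.eye(2)
for _ in range(800):
    H = B @ X
    dB = (np.eye(2) - H @ H.T / N - np.tanh(H) @ H.T / N) @ B
    B += 0.05 * dB                               # small assumed step size

H_est = B @ X
corr = np.corrcoef(np.vstack([H_est, H_true]))[:2, 2:]
print(np.abs(corr).max(axis=1))                  # both values close to 1
```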
64 So how does it work? Example with an instantaneous mixture of two speakers, natural gradient update. Works very well!
65 Another example! Input, mix, output.
66 Another example: three instruments..
67 The notes: three instruments..
68 ICA for data exploration. The bases in PCA represent the building blocks, ideally notes; very successfully used. So can ICA be used to do the same?
69 ICA vs PCA bases. Motivation for using ICA vs PCA: PCA will indicate orthogonal directions of maximal variance, which may not align with the data (non-Gaussian data). ICA finds directions that are independent, and more likely to align with the data.
70 Finding useful transforms with ICA. Audio preprocessing example: take a lot of audio snippets, concatenate them in a big matrix, and do component analysis. PCA results in the DCT bases; ICA returns time/frequency localized sinusoids, which is a better way to analyze sounds. Ditto for images: ICA returns localized edge filters.
71 Example case: ICA-faces vs. Eigenfaces. [ICA-faces and Eigenfaces images omitted]
72 ICA for signal enhancement. Very commonly used to enhance EEG signals, which are frequently corrupted by heartbeat and biorhythm signals; ICA can be used to separate them out.
73 So how does that work? There are 12 notes in the segment, hence we try to estimate 12 notes..
74 PCA solution. There are 12 notes in the segment, hence we try to estimate 12 notes..
75 So how does this work: ICA solution. Better.. but not much. What are the issues here?
76 ICA issues. No sense of order (unlike PCA): we get K independent directions, but there is no notion of the "best" direction, so the sources can come in any order: permutation invariance. No sense of scaling: scaling a signal does not affect independence. The outputs are scaled versions of the desired signals in permuted order, in the best case. In the worst case, the outputs are not the desired signals at all..
77 What else went wrong? We assumed the distribution of the signals is symmetric around the mean. Note energy here is not symmetric: negative values never happen. Still, this didn't affect the three-instrument case.. Also, the notes are not independent: only one note plays at a time, so if one note plays, the other notes are not playing.
78 Continue in next class.. NMF, factor analysis..
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2
MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is
More informationBasic Principles of Video Coding
Basic Principles of Video Coding Introduction Categories of Video Coding Schemes Information Theory Overview of Video Coding Techniques Predictive coding Transform coding Quantization Entropy coding Motion
More information4 Derivations of the Discrete-Time Kalman Filter
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof N Shimkin 4 Derivations of the Discrete-Time
More informationLinear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4
Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix
More informationIndependent Component Analysis of Incomplete Data
Independent Component Analysis of Incomplete Data Max Welling Markus Weber California Institute of Technology 136-93 Pasadena, CA 91125 fwelling,rmwg@vision.caltech.edu Keywords: EM, Missing Data, ICA
More informationOn Information Maximization and Blind Signal Deconvolution
On Information Maximization and Blind Signal Deconvolution A Röbel Technical University of Berlin, Institute of Communication Sciences email: roebel@kgwtu-berlinde Abstract: In the following paper we investigate
More information[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]
Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the
More informationGetting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1
1 Rows first, columns second. Remember that. R then C. 1 A matrix is a set of real or complex numbers arranged in a rectangular array. They can be any size and shape (provided they are rectangular). A
More informationPrincipal Component Analysis CS498
Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy
More information1 Singular Value Decomposition and Principal Component
Singular Value Decomposition and Principal Component Analysis In these lectures we discuss the SVD and the PCA, two of the most widely used tools in machine learning. Principal Component Analysis (PCA)
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationLeast square joint diagonalization of matrices under an intrinsic scale constraint
Least square joint diagonalization of matrices under an intrinsic scale constraint Dinh-Tuan Pham 1 and Marco Congedo 2 1 Laboratory Jean Kuntzmann, cnrs - Grenoble INP - UJF (Grenoble, France) Dinh-Tuan.Pham@imag.fr
More informationCS 246 Review of Linear Algebra 01/17/19
1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector
More informationSemi-Blind approaches to source separation: introduction to the special session
Semi-Blind approaches to source separation: introduction to the special session Massoud BABAIE-ZADEH 1 Christian JUTTEN 2 1- Sharif University of Technology, Tehran, IRAN 2- Laboratory of Images and Signals
More informationSingle Channel Signal Separation Using MAP-based Subspace Decomposition
Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,
More informationLINEAR SYSTEMS (11) Intensive Computation
LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY
More informationBlind Signal Separation: Statistical Principles
Blind Signal Separation: Statistical Principles JEAN-FRANÇOIS CARDOSO, MEMBER, IEEE Invited Paper Blind signal separation (BSS) and independent component analysis (ICA) are emerging techniques of array
More informationFile: ica tutorial2.tex. James V Stone and John Porrill, Psychology Department, Sheeld University, Tel: Fax:
File: ica tutorial2.tex Independent Component Analysis and Projection Pursuit: A Tutorial Introduction James V Stone and John Porrill, Psychology Department, Sheeld University, Sheeld, S 2UR, England.
More informationFinal Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2
Final Review Sheet The final will cover Sections Chapters 1,2,3 and 4, as well as sections 5.1-5.4, 6.1-6.2 and 7.1-7.3 from chapters 5,6 and 7. This is essentially all material covered this term. Watch
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method
CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that
More informationFundamentals of Linear Algebra
11-755/18-797 achine Learning for Signal Processing Fundamentals of Linear Algebra Class 2-3. 6 Sep 211 Administrivia A imes: Anoop Ramakrishna: hursday 12.3-1.3pm anuel ragut: Friday 11am 12pm. HW1: On
More informationFoundations of Matrix Analysis
1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the
More informationHOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION)
HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) PROFESSOR STEVEN MILLER: BROWN UNIVERSITY: SPRING 2007 1. CHAPTER 1: MATRICES AND GAUSSIAN ELIMINATION Page 9, # 3: Describe
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationFantope Regularization in Metric Learning
Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction
More informationMathematical Optimisation, Chpt 2: Linear Equations and inequalities
Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationStatistical Pattern Recognition
Statistical Pattern Recognition A Brief Mathematical Review Hamid R. Rabiee Jafar Muhammadi, Ali Jalali, Alireza Ghasemi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Probability theory
More informationA Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag
A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data
More informationB553 Lecture 5: Matrix Algebra Review
B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations
More informationQueens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.
Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane c Sateesh R. Mane 2018 8 Lecture 8 8.1 Matrices July 22, 2018 We shall study
More informationOR MSc Maths Revision Course
OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision
More informationA matrix over a field F is a rectangular array of elements from F. The symbol
Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationa b = a T b = a i b i (1) i=1 (Geometric definition) The dot product of two Euclidean vectors a and b is defined by a b = a b cos(θ a,b ) (2)
This is my preperation notes for teaching in sections during the winter 2018 quarter for course CSE 446. Useful for myself to review the concepts as well. More Linear Algebra Definition 1.1 (Dot Product).
More informationLecture: Adaptive Filtering
ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate
More informationSingular Value Decomposition
Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal
More informationcomponent risk analysis
273: Urban Systems Modeling Lec. 3 component risk analysis instructor: Matteo Pozzi 273: Urban Systems Modeling Lec. 3 component reliability outline risk analysis for components uncertain demand and uncertain
More information