AG DANK/BCS Meeting 2013 in London University College London, 8/9 November 2013


1 MODELS FOR SIMULTANEOUS CLASSIFICATION AND REDUCTION OF THREE-WAY DATA. Roberto Rocci, University Tor Vergata, Rome

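2 A general classification model: Gaussian Mixtures

Let $x = [x_1, x_2, \dots, x_J]'$ be a random vector of $J$ variables. We assume the mixture model

$$f(x) = \sum_{g=1}^{G} p_g \phi_g(x),$$

where each component represents an underlying group and is, in our case, Gaussian:

$$\phi_g(x) = (2\pi)^{-J/2} |\Sigma_g|^{-1/2} \exp\{-\tfrac{1}{2}(x-\mu_g)'\Sigma_g^{-1}(x-\mu_g)\}.$$

Each observation is assigned to a group by computing the posterior probabilities

$$p(g \mid x) = \frac{p_g \phi_g(x)}{\sum_{h=1}^{G} p_h \phi_h(x)}.$$

Given a sample of $N$ i.i.d. observations, the parameters are estimated by maximizing the log-likelihood

$$L = \sum_{n=1}^{N} \log f(x_n).$$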
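The assignment rule above can be sketched in a few lines of Python. This is only an illustration: it assumes diagonal component covariances to stay within the standard library, and the function names and parameter values are hypothetical, not taken from the talk.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Gaussian density with diagonal covariance (a simplifying assumption)."""
    log_pdf = 0.0
    for xj, mj, vj in zip(x, mean, var):
        log_pdf += -0.5 * (math.log(2 * math.pi * vj) + (xj - mj) ** 2 / vj)
    return math.exp(log_pdf)

def posteriors(x, weights, means, variances):
    """p(g | x) = p_g phi_g(x) / sum_h p_h phi_h(x)."""
    joint = [p * gaussian_pdf_diag(x, m, v)
             for p, m, v in zip(weights, means, variances)]
    total = sum(joint)  # mixture density f(x)
    return [j / total for j in joint]

def log_likelihood(sample, weights, means, variances):
    """L = sum_n log f(x_n)."""
    return sum(
        math.log(sum(p * gaussian_pdf_diag(x, m, v)
                     for p, m, v in zip(weights, means, variances)))
        for x in sample
    )

# Hypothetical mixture: G = 2 components, J = 2 variables
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

post = posteriors([0.2, -0.1], weights, means, variances)
L = log_likelihood([[0.2, -0.1], [2.8, 3.3]], weights, means, variances)
```

The observation is assigned to the component with the largest entry of `post`.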

3 Problems
- a very large number of parameters;
- difficult to understand which are the discriminant variables, i.e. the variables that describe the clustering structure.

Idea

The mixture model induces the following covariance structure (variance decomposition):

$$\mathrm{var}(x) = \underbrace{\sum_{g=1}^{G} p_g (\mu_g-\mu)(\mu_g-\mu)'}_{\text{Between}} + \underbrace{\sum_{g=1}^{G} p_g \Sigma_g}_{\text{Within}},$$

where $\mu = \sum_g p_g \mu_g$ is the overall mean.

Model the between covariance matrix to:
- reduce the number of parameters;
- find the components (linear combinations of the variables) explaining the largest information about the classification.
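The between/within decomposition can be checked numerically. The sketch below computes the total covariance of a mixture from its second moments and compares it with the sum of the between and within parts; all parameter values and helper names are hypothetical.

```python
def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    return [[s * a for a in row] for row in A]

def zeros(n):
    return [[0.0] * n for _ in range(n)]

# Hypothetical mixture: G = 2 components, J = 2 variables
p = [0.3, 0.7]
mus = [[0.0, 1.0], [2.0, -1.0]]
sigmas = [[[1.0, 0.2], [0.2, 0.5]], [[0.8, -0.1], [-0.1, 1.5]]]
J = 2

# Overall mean: mu = sum_g p_g mu_g
mu = [sum(pg * m[j] for pg, m in zip(p, mus)) for j in range(J)]

between, within, second_moment = zeros(J), zeros(J), zeros(J)
for pg, m, S in zip(p, mus, sigmas):
    d = [mj - muj for mj, muj in zip(m, mu)]
    between = add(between, scale(outer(d, d), pg))   # p_g (mu_g - mu)(mu_g - mu)'
    within = add(within, scale(S, pg))               # p_g Sigma_g
    second_moment = add(second_moment, scale(add(S, outer(m, m)), pg))  # p_g E[xx' | g]

total = add(second_moment, scale(outer(mu, mu), -1.0))  # var(x) = E[xx'] - mu mu'
decomposed = add(between, within)
```

Only `between` depends on the centroids, which is why the reduction model acts on that part.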

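4 Reduction Model

The model is a component analysis of the centroid matrix.

Scalar: $\mu_{jg} = \mu_j + \sum_{q=1}^{Q} b_{jq}\eta_{qg}$, with $\sum_{g=1}^{G} p_g \eta_{qg} = 0$,

where:
- $\mu_{jg}$ is the mean of variable $j$ in component $g$;
- $\eta_{qg}$ is the mean of prototype variable $q$ in component $g$;
- $b_{jq}$ is the loading of variable $j$ on prototype variable $q$.

Vector: $\mu_g = \mu + B\eta_g$, with $\sum_{g=1}^{G} p_g \eta_g = 0$.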

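5 Matrix: $M = NB'$, where:
- $M = [\mu_1-\mu, \mu_2-\mu, \dots, \mu_G-\mu]'$ is the centred centroid matrix;
- $N = [\eta_1, \eta_2, \dots, \eta_G]'$ is the centroid matrix on the reduced space.

The component model is not identified. In fact, for any nonsingular $F$, $B\eta_g = BF^{-1}F\eta_g = \tilde{B}\tilde{\eta}_g$. We exploit such rotational freedom by requiring that $B'\Sigma^{-1}B = I_Q$.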
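The lack of identification can be illustrated numerically: transforming the reduced centroids by a nonsingular matrix F and the loadings by its inverse leaves the centroid matrix unchanged. The matrices below are hypothetical, with G = 3, J = 3 and Q = 2.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(F):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = F
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

N = [[1.0, 0.5], [-0.5, 1.0], [-0.5, -1.5]]   # G x Q reduced centroids (rows eta_g')
B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]      # J x Q loadings
F = [[2.0, 1.0], [0.0, 1.0]]                  # arbitrary nonsingular Q x Q matrix

M = matmul(N, transpose(B))                   # M = N B'

N_tilde = matmul(N, transpose(F))             # rows become (F eta_g)'
B_tilde = matmul(B, inv2(F))                  # B F^{-1}
M_tilde = matmul(N_tilde, transpose(B_tilde)) # same centroid matrix
```

Since `M_tilde` equals `M`, an extra constraint on the loadings is needed to pick one representative.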

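6 ML Estimation (homoscedastic case): EM algorithm

Maximization of the log-likelihood

$$L = \sum_{n=1}^{N} \log \sum_{g=1}^{G} p_g \phi_g(x_n)$$

is equivalent to the maximization of the fuzzy objective function (Hathaway 1986)

$$l = \sum_{n}\sum_{g} u_{ng} \log[p_g \phi_g(x_n)] - \sum_{n}\sum_{g} u_{ng} \log u_{ng},$$

where $u_{ng} \ge 0$ and $\sum_g u_{ng} = 1$. This is so because $l$ reaches a maximum with respect to $U = [u_{ng}]$ when

$$u_{ng} = \frac{p_g \phi_g(x_n)}{\sum_h p_h \phi_h(x_n)} \quad \text{(posterior probabilities)}.$$

Substituting the previous expression in $l$ we obtain $L$.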
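The equivalence can be checked on a toy example. In the sketch below, written for a univariate homoscedastic mixture with hypothetical data and parameters, the fuzzy objective evaluated at the posterior probabilities equals the log-likelihood, while another admissible U gives a smaller value.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_lik(xs, p, mu, var):
    return sum(math.log(sum(pg * normal_pdf(x, mg, var) for pg, mg in zip(p, mu)))
               for x in xs)

def posteriors(xs, p, mu, var):
    U = []
    for x in xs:
        joint = [pg * normal_pdf(x, mg, var) for pg, mg in zip(p, mu)]
        total = sum(joint)
        U.append([j / total for j in joint])
    return U

def fuzzy_objective(xs, U, p, mu, var):
    """l = sum u log(p phi) - sum u log u, with 0 log 0 taken as 0."""
    l = 0.0
    for x, u_row in zip(xs, U):
        for u, pg, mg in zip(u_row, p, mu):
            if u > 0:
                l += u * (math.log(pg * normal_pdf(x, mg, var)) - math.log(u))
    return l

# Hypothetical data and parameters
xs = [-1.0, 0.2, 2.5, 3.1]
p, mu, var = [0.4, 0.6], [0.0, 3.0], 1.0

L = log_lik(xs, p, mu, var)
l_post = fuzzy_objective(xs, posteriors(xs, p, mu, var), p, mu, var)
l_other = fuzzy_objective(xs, [[0.5, 0.5] for _ in xs], p, mu, var)
```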

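7 The algorithm is based on the conditional maximization of $l$ with respect to a subset of parameters given the others. The fundamental steps are the following.

a) Update $U = [u_{ng}]$: $u_{ng} = \dfrac{p_g \phi_g(x_n)}{\sum_h p_h \phi_h(x_n)}$ ($n = 1, \dots, N$; $g = 1, \dots, G$).
b) Update $p = [p_g]$: $p_g = \dfrac{1}{N}\sum_n u_{ng}$ ($g = 1, \dots, G$).
c) Update $\Sigma$: $\Sigma = \dfrac{1}{N}\sum_n \sum_g u_{ng}(x_n-\mu_g)(x_n-\mu_g)'$.

They are simply the steps of an ordinary EM algorithm.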
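Steps a) to c) can be sketched for a univariate homoscedastic mixture as follows. The data and starting values are hypothetical, and the component means are updated here as unconstrained weighted centroids (the ordinary EM update), not through the reduced model of the talk.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_lik(xs, p, mu, var):
    return sum(math.log(sum(pg * normal_pdf(x, mg, var) for pg, mg in zip(p, mu)))
               for x in xs)

def em_step(xs, p, mu, var):
    N, G = len(xs), len(p)
    # a) update U: posterior probabilities
    U = []
    for x in xs:
        joint = [p[g] * normal_pdf(x, mu[g], var) for g in range(G)]
        total = sum(joint)
        U.append([j / total for j in joint])
    # b) update p: average membership
    u_plus = [sum(U[n][g] for n in range(N)) for g in range(G)]
    p = [u / N for u in u_plus]
    # c) update the common variance given the current means
    var = sum(U[n][g] * (xs[n] - mu[g]) ** 2
              for n in range(N) for g in range(G)) / N
    # means: unconstrained weighted centroids (assumption of this sketch)
    mu = [sum(U[n][g] * xs[n] for n in range(N)) / u_plus[g] for g in range(G)]
    return p, mu, var

# Hypothetical data and starting values
xs = [-1.2, -0.8, -1.0, 0.1, 2.9, 3.2, 3.0, 2.7]
p, mu, var = [0.5, 0.5], [-0.5, 2.0], 1.0

history = [log_lik(xs, p, mu, var)]
for _ in range(20):
    p, mu, var = em_step(xs, p, mu, var)
    history.append(log_lik(xs, p, mu, var))
```

Each conditional update cannot decrease the objective, so the values in `history` are non-decreasing.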

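8 d) Update $\mu$: we consider centred data, so that $\mu = 0$.
e) Update $N$ and $B$: it can be shown that the objective function can be written as

$$l = -\tfrac{1}{2}\,\mathrm{tr}\{D(\bar{X}-NB')\Sigma^{-1}(\bar{X}-NB')'\} + c,$$

where $c$ is a constant term independent of $N$ and $B$, $D = \mathrm{diag}(u_{+1}, u_{+2}, \dots, u_{+G})$ with $u_{+g} = \sum_n u_{ng}$, and $\bar{X} = [\bar{x}_1, \bar{x}_2, \dots, \bar{x}_G]'$ is the matrix of centroids $\bar{x}_g = \frac{1}{u_{+g}}\sum_n u_{ng}x_n$ computed on the centred variables.

This algorithm can also be seen as an ECM (Meng & Rubin).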
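The reason the objective depends on the component means only through the weighted centroids is a sum-of-squares decomposition. The sketch below checks it numerically, assuming an identity within covariance matrix for simplicity; the data, memberships and candidate means are hypothetical.

```python
def weighted_centroids(U, X):
    """Return u_{+g} (the diagonal of D) and the centroids x_bar_g."""
    N, G, J = len(X), len(U[0]), len(X[0])
    u_plus = [sum(U[n][g] for n in range(N)) for g in range(G)]
    centroids = [[sum(U[n][g] * X[n][j] for n in range(N)) / u_plus[g]
                  for j in range(J)] for g in range(G)]
    return u_plus, centroids

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Hypothetical centred data (N = 4, J = 2) and memberships (G = 2)
X = [[-1.0, 0.5], [0.0, -0.5], [1.0, 1.0], [0.0, -1.0]]
U = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
means = [[-0.5, 0.2], [0.4, -0.1]]  # arbitrary candidate component means

u_plus, centroids = weighted_centroids(U, X)
N, G = len(X), len(means)

# Weighted squared distances of the observations from the candidate means
full = sum(U[n][g] * sq_dist(X[n], means[g]) for n in range(N) for g in range(G))
# Part that does not involve the candidate means (absorbed in the constant c)
around_centroids = sum(U[n][g] * sq_dist(X[n], centroids[g])
                       for n in range(N) for g in range(G))
# Part that involves the means only through the centroids, weighted by D
centroid_term = sum(u_plus[g] * sq_dist(centroids[g], means[g]) for g in range(G))
```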

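9 Use and interpretation of components

The M step of the EM algorithm shows that:
- the within-standardized component loadings matrix $\Sigma^{-1/2}B$ derives from a PCA of the matrix of within-standardized centroids $Z = \bar{X}\Sigma^{-1/2}$:

$$\min_{N,B}\ \mathrm{tr}\{D(\bar{X}-NB')\Sigma^{-1}(\bar{X}-NB')'\} = \min_{N,B}\ \|D^{1/2}Z - D^{1/2}NB'\Sigma^{-1/2}\|^2;$$

- the component scores $Y = X\Sigma^{-1}B$ maximize the between variance subject to the constraint of unit within variance, i.e.

$$\max_{B}\ \mathrm{tr}(B'\Sigma^{-1}\bar{X}'D\bar{X}\Sigma^{-1}B) \quad \text{subject to} \quad B'\Sigma^{-1}B = I_Q,$$

which is Fisher's linear discriminant analysis (LDA).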
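The link between the least squares problem and the LDA criterion can be made explicit with a short derivation. The shorthand $A = \Sigma^{-1/2}B$ (with $A'A = I_Q$) and $Z = \bar{X}\Sigma^{-1/2}$ is introduced here only for the sketch, and $D$ is assumed nonsingular.

```latex
\begin{align*}
f(N,A) &= \mathrm{tr}\{D(Z-NA')(Z-NA')'\} \\
       &= \mathrm{tr}(DZZ') - 2\,\mathrm{tr}(N'DZA) + \mathrm{tr}(N'DN)
          && \text{since } A'A = I_Q \\
\frac{\partial f}{\partial N} &= -2DZA + 2DN = 0
          \;\Rightarrow\; N = ZA = \bar{X}\Sigma^{-1}B \\
f(ZA,A) &= \mathrm{tr}(DZZ') - \mathrm{tr}(A'Z'DZA) \\
        &= \mathrm{tr}(DZZ') - \mathrm{tr}(B'\Sigma^{-1}\bar{X}'D\bar{X}\Sigma^{-1}B)
\end{align*}
```

The first term does not depend on the loadings, so minimizing the weighted least squares loss is the same as maximizing the between variance of the scores under the unit within variance constraint.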

10 Three-way Extension

Two-way sample: N observations by J variables, with entries $x_{ij}$.
Three-way sample: N observations by J variables by K conditions, with entries $x_{ijk}$.

11 Let $x = [x_{11}, x_{21}, \dots, x_{J1}, \dots, x_{1K}, x_{2K}, \dots, x_{JK}]'$ be a random vector of $J$ variables observed under $K$ different conditions.

General classification model (mixture model):

$$f(x) = \sum_{g=1}^{G} p_g \phi_g(x),$$

where the components are Gaussian:

$$\phi_g(x) = (2\pi)^{-JK/2} |\Sigma_g|^{-1/2} \exp\{-\tfrac{1}{2}(x-\mu_g)'\Sigma_g^{-1}(x-\mu_g)\}.$$

Problems
- a very large number of parameters;
- difficult to understand which are the discriminant variables and/or occasions;
- difficult to distinguish the role of variables from that of occasions.

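12 Within Covariance Structure Model

Direct product (Browne 1984): $\Sigma = \Sigma_O \otimes \Sigma_V$, i.e. in scalar notation $\sigma_{jk,lm} = \sigma^{V}_{jl}\,\sigma^{O}_{km}$, where $\Sigma_V$ ($J \times J$) and $\Sigma_O$ ($K \times K$) denote the covariance matrices of the variables and of the occasions.

Basford & McLachlan (1985) proposed $\Sigma = I_K \otimes \Sigma_V$.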
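A direct product covariance can be built with a small Kronecker product helper. In the sketch below the names `sigma_V` and `sigma_O` and their values are hypothetical, and the vector x is assumed to be ordered with the variable index running fastest within each occasion.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

# Hypothetical covariance matrices: J = 2 variables, K = 3 occasions
sigma_V = [[1.0, 0.3], [0.3, 2.0]]
sigma_O = [[1.0, 0.5, 0.2], [0.5, 1.5, 0.1], [0.2, 0.1, 0.8]]
J, K = len(sigma_V), len(sigma_O)

# Element (variable j, occasion k) of x sits at position k * J + j
sigma = kron(sigma_O, sigma_V)

# Number of distinct covariance entries: unstructured versus direct product
full_params = (J * K) * (J * K + 1) // 2
direct_params = J * (J + 1) // 2 + K * (K + 1) // 2

# Basford & McLachlan's proposal as a special case: identity for the occasions
identity_K = [[1.0 if k == m else 0.0 for m in range(K)] for k in range(K)]
sigma_bm = kron(identity_K, sigma_V)
```

With these sizes the unstructured matrix has 21 distinct entries against 9 for the two factors (one of which is absorbed by the scale indeterminacy between the factors).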

13 Reduction Model

The model is a Tucker2 component analysis of the centroid matrix.

Scalar: $\mu_{jkg} = \mu_{jk} + \sum_{q=1}^{Q}\sum_{r=1}^{R} b_{jq}c_{kr}\eta_{qrg}$, with $\sum_{g=1}^{G} p_g \eta_{qrg} = 0$,

where:
- $\mu_{jkg}$ is the mean of variable $j$ under condition $k$ in component $g$;
- $\eta_{qrg}$ is the mean of prototype variable $q$ under prototype condition $r$ in component $g$;
- $\sum_q b_{jq}\eta_{qrg}$ is the mean of variable $j$ under prototype condition $r$ in component $g$;
- $b_{jq}$ is the loading of variable $j$ on prototype variable $q$;
- $\sum_r c_{kr}\eta_{qrg}$ is the mean of prototype variable $q$ under condition $k$ in component $g$;
- $c_{kr}$ is the loading of occasion $k$ on prototype occasion $r$.

Often used in Chemistry and Psychology.

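14 Vector: $\mu_g = \mu + (C \otimes B)\eta_g$, with $\sum_{g=1}^{G} p_g \eta_g = 0$.

Matrix: $M = N(C \otimes B)'$, where:
- $M = [\mu_1-\mu, \mu_2-\mu, \dots, \mu_G-\mu]'$ is the centred centroid matrix;
- $N = [\eta_1, \eta_2, \dots, \eta_G]'$ is the centroid matrix on the reduced space.

The component model is not identified. In fact, for any nonsingular $D$ ($R \times R$) and $F$ ($Q \times Q$), $(C \otimes B)\eta_g = (CD^{-1} \otimes BF^{-1})(D \otimes F)\eta_g = (\tilde{C} \otimes \tilde{B})\tilde{\eta}_g$. We exploit such rotational freedom by requiring that $B'\Sigma_V^{-1}B = I_Q$ and $C'\Sigma_O^{-1}C = I_R$, where $\Sigma_V$ and $\Sigma_O$ denote the within covariance matrices of the variables and of the occasions.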
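The agreement between the scalar form (a double sum over prototype variables and prototype occasions) and the vector form (a Kronecker product applied to the stacked reduced centroid) can be checked directly. The loadings and the reduced centroid `H` below are hypothetical, with J = 3, K = 2, Q = 2 and R = 2.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]   # J x Q variable loadings
C = [[1.0, 0.0], [0.5, 1.0]]               # K x R occasion loadings
H = [[1.0, -1.0], [0.5, 2.0]]              # Q x R reduced centroid eta_qr of one component
J, Q, K, R = len(B), len(B[0]), len(C), len(C[0])

# Scalar form: deviation of mu_jk from the overall mean
dev_scalar = [[sum(B[j][q] * C[k][r] * H[q][r] for q in range(Q) for r in range(R))
               for k in range(K)] for j in range(J)]

# Vector form: stack eta with the prototype-variable index running fastest
eta = [H[q][r] for r in range(R) for q in range(Q)]
CB = kron(C, B)                            # (JK) x (QR)
dev_vector = [sum(w * e for w, e in zip(row, eta)) for row in CB]
```

Entry `k * J + j` of `dev_vector` matches `dev_scalar[j][k]`.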

15 ML Estimation (homoscedastic case): EM algorithm

An EM algorithm can be programmed following the analogous algorithm already seen for the two-way case. About the update of $N$, $B$ and $C$, it is interesting to note that the complete log-likelihood can be written as

$$l = -\tfrac{1}{2}\,\mathrm{tr}\{D[\bar{X}-N(C \otimes B)']\Sigma^{-1}[\bar{X}-N(C \otimes B)']'\} + c,$$

where $c$ is a constant term and $\bar{X}$ is the matrix of centroids computed on the centred variables. It follows that the parameters can be updated by computing a weighted least squares approximation of the centroid matrix.

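16 Use and interpretation of components

- The within-standardized component loadings matrices of the variables and of the occasions derive from a Tucker2 analysis of the matrix of within-standardized centroids $Z = \bar{X}\Sigma^{-1/2}$:

$$\min_{N,B,C}\ \|D^{1/2}Z - D^{1/2}N(C \otimes B)'\Sigma^{-1/2}\|^2;$$

- the component scores $Y = X\Sigma^{-1}(C \otimes B)$ maximize the between variance subject to the constraint of unit within variance, i.e.

$$\max_{B,C}\ \mathrm{tr}\{(C \otimes B)'\Sigma^{-1}\bar{X}'D\bar{X}\Sigma^{-1}(C \otimes B)\} \quad \text{subject to} \quad (C \otimes B)'\Sigma^{-1}(C \otimes B) = I_R \otimes I_Q,$$

which is a bilinear discriminant analysis (BLDA).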

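17 BLDA: interpretation

Constrained LDA: $y_{qr} = \sum_{j=1}^{J}\sum_{k=1}^{K} w_{jkqr}x_{jk}$, where $w_{jkqr} = b_{jq}c_{kr}$.

Hierarchical LDA: $y_{qr} = \sum_k c_{kr}\sum_j b_{jq}x_{jk}$, which can be computed in two steps in either order:
- reduction of the occasions, $f_{jr} = \sum_{k=1}^{K} c_{kr}x_{jk}$, followed by reduction of the variables, $y_{qr} = \sum_{j=1}^{J} b_{jq}f_{jr}$;
- reduction of the variables, $h_{qk} = \sum_{j=1}^{J} b_{jq}x_{jk}$, followed by reduction of the occasions, $y_{qr} = \sum_{k=1}^{K} c_{kr}h_{qk}$.

Here $b_{jq}$ are the variable component weights and $c_{kr}$ the occasion component weights.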
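The constrained and hierarchical readings give the same scores, which a few lines of Python can confirm. The unit's data matrix `X` (J variables by K occasions) and the weights `B` and `C` below are hypothetical.

```python
# Hypothetical unit observed on J = 3 variables under K = 2 occasions
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # x_jk
B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]    # variable component weights b_jq
C = [[1.0, 0.5], [-1.0, 2.0]]               # occasion component weights c_kr
J, K, Q, R = len(X), len(X[0]), len(B[0]), len(C[0])

# Constrained LDA: one linear combination with weights w_jkqr = b_jq * c_kr
y_constrained = [[sum(B[j][q] * C[k][r] * X[j][k] for j in range(J) for k in range(K))
                  for r in range(R)] for q in range(Q)]

# Hierarchical, occasions first: f_jr, then combine the variables
f = [[sum(C[k][r] * X[j][k] for k in range(K)) for r in range(R)] for j in range(J)]
y_occasions_first = [[sum(B[j][q] * f[j][r] for j in range(J)) for r in range(R)]
                     for q in range(Q)]

# Hierarchical, variables first: h_qk, then combine the occasions
h = [[sum(B[j][q] * X[j][k] for j in range(J)) for k in range(K)] for q in range(Q)]
y_variables_first = [[sum(C[k][r] * h[q][k] for k in range(K)) for r in range(R)]
                     for q in range(Q)]
```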

18 Application

Data
- 58 units: soybeans;
- 8 conditions: 4 environments (Lawes, Brookstead, Nambour, Redland Bay) by 2 years (1970, 1971);
- variables: yield (Kg/Ha), protein.

Model selection
- Models considered: G = :7, Q = :, R = :8; occasion within covariance matrix diagonal or with non-null covariances only between the same locations.
- Best model selected by BIC: G = 7, Q = , R = , and diagonal occasion within covariance matrix.

19 Percentage of variation accounted for by the components on the within-standardized data [table: occasions by variables, with totals; values not recoverable]

20 Basford & McLachlan (B&M) and our (R) classification [cross-classification table, R by B&M; values not recoverable]

21 Biplot on the first latent variable at the two latent occasions [figure]

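22 Heteroscedastic case

Reduction model

Scalar: $\mu_{jkg} = \mu_{jk} + \sum_{q=1}^{J}\sum_{r=1}^{K} b_{jq}c_{kr}\eta_{qrg}$, with $\sum_{g=1}^{G} p_g \eta_{qrg} = 0$ and $\eta_{qrg} = 0$ if $q > Q$ and/or $r > R$.

Vector: $\mu_g = \mu + (C \otimes B)\eta_g$.

Matrix: $M = N(C \otimes B)'$, where
- $C = [C_R, C_{K-R}]$ is square;
- $B = [B_Q, B_{J-Q}]$ is square.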

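23 Within-covariance model

[equation in $\Omega$ not recoverable], where
- [block structure in terms of $\Psi$ and $\Omega$ not recoverable];
- $\Psi$ is diagonal.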

24 If K = 3 and R = we have [block matrix in $\Psi$ and $\Omega$; layout not recoverable]
