Statistical Analysis of Compositional Data


1 Statistical Analysis of Compositional Data

Carles Barceló Vidal, J. Antoni Martín Fernández, Santiago Thió Fdez-Henestrosa
Dept. d'Informàtica i Matemàtica Aplicada, Universitat de Girona, Campus de Montilivi, E-17071 Girona, Catalunya, Spain

2 What is compositional data?

Traditionally: a composition is a positive vector $x = (x_1, \dots, x_D)$ whose components are subject to a constant-sum restriction: $x_1 + \cdots + x_D = \text{constant}$. Compositional data = closed data.

3 What is compositional data?

A positive vector $w = (w_1, \dots, w_D)$ is compositional when our interest lies in the relative magnitudes $w_j / w_k$ of its parts and not in their absolute values.

Scale-invariance property: if a positive vector $w = (w_1, \dots, w_D)$ is compositional, the vectors $w$ and $kw$, with $k > 0$, give us the same information.

4 USA presidential election

State      Bush    Gore    Others   Total
Alabama    56.6%   41.8%   1.6%     100%
Alaska     59.0%   27.9%   13.1%    100%
…          …       …       …        …
Wisconsin  47.7%   47.9%   4.4%     100%
Wyoming    69.2%   28.3%   2.5%     100%

5 Activity patterns of a statistician

Daily time (hours) devoted by an academic statistician to different activities over 20 days: te = teaching; co = consultation; ad = administration; re = research; ot = other wakeful activities; sl = sleep.

Day   te    co    ad    re    ot    sl
1     3.5   2.0   4.5   2.5   6.5   5.0
2     3.0   2.0   2.5   3.0   6.5   6.0
…
19    2.5   2.5   3.0   2.0   5.0   8.5
20    2.5   2.0   3.0   3.0   4.0   9.0

Day   te      co      ad      re      ot      sl      Total
1     14.6%   8.3%    18.8%   10.4%   27.1%   20.8%   100%
2     16.7%   8.3%    10.4%   12.5%   27.1%   25.0%   100%
…
19    10.4%   10.4%   12.5%   8.3%    20.8%   35.4%   100%
20    10.5%   8.3%    12.5%   12.5%   16.7%   37.5%   100%

6 Arctic lake

Sand, silt, clay composition (% by weight) of 39 sediment samples from an Arctic lake:

Sample   Sand   Silt   Clay   Total
S1       …      …      …      100%
…
S39      …      …      …      100%

7 Volcano H

Percentage of Cl, K₂O, P₂O₅, TiO₂ and SiO₂ in 46 samples of volcanic rocks from a volcano H:

Sample   Cl   K₂O   P₂O₅   TiO₂   SiO₂   Total
1        …    …     …      …      …      100%
…
46       …    …     …      …      …      100%

8 Halimba boreholes

Percentages of Al₂O₃, SiO₂, Fe₂O₃, TiO₂, H₂O, CaO and MgO in samples from different boreholes in the Halimba region (Hungary):

Al₂O₃   SiO₂   Fe₂O₃   TiO₂   H₂O    CaO   MgO   Total
52.5    6.7    23.6    2.6    12.0   0.2   0.1   97.7%
47.7    4.6    32.1    2.3    12.0   2.0   0.0   100.7%
50.6    8.9    25.4    2.5    11.9   1.1   0.0   100.4%

9 The space of compositions

Any $D \times 1$ real vector $w = (w_1, \dots, w_D)$ with positive components $w_1, \dots, w_D$ will be called a D-observational vector. Therefore, the set of these vectors will be $\mathbb{R}^D_+$, the positive orthant of $\mathbb{R}^D$.

Definition. Two D-observational vectors $w$ and $w'$ are compositionally equivalent, $w \sim w'$, when there exists a positive proportionality constant $k$ such that $w' = kw$. This relation partitions the vectors of $\mathbb{R}^D_+$ into equivalence classes, called D-compositions. The composition generated by an observational vector $w$ will be symbolized by $\bar{w}$, i.e., $\bar{w} = \{kw : k \in \mathbb{R}_+\}$.

10 Scale invariance

Definition. A function $f$ defined on $\mathbb{R}^D_+$ is said to be scale invariant if $f(kw) = f(w)$ for every $w \in \mathbb{R}^D_+$ and $k \in \mathbb{R}_+$, or, equivalently, $f(w) = f(w')$ whenever $w \sim w'$.

Property. Any scale-invariant function $f(w)$ defined on $\mathbb{R}^D_+$ can be expressed in terms of ratios of the components $w_1, \dots, w_D$ of $w$, such as $w_1/w_D, \dots, w_{D-1}/w_D$, or $w_1/g(w), \dots, w_D/g(w)$, where $g(w) = (w_1 w_2 \cdots w_D)^{1/D}$ is the geometric mean of the components of $w$.

Property. Any function defined on the compositional space $\mathcal{C}^{D-1}$ arises from a scale-invariant function defined on the positive real space $\mathbb{R}^D_+$.

11 The space of compositions

A D-part composition can be geometrically interpreted as a ray from the origin in the positive orthant of $\mathbb{R}^D$.

[Figure: a composition $\bar{w}$ depicted as a ray through $w$ in the positive octant with axes $W_1$, $W_2$, $W_3$]

The set $\mathcal{C}^{D-1}$ of all D-compositions will be called the (D-1)-dimensional compositional space. The compositional closure mapping from $\mathbb{R}^D_+$ to $\mathcal{C}^{D-1}$, denoted by ccl, is defined by $\mathrm{ccl}\,w = \bar{w}$ ($w \in \mathbb{R}^D_+$).

12 Representation of a composition: linear criterion

Definition. The linear criterion selects from each D-composition $\bar{w}$ the D-observational vector $w^*$ with components $w^*_1, \dots, w^*_D$ whose sum is equal to 1. If this vector is symbolized by $\mathrm{ccl}_L\,w$ or by $\mathcal{C}w$, then
$$\mathrm{ccl}_L\,w = \mathcal{C}w = w \Big/ \sum_{j=1}^D w_j \qquad (w \in \mathbb{R}^D_+).$$
The set of all the vectors $x = \mathrm{ccl}_L\,w$ ($w \in \mathbb{R}^D_+$) is the well-known (D-1)-dimensional simplex $\mathcal{S}^D$.
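A minimal numerical sketch of the closure operation (the function name `closure` and the use of NumPy are our own choices, not part of the slides):

```python
import numpy as np

def closure(w):
    """Linear criterion: rescale a positive vector so that its parts sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

# Compositionally equivalent vectors are mapped to the same point of the simplex.
print(closure([1.0, 2.0, 2.0]))     # [0.2 0.4 0.4]
print(closure([10.0, 20.0, 20.0]))  # [0.2 0.4 0.4]
```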

13 Linear criterion

[Figure: the ray $\bar{w}$ in $\mathbb{R}^3_+$ and its representative $\mathrm{ccl}_L\,w$ on the ternary diagram with vertices 1, 2, 3]

14 Representation of a composition: other criteria

Spherical criterion: [Figure: the ray $\bar{w}$ intersected with the unit sphere, giving $\mathrm{ccl}_E\,w$]

Hyperbolic criterion: [Figure: the ray $\bar{w}$ intersected with a hyperbola, giving $\mathrm{ccl}_H\,w$]

15 Subcompositions

Sometimes, given a composition $\bar{w}$ in $\mathcal{C}^{D-1}$, we may wish to focus attention on the relative magnitudes of a subset of its components.

Definition. If S is any subset of the indices $1, \dots, D$ of a given D-composition $\bar{w} \in \mathcal{C}^{D-1}$, and $w_S$ is the subvector formed from the corresponding components of $w$, then $\bar{w}_S = \mathrm{ccl}\,w_S$ is termed a subcomposition. If the subset S is formed by C indices, with $2 \le C < D$, the subcomposition $\bar{w}_S$ belongs to the compositional space $\mathcal{C}^{C-1}$.

Definition. The formation of a C-subcomposition $\bar{w}_S$ from a D-composition $\bar{w}$ may be considered as the mapping $\mathrm{sub}_S$ from $\mathcal{C}^{D-1}$ to $\mathcal{C}^{C-1}$: $\mathrm{sub}_S\,\bar{w} = \bar{w}_S$ ($\bar{w} \in \mathcal{C}^{D-1}$).

16 Subcompositions

[Figure: the composition $\bar{w}$ in $\mathbb{R}^3_+$, its subvector $w_{12}$, and the representatives $\mathrm{ccl}_L\,w$ and $\mathrm{ccl}_L\,w_{12}$ on the ternary diagram]

17 Compositional problems (1)

Percentage of Cl, K₂O, P₂O₅, TiO₂ and SiO₂ in 46 samples of volcanic rocks from a volcano H:

Num   Cl   K₂O   P₂O₅   TiO₂   SiO₂
1     …    …     …      …      …
…
46    …    …     …      …      …

18 Compositional problems

1a. Is it possible to describe the pattern of variability of these volcanic rocks and to define a covariance or correlation structure?
1b. Is it possible to define a measure of the total variability of this set of volcanic rocks?
1c. For a new volcanic rock specimen with known composition (Cl, K₂O, P₂O₅, TiO₂, SiO₂) and claimed to be from the same volcano, can we say whether it is fairly typical of this volcano? If not, can we place some measure on its atypicality?
1d. To what extent, if any, does the subcomposition (Cl, K₂O, P₂O₅) explain the pattern of variability of the full composition?

19 Compositional problems

1e. From this ternary diagram it seems that the pattern of (K₂O, P₂O₅, TiO₂) can be well fitted by a curve. How can we confirm this?

20 Compositional problems (2)

Percentage of Cl, K₂O, P₂O₅, TiO₂ and SiO₂: 65 samples of volcanic rocks from a volcano A, and 19 samples from another volcano D:

Num   Cl   K₂O   P₂O₅   TiO₂   SiO₂
1A    …    …     …      …      …
…
65A   …    …     …      …      …
1D    …    …     …      …      …
…
19D   …    …     …      …      …

21 Compositional problems

2a. Can we detect any differences between the compositional pattern of volcano A and volcano D? If so, how can we choose a 3-part subcomposition which somehow captures the essence of the two patterns individually and yet emphasizes the differences between the patterns?
2b. Is it possible to establish a classification rule for discriminating between volcanoes A and D?

22 Compositional problems (3)

Sand, silt, clay composition (% by weight) of 39 sediment samples at different water depths in an Arctic lake:

Num   Sand   Silt   Clay   Depth (m)
S1    …      …      …      …
…
S39   …      …      …      …

3a. Is sediment composition dependent on water depth?
3b. If so, how can we quantify the extent of the dependence?

23 How to analyze closed raw data? Spurious correlations

Pearson (1897): "If u = f(x, y) and v = g(z, y) be two functions of three variables x, y, z, and these variables be selected at random so that there exists no correlation between x and y, y and z, or z and x, there will still be found to exist correlation between u and v." This is likely to occur when u and v are indices with the same denominator.

Consequence. The standard covariance matrix $[s_{ij}]$ of a closed data set from $\mathcal{S}^D$ is always singular because $\sum_{j=1}^D s_{ij} = 0$, for $i = 1, \dots, D$.

24 How to analyze closed raw data? Subcompositional incoherence

Example.

Scientist A: full compositions from $\mathcal{S}^4$    Scientist B: subcompositions from $\mathcal{S}^3$
$(x_1, x_2, x_3, x_4)$                                  $(s_1, s_2, s_3)$
(0.1, 0.2, 0.1, 0.6)                                    (0.250, 0.500, 0.250)
(0.2, 0.1, 0.1, 0.6)                                    (0.500, 0.250, 0.250)
(0.3, 0.3, 0.2, 0.2)                                    (0.375, 0.375, 0.250)
corr{x₍₁₎, x₍₂₎} = 0.5                                  corr{s₍₁₎, s₍₂₎} = -1

Any statement that scientists A and B make about the common parts 1, 2 and 3 must agree.
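The example can be checked numerically; a small sketch (the variable names are ours):

```python
import numpy as np

X = np.array([[0.1, 0.2, 0.1, 0.6],
              [0.2, 0.1, 0.1, 0.6],
              [0.3, 0.3, 0.2, 0.2]])

# Scientist B sees only parts 1-3, re-closed so that each row sums to 1.
S = X[:, :3] / X[:, :3].sum(axis=1, keepdims=True)

print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # 0.5
print(np.corrcoef(S[:, 0], S[:, 1])[0, 1])  # -1.0
```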

25 Statistics in $\mathbb{R}^D$

Translation. In $\mathbb{R}^D$ the inner operation is translation. If $t \in \mathbb{R}^D$, the translation $t$ moves a random vector $X$ in $\mathbb{R}^D$ to the random vector $X + t$, in such a way that $E\{X + t\} = E\{X\} + t$ and $\Sigma\{X + t\} = \Sigma\{X\}$.

Scalar product. For any random vector $X$ on $\mathbb{R}^D$ and for any $\lambda \in \mathbb{R}$, $E\{\lambda X\} = \lambda E\{X\}$ and $\Sigma\{\lambda X\} = \lambda^2 \Sigma\{X\}$.

26 Perturbations on $\mathcal{C}^{D-1}$

Scale invariance is the property which characterizes compositional data. Therefore, any operation involving compositions must be compatible with this property.

Definition. We define an inner operation $\oplus$ in $\mathcal{C}^{D-1}$ as
$$\bar{w} \oplus \bar{w}' = \mathrm{ccl}(w_1 w'_1, \dots, w_D w'_D).$$
$(\mathcal{C}^{D-1}, \oplus)$ is a commutative group:
- The composition $\bar{1}_D = \mathrm{ccl}(1, \dots, 1)$ is the neutral element.
- The inverse composition $\bar{w}^{-1}$ of $\bar{w} = \mathrm{ccl}(w_1, \dots, w_D)$ is the composition $\bar{w}^{-1} = \mathrm{ccl}(1/w_1, \dots, 1/w_D)$.

27 The group of perturbations in $\mathcal{C}^{D-1}$

Definition. Given a composition $\bar{p} \in \mathcal{C}^{D-1}$, the perturbation associated to $\bar{p}$ is the transformation from $\mathcal{C}^{D-1}$ to $\mathcal{C}^{D-1}$ defined by $\bar{c} \mapsto \bar{p} \oplus \bar{c}$ ($\bar{c} \in \mathcal{C}^{D-1}$). Then we say that $\bar{p} \oplus \bar{c}$ is the composition which results when the perturbation $\bar{p}$ is applied to the composition $\bar{c}$.

Moreover, given two compositions of $\mathcal{C}^{D-1}$, $\bar{w} = \mathrm{ccl}(w_1, \dots, w_D)$ and $\bar{w}' = \mathrm{ccl}(w'_1, \dots, w'_D)$, there exists a unique perturbation $\bar{p}$ which transforms $\bar{w}$ into $\bar{w}'$:
$$\bar{p} = \bar{w}' \oplus \bar{w}^{-1} = \mathrm{ccl}\left(\frac{w'_1}{w_1}, \dots, \frac{w'_D}{w_D}\right).$$
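A sketch of the perturbation operation and of the unique perturbation carrying one composition onto another (function names are our own illustration):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def perturb(w1, w2):
    """Perturbation: component-wise product followed by closure."""
    return closure(np.asarray(w1, dtype=float) * np.asarray(w2, dtype=float))

def inverse(w):
    """Inverse composition under perturbation."""
    return closure(1.0 / np.asarray(w, dtype=float))

w      = closure([1.0, 2.0, 3.0])
w_star = closure([3.0, 2.0, 1.0])
p = perturb(w_star, inverse(w))            # the unique p with p (+) w = w*
print(np.allclose(perturb(p, w), w_star))  # True
```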

28 The group of perturbations in $\mathcal{C}^{D-1}$

[Figure: two ternary diagrams, one showing the perturbation $\bar{p} \oplus \bar{x}$ of a composition $\bar{x}$ and the neutral element $e$, the other showing the perturbation $\bar{x}^* \oplus \bar{x}^{-1}$ carrying $\bar{x}$ onto $\bar{x}^*$]

29 The group of perturbations in $\mathcal{C}^{D-1}$

Perturbation plays the same role in compositional space as translation plays in real space. The set of all perturbations in $\mathcal{C}^{D-1}$ is a commutative group isomorphic to $(\mathcal{C}^{D-1}, \oplus)$. For this reason, we will also call the inner operation $\oplus$ defined on $\mathcal{C}^{D-1}$ a perturbation.

The assumption that the group of perturbations is the operating group on the compositional space is the keystone of the methodology introduced by Aitchison (1986). In fact, it implies accepting that the difference between two compositions $\bar{w} = \mathrm{ccl}(w_1, \dots, w_D)$ and $\bar{w}' = \mathrm{ccl}(w'_1, \dots, w'_D)$ will be based on the ratios $w'_j / w_j$ between parts instead of on the arithmetic differences $w'_j - w_j$.

30 Perturbations on $\mathcal{C}^{D-1}$: interpretation

Some natural processes can be interpreted as a succession of changes from an initial composition $\bar{w}_0$ to a final composition $\bar{w}_n$ through the application of successive perturbations:
$$\bar{p}_1 \oplus \bar{w}_0 = \bar{w}_1, \quad \bar{p}_2 \oplus \bar{w}_1 = \bar{w}_2, \quad \dots, \quad \bar{p}_n \oplus \bar{w}_{n-1} = \bar{w}_n.$$
In this manner, $\bar{w}_n = (\bar{p}_n \oplus \bar{p}_{n-1} \oplus \cdots \oplus \bar{p}_1) \oplus \bar{w}_0$.

31 Genesis of the normal distribution

Particles fall from a funnel onto the tips of triangles, where they are deviated to the left or to the right with equal probability (0.5), and finally fall into receptacles. If the tip of a triangle is at distance x from the left edge of the board, the triangle tips to the right and to the left below it are placed at x + k and x - k (k constant).

32 Genesis of the lognormal distribution

Particles fall from a funnel onto the tips of triangles, where they are deviated to the left or to the right with equal probability (0.5), and finally fall into receptacles. If the tip of a triangle is at distance x from the left edge of the board, the triangle tips to the right and to the left below it are placed at x/k and xk (k constant).

33 Perturbations on $\mathcal{C}^{D-1}$: interpretation

If $\bar{w} = \mathrm{ccl}(w_{\mathrm{SiO_2}}, \dots, w_{\mathrm{P_2O_5}})$ expresses the percentage composition in major oxides of a rock, its molecular composition will be $\bar{w}' = \mathrm{ccl}(w_{\mathrm{SiO_2}}/m_{\mathrm{SiO_2}}, \dots, w_{\mathrm{P_2O_5}}/m_{\mathrm{P_2O_5}})$, where $m_j$ symbolizes the molecular weight of oxide j. Therefore, composition $\bar{w}'$ can be obtained by applying the perturbation $\bar{m}^{-1} = (\mathrm{ccl}(m_{\mathrm{SiO_2}}, \dots, m_{\mathrm{P_2O_5}}))^{-1}$ to composition $\bar{w}$: $\bar{w}' = \bar{m}^{-1} \oplus \bar{w}$.

34 The vector space $(\mathcal{C}^{D-1}, \oplus, \odot)$

Definition. The external operation $\odot$ in $\mathcal{C}^{D-1}$ is defined as
$$\lambda \odot \bar{w} = \mathrm{ccl}(w_1^\lambda, \dots, w_D^\lambda),$$
for each $\lambda \in \mathbb{R}$ and each $\bar{w} \in \mathcal{C}^{D-1}$. $(\mathcal{C}^{D-1}, \oplus, \odot)$ is a vector space of dimension D - 1.

[Figure: ternary diagram showing $\bar{x}$, $2 \odot \bar{x}$, $(-2) \odot \bar{x}$ and the neutral element e]
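A sketch of the power transformation, reusing the `closure` and `perturb` helpers from the earlier sketches; the identity $2 \odot \bar{x} = \bar{x} \oplus \bar{x}$ illustrates the vector-space structure:

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def perturb(w1, w2):
    return closure(np.asarray(w1, dtype=float) * np.asarray(w2, dtype=float))

def power(lam, w):
    """External operation: raise every part to the power lam and re-close."""
    return closure(np.asarray(w, dtype=float) ** lam)

x = closure([1.0, 4.0, 9.0])
print(np.allclose(power(2.0, x), perturb(x, x)))  # True: 2 (.) x = x (+) x
```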

35 The log and exp transformations between $\mathbb{R}^D_+$ and $\mathbb{R}^D$

The logarithmic transformation on $\mathbb{R}^D_+$ transforms the rays from the origin, which represent the compositions of the space $\mathcal{C}^{D-1}$, into straight lines of $\mathbb{R}^D$ parallel to the vector $1_D = (1, \dots, 1)$. Inversely, the exponential transformation on $\mathbb{R}^D$ transforms these straight lines of $\mathbb{R}^D$ parallel to the vector $1_D$ into rays from the origin of $\mathbb{R}^D_+$.

36 [Figure: the ray $\bar{w}$ in $\mathbb{R}^2_+$ (axes $W_1$, $W_2$) is mapped by the log transformation onto the line $z + U$ of $\mathbb{R}^2$ (axes $Z_1$, $Z_2$) parallel to $1_2$, with the subspace V and the representatives $\mathrm{ccl}\,w$ and z marked]

37 Centered logratio transformation

Definition. The centered logratio transformation, denoted by clr, is the one-to-one function from the compositional space $\mathcal{C}^{D-1}$ to the subspace $V = \{z = (z_1, \dots, z_D) \in \mathbb{R}^D : z_1 + \cdots + z_D = 0\}$ of $\mathbb{R}^D$, defined by
$$\mathrm{clr}\,\bar{w} = \log \frac{w}{g(w)} \qquad (\bar{w} \in \mathcal{C}^{D-1}).$$
The inverse transformation, from V to $\mathcal{C}^{D-1}$, is given by $\mathrm{clr}^{-1} z = \mathrm{ccl}(\exp z)$ ($z \in V$).

The logarithmic and the exponential transformations establish a one-to-one correspondence between the simplex $\mathcal{S}^D$ and the hyperplane V in $\mathbb{R}^D$.
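A sketch of clr and its inverse (ours, not the authors' code):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def clr(w):
    """Centered logratio: log of the parts relative to their geometric mean."""
    lw = np.log(np.asarray(w, dtype=float))
    return lw - lw.mean()

def clr_inv(z):
    """Inverse clr: exponentiate and close."""
    return closure(np.exp(np.asarray(z, dtype=float)))

x = closure([1.0, 2.0, 7.0])
print(np.isclose(clr(x).sum(), 0.0))    # True: clr images lie in the hyperplane V
print(np.allclose(clr_inv(clr(x)), x))  # True: clr is one-to-one
```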

38 Centered logratio transformation

[Figure: the ray $\bar{w}$ in $\mathbb{R}^2_+$ and its clr image on the subspace V of $\mathbb{R}^2$, with the line $z + U$ parallel to $1_2$]

39 Centered logratio transformation

Property. The centered logratio transformation is an isomorphism between the vector space $(\mathcal{C}^{D-1}, \oplus, \odot)$ and the vector subspace $V = \{z = (z_1, \dots, z_D) \in \mathbb{R}^D : z_1 + \cdots + z_D = 0\}$ of $(\mathbb{R}^D, +, \cdot)$. Therefore,
$$\mathrm{clr}(\bar{w} \oplus \bar{w}') = \mathrm{clr}\,\bar{w} + \mathrm{clr}\,\bar{w}'; \qquad \mathrm{clr}(\lambda \odot \bar{w}) = \lambda\,\mathrm{clr}\,\bar{w},$$
where $\bar{w}, \bar{w}' \in \mathcal{C}^{D-1}$ and $\lambda \in \mathbb{R}$. Equally,
$$\mathrm{clr}^{-1}(z + z') = \mathrm{clr}^{-1} z \oplus \mathrm{clr}^{-1} z'; \qquad \mathrm{clr}^{-1}(\lambda z) = \lambda \odot \mathrm{clr}^{-1} z,$$
where $z, z' \in V$ and $\lambda \in \mathbb{R}$.

40 Isometric logratio transformation

Let $\mathcal{V} = \{v_1, \dots, v_{D-1}\}$ be an orthonormal basis of the subspace $V = \{z = (z_1, \dots, z_D) \in \mathbb{R}^D : z_1 + \cdots + z_D = 0\}$. Then, since $\mathrm{clr}\,\bar{w} \in V$, it is always possible to write
$$\mathrm{clr}\,\bar{w} = u_1 v_1 + \cdots + u_{D-1} v_{D-1},$$
for any $\bar{w} \in \mathcal{C}^{D-1}$.

Definition. The isometric logratio transformation, denoted by $\mathrm{ilr}_{\mathcal{V}}$, is the one-to-one function from the compositional space $\mathcal{C}^{D-1}$ to $\mathbb{R}^{D-1}$ defined by $\mathrm{ilr}_{\mathcal{V}}\,\bar{w} = (u_1, \dots, u_{D-1})$ ($\bar{w} \in \mathcal{C}^{D-1}$). Like clr, the transformation $\mathrm{ilr}_{\mathcal{V}}$ is an isomorphism between the vector spaces $(\mathcal{C}^{D-1}, \oplus, \odot)$ and $(\mathbb{R}^{D-1}, +, \cdot)$.
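A sketch of ilr for D = 3, using the orthonormal basis quoted on the next slide (the matrix `V` stacks $v_1$ and $v_2$ as rows; the code is our illustration):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def clr(w):
    lw = np.log(np.asarray(w, dtype=float))
    return lw - lw.mean()

# Orthonormal basis of V for D = 3 (rows are v1 and v2).
V = np.array([[1/np.sqrt(2), -1/np.sqrt(2),  0.0],
              [1/np.sqrt(6),  1/np.sqrt(6), -2/np.sqrt(6)]])

def ilr(w):
    """Coordinates of clr(w) with respect to the orthonormal basis V."""
    return V @ clr(w)

def ilr_inv(u):
    return closure(np.exp(V.T @ np.asarray(u, dtype=float)))

x = closure([52.0, 42.0, 6.0])
print(ilr(x))                           # ~[0.1510, 1.6760], as for S1 on the next slide
print(np.allclose(ilr_inv(ilr(x)), x))  # True
```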

41 Skye lavas

Sample   Na₂O+K₂O   Fe₂O₃   MgO
S1       52.0       42.0    6.0
S2       52.0       44.0    4.0
S3       47.0       48.0    5.0

         clr(Na₂O+K₂O)   clr(Fe₂O₃)   clr(MgO)
S1       0.7910          0.5775       -1.3685
S2       0.9107          0.7436       -1.6543
S3       0.7399          0.7609       -1.5008

         u₁        u₂
S1       0.1510    1.6760
S2       0.1181    2.0261
S3       -0.0149   1.8381

Hint. The orthonormal basis $\mathcal{V}$ of the subspace $V \subset \mathbb{R}^3$ linked to the ilr coordinates is $v_1 = (1/\sqrt{2}, -1/\sqrt{2}, 0)$, $v_2 = (1/\sqrt{6}, 1/\sqrt{6}, -2/\sqrt{6})$.

42 Skye lavas

[Figures: the Skye lavas data plotted in the ternary diagram, in clr coordinates and in ilr coordinates]

43 $\mathcal{C}^{D-1}$ as a Euclidean space

The clr transformation between $\mathcal{C}^{D-1}$ and the subspace V of $\mathbb{R}^D$ allows us to carry over to $\mathcal{C}^{D-1}$ the real Euclidean structure defined on V:
$$\langle \bar{w}, \bar{w}' \rangle_C = (\mathrm{clr}\,\bar{w})'\,\mathrm{clr}\,\bar{w}' = (\log w)'\,H_D\,\log w',$$
$$\|\bar{w}\|_C = \|\mathrm{clr}\,\bar{w}\| = \left[(\log w)'\,H_D\,\log w\right]^{1/2},$$
$$d_C(\bar{w}, \bar{w}') = d_{Euc}(\mathrm{clr}\,\bar{w}, \mathrm{clr}\,\bar{w}') = \left[(\log w' - \log w)'\,H_D\,(\log w' - \log w)\right]^{1/2},$$
where $H_D$ is the $(D \times D)$ centering matrix. This matrix is equal to $I_D - D^{-1} J_D$, where $I_D$ is the identity matrix and $J_D = 1_D 1_D'$. Therefore, by construction, the transformations clr and clr⁻¹, and also ilr and ilr⁻¹, preserve the distances defined on $\mathcal{C}^{D-1}$ and $\mathbb{R}^{D-1}$.
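A sketch of the compositional (Aitchison) distance, which reproduces the comparison on the next slide (names are ours):

```python
import numpy as np

def clr(w):
    lw = np.log(np.asarray(w, dtype=float))
    return lw - lw.mean()

def adist(w1, w2):
    """Aitchison distance: Euclidean distance between the clr images."""
    return np.linalg.norm(clr(w1) - clr(w2))

w1, w2 = [1.000, 49.500, 39.500], [0.010, 49.995, 39.995]
w3, w4 = [25.0, 50.0, 25.0], [35.0, 30.0, 35.0]
print(adist(w1, w2))  # ~3.77, although w1 and w2 are close in raw Euclidean terms
print(adist(w3, w4))  # ~0.69, although w3 and w4 are far apart in raw Euclidean terms
```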

44 Compositional geometry in $\mathcal{C}^{D-1}$

We cannot analyze the simplex $\mathcal{S}^D$ as we analyze the Euclidean real space. Let
$$\bar{w}_1 = \mathrm{ccl}(1.000, 49.500, 39.500), \quad \bar{w}_2 = \mathrm{ccl}(0.010, 49.995, 39.995),$$
$$\bar{w}_3 = \mathrm{ccl}(25.0, 50.0, 25.0), \quad \bar{w}_4 = \mathrm{ccl}(35.0, 30.0, 35.0)$$
be four compositions from $\mathcal{S}^3$. Then
$$d_{Euc}(w_1, w_2) \approx 1.21 < 24.49 \approx d_{Euc}(w_3, w_4),$$
whereas
$$d_C(\bar{w}_1, \bar{w}_2) \approx 3.77 > 0.69 \approx d_C(\bar{w}_3, \bar{w}_4).$$

45 Compositional geometry in $\mathcal{C}^{D-1}$

Any linear variety in $\mathcal{C}^{D-1}$ (straight lines, planes, etc.) can always be implicitly expressed by a system of linear equations in $\log w_1, \dots, \log w_D$ of the form
$$a_{11} \log w_1 + \cdots + a_{1D} \log w_D = b_1, \quad \dots, \quad a_{m1} \log w_1 + \cdots + a_{mD} \log w_D = b_m,$$
with $a_{i1} + \cdots + a_{iD} = 0$ for each $i = 1, \dots, m$. In particular, the parametric equation, varying $t \in \mathbb{R}$, of a straight line in $\mathcal{C}^{D-1}$ is given by
$$\bar{w}(t) = \mathrm{ccl}(\exp(\alpha_1 + \lambda_1 t), \dots, \exp(\alpha_D + \lambda_D t)),$$
where $\sum_{j=1}^D \alpha_j = 0$ and $\sum_{j=1}^D \lambda_j = 0$. Similarly to real space, the concepts of parallelism and orthogonality can be introduced in $\mathcal{C}^{D-1}$.

46 Parallelism in $\mathcal{C}^2$

[Figures: two families of parallel compositional lines in the ternary diagram: $\log w_2 - \log w_3 = k$ for $k > 0$, $k = 0$, $k < 0$; and $\log w_1 - 2\log w_2 + \log w_3 = k$ for $k = 4, 2, 0, -2, -4$]

47 Orthogonality in $\mathcal{C}^2$

[Figures: pairs of orthogonal compositional lines in the ternary diagram: $\log w_2 - \log w_3 = 0$ orthogonal to $-2\log w_1 + \log w_2 + \log w_3 = 0$; and $\log w_1 - 3\log w_2 + 2\log w_3 = 0$ orthogonal to $5\log w_1 - \log w_2 - 4\log w_3 = 0$]

48 Circles in $\mathcal{C}^2$

[Figures: compositional circles drawn in the simplex $\mathcal{S}^3$ and their images in clr space]

49 The alr transformation

Definition. The additive logratio transformation of index j (j = 1, …, D), denoted by $\mathrm{alr}_j$, is the one-to-one transformation from $\mathcal{C}^{D-1}$ to $\mathbb{R}^{D-1}$ defined by
$$y = \mathrm{alr}_j\,\bar{w} = \log \frac{w_{-j}}{w_j},$$
where $w_{-j} = (w_1, w_2, \dots, w_{j-1}, w_{j+1}, \dots, w_D)$. The inverse transformation of $\mathrm{alr}_D$, from $\mathbb{R}^{D-1}$ to $\mathcal{C}^{D-1}$, is given by
$$\mathrm{alr}_D^{-1}\,y = \mathrm{ccl}(\exp y_1, \dots, \exp y_{D-1}, 1) \qquad (y \in \mathbb{R}^{D-1}).$$
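A sketch of $\mathrm{alr}_D$ and its inverse (our illustration):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def alr(w):
    """Additive logratio with the last part as divisor (alr_D)."""
    lw = np.log(np.asarray(w, dtype=float))
    return lw[:-1] - lw[-1]

def alr_inv(y):
    """Inverse of alr_D: append a unit part, exponentiate, and close."""
    y = np.asarray(y, dtype=float)
    return closure(np.exp(np.append(y, 0.0)))

x = closure([1.0, 2.0, 7.0])
print(np.allclose(alr_inv(alr(x)), x))  # True
```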

50 The alr transformation

Property. The $\mathrm{alr}_j$ transformations (j = 1, …, D) are isomorphisms between the vector spaces $(\mathcal{C}^{D-1}, \oplus, \odot)$ and $(\mathbb{R}^{D-1}, +, \cdot)$, i.e.,
$$\mathrm{alr}_j(\bar{w} \oplus \bar{w}') = \mathrm{alr}_j\,\bar{w} + \mathrm{alr}_j\,\bar{w}'; \qquad \mathrm{alr}_j(\lambda \odot \bar{w}) = \lambda\,\mathrm{alr}_j\,\bar{w},$$
$$\mathrm{alr}_j^{-1}(y + y') = \mathrm{alr}_j^{-1} y \oplus \mathrm{alr}_j^{-1} y'; \qquad \mathrm{alr}_j^{-1}(\lambda y) = \lambda \odot \mathrm{alr}_j^{-1} y,$$
where $\bar{w}, \bar{w}' \in \mathcal{C}^{D-1}$, $y, y' \in \mathbb{R}^{D-1}$ and $\lambda \in \mathbb{R}$.

Property. The $\mathrm{alr}_j$ transformations (j = 1, …, D) do not preserve the distances defined on the metric spaces $\mathcal{C}^{D-1}$ and $\mathbb{R}^{D-1}$, i.e.,
$$d_C(\bar{w}, \bar{w}') \ne d_{Euc}(\mathrm{alr}_j\,\bar{w}, \mathrm{alr}_j\,\bar{w}'); \qquad d_{Euc}(y, y') \ne d_C(\mathrm{alr}_j^{-1} y, \mathrm{alr}_j^{-1} y').$$

51 Determination of a composition

A composition $\bar{w} \in \mathcal{C}^{D-1}$ can be determined in several forms:
(i) Giving any D-observational vector belonging to $\bar{w}$. Usually we will choose the vector $x = \mathcal{C}w = \mathrm{ccl}_L\,w$ belonging to $\mathcal{S}^D$.
(ii) Giving the components $z = (z_1, \dots, z_D)$ of the centered logratio transformed vector $\mathrm{clr}\,\bar{w}$. Since z belongs to the subspace V of $\mathbb{R}^D$, its components are related by the equality $z_1 + \cdots + z_D = 0$.
(iii) Giving the components $y = (y_1, \dots, y_{D-1})$ of the additive logratio transformed vector $\mathrm{alr}_D\,\bar{w}$. If needed, we can choose the components of any other logratio $\mathrm{alr}_j\,\bar{w}$ ($j \ne D$).
(iv) Giving the components $u = (u_1, \dots, u_{D-1})$ of the isometric logratio transformed vector $\mathrm{ilr}_{\mathcal{V}}\,\bar{w}$, where $\mathcal{V}$ is a known orthonormal basis of the subspace V of $\mathbb{R}^D$.

52 Determination of a composition

Skye lavas: A = Na₂O + K₂O, F = Fe₂O₃, M = MgO.

Sample   A      F      M
S1       52.0   42.0   6.0
S2       52.0   44.0   4.0

         clr(A)   clr(F)   clr(M)
S1       0.7910   0.5775   -1.3685
S2       0.9107   0.7436   -1.6543

         u₁       u₂
S1       0.1510   1.6760
S2       0.1181   2.0261

         log(A/M)   log(F/M)
S1       2.1595     1.9460
S2       2.5650     2.3979

         log(F/A)   log(M/A)
S1       -0.2135    -2.1595
S2       -0.1671    -2.5650

Hint. The orthonormal basis $\mathcal{V}$ of the subspace $V \subset \mathbb{R}^3$ linked to the ilr coordinates is $v_1 = (1/\sqrt{2}, -1/\sqrt{2}, 0)$, $v_2 = (1/\sqrt{6}, 1/\sqrt{6}, -2/\sqrt{6})$.


54 Compositional data set

Raw data matrix: $W = [w_{ij} : i = 1, \dots, n;\ j = 1, \dots, D]$, or $X = [x_{ij} : i = 1, \dots, n;\ j = 1, \dots, D]$, where $x_i = (x_{i1}, \dots, x_{iD}) \in \mathcal{S}^D$.

Example. AFM composition of 23 aphyric Skye lavas [A = Na₂O + K₂O, F = Fe₂O₃, M = MgO]:

Obs   A%     F%     M%
S1    52.0   42.0   6.0
S2    52.0   44.0   4.0
…

55 Compositional data set

Centered logratio (clr) data matrix: $Z = [z_{ij} : i = 1, \dots, n;\ j = 1, \dots, D]$, where $z_{ij} = \log(w_{ij}/g(w_i))$, with $g(w_i) = (\prod_{k=1}^D w_{ik})^{1/D}$.

Example. AFM composition of 23 aphyric Skye lavas [A = Na₂O + K₂O, F = Fe₂O₃, M = MgO]:

Obs   clr A    clr F    clr M
S1    0.7910   0.5775   -1.3685
S2    0.9107   0.7436   -1.6543
…

56 Compositional data set

Additive logratio (alr) data matrix: $Y = [y_{ij} : i = 1, \dots, n;\ j = 1, \dots, d]$, where $y_{ij} = \log(w_{ij}/w_{iD})$ and $d = D - 1$.

Example. AFM composition of 23 aphyric Skye lavas [A = Na₂O + K₂O, F = Fe₂O₃, M = MgO]:

Obs   log(A/M)   log(F/M)
S1    2.1595     1.9460
S2    2.5650     2.3979
…

57 Center of a compositional data set

The center of a set W of n compositions $\bar{w}_1, \dots, \bar{w}_n$ of $\mathcal{C}^{D-1}$ is the composition $\bar{g}$ defined by
$$\mathrm{cen}\,W = \bar{g} = \left(\tfrac{1}{n} \odot \bar{w}_1\right) \oplus \cdots \oplus \left(\tfrac{1}{n} \odot \bar{w}_n\right).$$
This center is equal to
$$\bar{g} = \mathrm{ccl}\left(\Big(\prod_{i=1}^n w_{i1}\Big)^{1/n}, \dots, \Big(\prod_{i=1}^n w_{iD}\Big)^{1/n}\right).$$
It verifies that
$$\mathrm{clr}\,\bar{g} = \bar{z} = \frac{1}{n}\sum_{i=1}^n z_i = \frac{1}{n}\sum_{i=1}^n \mathrm{clr}\,\bar{w}_i \quad \text{and} \quad \mathrm{alr}_D\,\bar{g} = \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i = \frac{1}{n}\sum_{i=1}^n \mathrm{alr}_D\,\bar{w}_i.$$
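A sketch of the compositional center of a data matrix whose rows are compositions (the example rows are the three Skye-lava samples of slide 41; the function name is ours):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def center(W):
    """Compositional center: closure of the column-wise geometric means."""
    LW = np.log(np.asarray(W, dtype=float))
    return closure(np.exp(LW.mean(axis=0)))

W = np.array([[52.0, 42.0, 6.0],
              [52.0, 44.0, 4.0],
              [47.0, 48.0, 5.0]])
print(center(W))
```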

58 Center of a compositional data set: properties

- $\mathrm{cen}\,W = \mathrm{ccl}\left((\prod_{i=1}^n w_{i1})^{1/n}, \dots, (\prod_{i=1}^n w_{iD})^{1/n}\right)$
- $\mathrm{cen}\,W = \operatorname*{arg\,min}_{\xi \in \mathcal{C}^{D-1}} \frac{1}{n}\left(d_C^2(\bar{w}_1, \xi) + \cdots + d_C^2(\bar{w}_n, \xi)\right)$
- $\mathrm{cen}\{\bar{p} \oplus W\} = \bar{p} \oplus \mathrm{cen}\,W$, where $\bar{p} \in \mathcal{C}^{D-1}$
- $\mathrm{cen}\{t \odot W\} = t \odot \mathrm{cen}\,W$, where $t \in \mathbb{R}$
- $\mathrm{cen}\{W \oplus W'\} = \mathrm{cen}\,W \oplus \mathrm{cen}\,W'$

59 Center of a compositional data set

Example. AFM composition of 23 aphyric Skye lavas.
Compositional ("geometric") center: $\bar{g} = \mathrm{ccl}(25.85, 56.65, 17.50)$.
Arithmetic center: $\bar{a} = \mathrm{ccl}(26.83, 53.74, 19.43)$.

60 Centering

To center a compositional data set $\bar{w}_1, \dots, \bar{w}_n$ with center $\bar{g}$, it suffices to consider the new data set $\bar{w}'_1 = \bar{g}^{-1} \oplus \bar{w}_1, \dots, \bar{w}'_n = \bar{g}^{-1} \oplus \bar{w}_n$. Obviously, the center of the new centered data set $\bar{w}'_1, \dots, \bar{w}'_n$ is $\mathrm{ccl}(1/D, \dots, 1/D)$.

61 Compositional covariance structure

Variation matrix: $T = [\tau_{jk}] = \left[\mathrm{var}\left\{\log \frac{w_{(j)}}{w_{(k)}}\right\}\right]$.

$\tau_{jk} = 0$ means a perfect relationship between $w_{(j)}$ and $w_{(k)}$, in the sense that the ratio $w_{(j)}/w_{(k)}$ is constant. The larger the value of $\tau_{jk}$, the greater the departure from proportionality between $w_{(j)}$ and $w_{(k)}$.

A measure of the degree of proportionality between two parts j and k is given by $\exp(-\tau_{jk})$. In this way, $\exp(-\tau_{jk}) = 0$ means zero proportionality, and $\exp(-\tau_{jk}) = 1$ means perfect proportionality.

The variation matrix of any subcomposition is obtained simply by picking out of T all the logratio variances $\tau_{jk}$ associated with the parts j and k of the subcomposition.

62 Compositional covariance structure

Logratio covariance matrix:
$$\Sigma = [\sigma_{jk}] = [\mathrm{cov}\{y_{(j)}, y_{(k)}\}] = \left[\mathrm{cov}\left\{\log\frac{w_{(j)}}{w_{(D)}}, \log\frac{w_{(k)}}{w_{(D)}}\right\}\right],$$
where $y_{(j)} = \left(\log\frac{w_{1j}}{w_{1D}}, \dots, \log\frac{w_{nj}}{w_{nD}}\right)$, for $j = 1, \dots, D-1$.

63 Compositional covariance structure

Centered covariance matrix: $\Gamma = [\gamma_{jk}] = [\mathrm{cov}\{z_{(j)}, z_{(k)}\}]$, where $z_{(j)} = (\log(w_{1j}/g(w_1)), \dots, \log(w_{nj}/g(w_n)))$, for $j = 1, \dots, D$.

Hint. The correlation $\mathrm{corr}\{z_{(j)}, z_{(k)}\}$ is not a measure of the relationship between parts j and k because it is subcompositionally incoherent.

Total (relative) variability:
$$\mathrm{totvar}_C\{W\} = \frac{1}{n}\sum_{i=1}^n d_C^2(\bar{w}_i, \bar{g}) = \mathrm{trace}\{\Gamma\} = \frac{1}{2D}\,1_D'\,T\,1_D = \frac{1}{D}\sum_{i<j} \tau_{ij}.$$
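A sketch of the variation matrix and the total variability, following the formulas above (note that `np.var` uses the 1/n convention, matching the definition of totvar; names are ours):

```python
import numpy as np

def variation_matrix(W):
    """T[j, k] = var(log(w_j / w_k)) across the rows of W."""
    LW = np.log(np.asarray(W, dtype=float))
    D = LW.shape[1]
    return np.array([[np.var(LW[:, j] - LW[:, k]) for k in range(D)]
                     for j in range(D)])

def totvar(W):
    """Total variability: (1/2D) times the sum of all entries of T."""
    T = variation_matrix(W)
    return T.sum() / (2 * T.shape[0])

W = np.array([[52.0, 42.0, 6.0],
              [52.0, 44.0, 4.0],
              [47.0, 48.0, 5.0]])
print(totvar(W))
```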

64 Compositional covariance structure

- The centered covariance matrix $\Gamma = [\gamma_{jk}]$ is singular because $\sum_{k=1}^D \gamma_{jk} = 0$ ($j = 1, \dots, D$).
- The relationships between the three covariance matrices T, Σ and Γ are linear.
- The dimensionality of the covariance structure of a compositional raw data matrix from $\mathcal{C}^{D-1}$ is equal to $\frac{1}{2}D(D-1)$.
- The covariance matrix T (and also Σ and Γ) is coherent with the algebraic structure of $(\mathcal{C}^{D-1}, \oplus, \odot)$, i.e., $T\{\bar{p} \oplus W\} = T\{W\}$ and $T\{\lambda \odot W\} = \lambda^2 T\{W\}$, where W is a compositional raw data matrix from $\mathcal{C}^{D-1}$, $\bar{p} \in \mathcal{C}^{D-1}$ and $\lambda \in \mathbb{R}$. Therefore, $\mathrm{totvar}_C\{\bar{p} \oplus W\} = \mathrm{totvar}_C\{W\}$ and $\mathrm{totvar}_C\{\lambda \odot W\} = \lambda^2\,\mathrm{totvar}_C\{W\}$.

65 Compositional covariance structure

Example. AFM composition of 23 Skye lavas.

Variation matrix: T = …
Logratio covariance matrices: $\Sigma_A$ = …, $\Sigma_F$ = …, $\Sigma_M$ = …
Centered covariance matrix: Γ = …
Total variability: totvar_C = 0.582

66 Biplots

In general, a biplot is a simultaneous representation of the rows (observations) and columns (variables) of an n × p matrix X by means of a rank-2 approximation. Usually, biplot analysis starts by performing some transformation on X, depending on the nature of the data, to obtain a transformed matrix Z, which is the one that is actually displayed.

67 Biplots

The singular value decomposition (SVD) of Z provides a decomposition of this matrix:
$$Z = [u_1 : \cdots : u_r]\,\mathrm{diag}\{\lambda_1, \dots, \lambda_r\}\,[v_1 : \cdots : v_r]',$$
where r is the rank of Z; $u_1, \dots, u_r$ are the standardized eigenvectors of $ZZ'$; $v_1, \dots, v_r$ are the standardized eigenvectors of $Z'Z$; and $\lambda_1, \dots, \lambda_r$ are the corresponding positive singular values, in decreasing order.

From this SVD of Z, and using only the first two eigenvectors, a rank-2 approximation $\hat{Z}$ is obtained:
$$\hat{Z} = [u_1 : u_2]\,\mathrm{diag}\{\lambda_1, \lambda_2\}\,[v_1 : v_2]'.$$

68 Biplots

Then $\hat{Z}$ decomposes as
$$\hat{Z} = \underbrace{[\lambda_1^{\alpha} u_1 : \lambda_2^{\alpha} u_2]}_{F}\;\underbrace{[\lambda_1^{1-\alpha} v_1 : \lambda_2^{1-\alpha} v_2]'}_{G'},$$
where α is an arbitrary constant. The biplot represents simultaneously in $\mathbb{R}^2$ the rows of F, which provide the coordinates of n points (in correspondence with the n rows/observations of Z), and the rows of G, which provide the coordinates of p points (in correspondence with the columns/variables of Z). Conventionally, the biplot depicts the variables by rays and the observations by points. Depending on the constant α, the biplot favours the display of rows (observations) or columns (variables). For α = 0, the biplot is called a covariance biplot; in this case, the display of variables is favoured.

69 Biplots

With the singular value decomposition $Z = [u_1 : \cdots : u_r]\,\mathrm{diag}\{\lambda_1, \dots, \lambda_r\}\,[v_1 : \cdots : v_r]'$ and the rank-2 approximation $\hat{Z} = FG'$ above, the ratio
$$\frac{\lambda_1^2 + \lambda_2^2}{\lambda_1^2 + \cdots + \lambda_r^2}$$
is a measure of the proportion of the variability of Z captured by the biplot.

70 Relative variation diagrams

Definition. The relative variation diagram of a compositional data set $\bar{w}_1, \dots, \bar{w}_n$ of $\mathcal{C}^{D-1}$ is the covariance biplot of the matrix $Z_c$ obtained after centering the D columns of the centered logratio matrix Z.

Elements:
- Origin, labeled O.
- Vertices, one for each of the D parts (variables/columns) of the compositions, labeled 1, …, j, …, D.
- Case markers, one for each of the n observations (rows), labeled 1, …, i, …, n.
- Ray: the join Oj of the origin O to a vertex j.
- Link: the join jk of two vertices j and k.

71 Relative variation diagrams

[Figure: a relative variation diagram, with origin, vertices, case markers, rays and links]

72 Relative variation diagrams

The vertices and case markers are both centered at the origin O. Rays and inter-ray angles represent the centered covariance matrix Γ:
$$|Oj|^2 = \hat{\gamma}_{jj} = \text{estimate of } \mathrm{var}\{z_{(j)}\},$$
$$Oj \cdot Ok = \hat{\gamma}_{jk} = \text{estimate of } \mathrm{cov}\{z_{(j)}, z_{(k)}\},$$
so that $\cos(\widehat{jOk})$ = estimate of $\mathrm{corr}\{z_{(j)}, z_{(k)}\}$.

Hint. Remember that the correlation $\mathrm{corr}\{z_{(j)}, z_{(k)}\}$ is not a measure of the relationship between parts j and k because it is subcompositionally incoherent.

73 Relative variation diagrams

The squared lengths of the links represent the set of estimated relative variances:
$$|jk|^2 = \hat{\tau}_{jk} = \text{estimate of } \mathrm{var}\left\{\log\frac{w_{(j)}}{w_{(k)}}\right\}.$$
Therefore, if two vertices j and k coincide, or are close together, then the components $w_{(j)}$ and $w_{(k)}$ are in constant proportion, or nearly so.

[Figure: two nearly coincident vertices j and k in a relative variation diagram]

74 Relative variation diagrams

Links jl and kl, with a common vertex l, represent the estimated logratio covariance matrix $\Sigma_l$:
$$jl \cdot kl \approx \mathrm{cov}\left\{\log\frac{w_{(j)}}{w_{(l)}}, \log\frac{w_{(k)}}{w_{(l)}}\right\},$$
so that
$$\cos(\widehat{jlk}) \approx \mathrm{corr}\left\{\log\frac{w_{(j)}}{w_{(l)}}, \log\frac{w_{(k)}}{w_{(l)}}\right\}.$$

[Figure: links jl and kl meeting at the common vertex l]

75 Relative variation diagrams

If the links jk and lm intersect at R, then
$$\cos(\widehat{jRm}) \approx \mathrm{corr}\left\{\log\frac{w_{(j)}}{w_{(k)}}, \log\frac{w_{(l)}}{w_{(m)}}\right\}.$$
Therefore, if two links jk and lm intersect at right angles, then the logratios $\log(w_{(j)}/w_{(k)})$ and $\log(w_{(l)}/w_{(m)})$ will be uncorrelated and, within the context of logistic normality, independent; i.e., the subcompositions (j, k) and (l, m) are independent.

[Figure: links jk and lm intersecting at right angles at R]

76 Relative variation diagrams

The relative variation diagram for any subcomposition S is simply the subdiagram formed by selecting the vertices corresponding to the parts of the subcomposition and taking the centroid $O_S$ of these vertices as the center of the subcompositional biplot. Therefore, if a subset, say 1, …, C, of vertices is approximately collinear, then the associated subcomposition has a one-dimensional compositional structure.

[Figure: a subcompositional biplot with centroid $O_S$ of the selected vertices j, k, l]

77 Volcano H

Parts: 1 = Cl; 2 = K₂O; 3 = P₂O₅; 4 = TiO₂; 5 = SiO₂.

Variation matrix T:
  0      2.784  4.134  3.970  2.966
  2.784  0      0.647  0.645  0.146
  4.134  0.647  0      0.071  0.304
  3.970  0.645  0.071  0      0.249
  2.966  0.146  0.304  0.249  0

Centered covariance matrix Γ:
   2.134  -0.221  -0.803  -0.743  -0.368
  -0.221   0.208  -0.022  -0.043   0.079
  -0.803  -0.022   0.394   0.337   0.094
  -0.743  -0.043   0.337   0.350   0.099
  -0.368   0.079   0.094   0.099   0.096

78 Volcano H

[Figure: relative variation diagram of the Volcano H data]

79 Volcano H

[Figure: relative variation diagram of the Volcano H data]

80 Dimension-reducing techniques: compositional PCA

Given a set of compositions $\bar{w}_1, \dots, \bar{w}_n$ of $\mathcal{C}^{D-1}$ with center $\bar{g}$, PCA starts by looking for a direction, determined by a C-unitary composition $\bar{c}_1$, such that the total variability of the C-orthogonal projections of $\bar{w}_1, \dots, \bar{w}_n$ onto the compositional straight line through $\bar{g}$ with direction $\bar{c}_1$ is maximal; and so on.

Property. The compositional principal components of a set of compositions $\bar{w}_1, \dots, \bar{w}_n$ of $\mathcal{C}^{D-1}$ can be determined from the standard principal components of the clr-transformed observations $\mathrm{clr}\,\bar{w}_1, \dots, \mathrm{clr}\,\bar{w}_n$.

81 Compositional PCA

In this manner, the positive eigenvalues $\lambda_1 \ge \cdots \ge \lambda_{D-1}$ of the centered logratio covariance matrix Γ give the decomposition of totvar_C, and the corresponding unitary eigenvectors $z_1, \dots, z_{D-1}$ determine the corresponding directions $\mathrm{clr}^{-1} z_1, \dots, \mathrm{clr}^{-1} z_{D-1}$ of the principal axes.
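A sketch of compositional PCA via the SVD of the centered clr matrix, which is equivalent to the eigendecomposition of Γ (function names and the example rows are ours):

```python
import numpy as np

def clr_matrix(W):
    LW = np.log(np.asarray(W, dtype=float))
    return LW - LW.mean(axis=1, keepdims=True)

def compositional_pca(W):
    """Eigenvalues of Gamma and clr eigenvectors, via SVD of the centered clr data."""
    Z = clr_matrix(W)
    Zc = Z - Z.mean(axis=0)                 # center the clr columns
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    lam = s**2 / Z.shape[0]                 # eigenvalues of Gamma, decomposing totvar
    return lam, Vt                          # principal axes: clr_inv of the rows of Vt

W = np.array([[52.0, 42.0, 6.0],
              [52.0, 44.0, 4.0],
              [47.0, 48.0, 5.0]])
lam, Vt = compositional_pca(W)
print(lam / lam.sum())  # proportion of total variability per principal component
```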

82 Skye lavas

PC1: … log A + … log F + … log M   (0.7849)

83 Volcano H (Cl, K₂O, P₂O₅, TiO₂, SiO₂)

PC1: (0.4246, 0.1656, 0.1264, 0.1296, 0.1538)   (87.8%)
PC2: (0.1469, 0.3836, 0.1230, 0.1182, 0.2293)   (cum. 98.1%)

84 Dimension-reducing techniques: subcomposition analysis

Let $\bar{w}_1, \dots, \bar{w}_n$ be a compositional data set of $\mathcal{C}^{D-1}$, and let $\mathrm{sub}_S\,\bar{w}_1, \dots, \mathrm{sub}_S\,\bar{w}_n$ be the set of corresponding subcompositions of $\mathcal{C}^{C-1}$ associated to a subset S of the parts 1, …, D. Then the ratio
$$\frac{\mathrm{totvar}_C\{\mathrm{sub}_S\,\bar{w}_1, \dots, \mathrm{sub}_S\,\bar{w}_n\}}{\mathrm{totvar}_C\{\bar{w}_1, \dots, \bar{w}_n\}}$$
gives the proportion of total variability retained by the subcompositions. If the purpose of subcompositional analysis is to retain as much variability as possible for a given number C of parts, then we have to search for the subcompositions of this size which maximize this ratio.

85 Volcano H

[Figure: relative variation diagram of the Volcano H data, subcompositional view]

86 Subcomposition analysis

Example. Percentage of Cl, K₂O, P₂O₅, TiO₂ and SiO₂ in 46 samples of volcanic rocks from an anonymous volcano H.

Total variability: totvar_C = 3.18.

Total variability retained by the 3-part subcompositions:

Subcomposition        Percentage
P₂O₅, TiO₂, SiO₂      6.53%
K₂O, TiO₂, SiO₂       10.89%
K₂O, P₂O₅, SiO₂       11.49%
K₂O, P₂O₅, TiO₂       14.27%
Cl, K₂O, SiO₂         61.74%
Cl, TiO₂, SiO₂        75.24%
Cl, K₂O, TiO₂         77.48%
Cl, P₂O₅, SiO₂        77.53%
Cl, K₂O, P₂O₅         79.22%
Cl, P₂O₅, TiO₂        85.61%

87 Zeros in compositional data

Logratio methodology is incompatible with compositions having zeros in one or more parts. Two kinds of zeros:
- Essential zeros: the part is completely absent.
- Rounded zeros: no quantifiable proportion has been recorded.

Treatment of essential zeros:
- Is it suitable to amalgamate some parts?
- Pre-classification: create initial groups according to the number and location of the zeros, and analyze each group separately.

Treatment of rounded zeros:
- Consider the zero values as missing values.
- Imputation: replace zero values by a small amount using non-parametric or parametric techniques.
- Apply logratio methodology to the resulting data set of replaced observations.

88 Rounded zeros: multiplicative replacement

Let $\bar{w} = \mathrm{ccl}(w_1, \dots, w_D) \in \mathcal{C}^{D-1}$ be any composition with some $w_j = 0$ (rounded zeros). The multiplicative replacement replaces $\bar{w}$ by the composition $\bar{w}^{(r)} = \mathrm{ccl}(w^{(r)}_1, \dots, w^{(r)}_D)$ defined by
$$w^{(r)}_j = \begin{cases} \delta_j & \text{if } w_j = 0, \\[4pt] w_j \left(1 - \sum_{l:\,w_l = 0} \delta_l\right) & \text{if } w_j \ne 0, \end{cases}$$
where the $\delta_j$ are the small values replacing the zero parts.
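A sketch of the multiplicative replacement for a composition closed to 1, using a common replacement value δ (a simplification of the per-part δⱼ in the formula above; the function name is ours):

```python
import numpy as np

def multiplicative_replacement(x, delta=0.001):
    """Replace zeros by delta; shrink the non-zero parts so the total stays 1."""
    x = np.asarray(x, dtype=float)
    zero = (x == 0)
    return np.where(zero, delta, x * (1.0 - delta * zero.sum()))

x = np.array([0.0, 0.5, 0.3, 0.2])
xr = multiplicative_replacement(x, delta=0.01)
print(xr, xr.sum())                # zeros replaced, still sums to 1
print(xr[1] / xr[2], x[1] / x[2])  # ratios of non-zero parts are preserved
```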

89 Rounded zeros: multiplicative replacement

- It is a natural replacement: the ratio between two non-zero parts is preserved.
- It is compatible with subcompositions, perturbation, and the power transformation.
- The covariance structure of subcompositions with no zeros is preserved.

90 Modeling compositional data

In practice, many of the probability density functions (pdf) on the compositional space $\mathcal{C}^{D-1}$ will be defined from a pdf on the real space $\mathbb{R}^{D-1}$; the $\mathrm{alr}_j^{-1}$ transformations then allow us to induce the corresponding pdf on the simplex $\mathcal{S}^D$. The most important pdfs on $\mathcal{C}^{D-1}$ are:
- the Dirichlet class;
- the (additive) logistic normal class;
- the (additive) logistic skew-normal class.

Definition. A random composition $\bar{w}$ on $\mathcal{C}^{D-1}$ is said to have an additive logistic normal (aln) distribution with parameters μ and Σ, written $\bar{w} \sim \mathcal{L}^{D-1}(\mu, \Sigma)$, if the random vector $y = \mathrm{alr}_D\,\bar{w} = \log(w_{-D}/w_D)$ has a $N^{D-1}(\mu, \Sigma)$ distribution on $\mathbb{R}^{D-1}$.

91 Logistic normal distributions on $\mathcal{C}^{D-1}$

Property. Let $\bar{w}$ be a random composition on $\mathcal{C}^{D-1}$. If $\mathrm{alr}_D\,\bar{w} \sim N^{D-1}(\mu, \Sigma)$, then all the other logratio random vectors $\mathrm{alr}_j\,\bar{w}$ (j = 1, …, d) are normally distributed.

Property. Let $\bar{w}$ be a random composition on $\mathcal{C}^{D-1}$, and let $\bar{w}_S$ be the random subcomposition on $\mathcal{C}^{C-1}$ corresponding to a subset S of C parts of $\bar{w}$. If $\bar{w} \sim \mathcal{L}^{D-1}(\mu, \Sigma)$, then $\bar{w}_S \sim \mathcal{L}^{C-1}(\mu_S, \Sigma_S)$, where $\mu_S$ and $\Sigma_S$ can easily be calculated from μ and Σ.

Property. Let $\bar{w}$ be a random composition on $\mathcal{C}^{D-1}$ with $\bar{w} \sim \mathcal{L}^{D-1}(\mu, \Sigma)$. If we perturb $\bar{w}$ by a constant composition $\bar{p} \in \mathcal{C}^{D-1}$, then the perturbed random composition $\bar{p} \oplus \bar{w} \sim \mathcal{L}^{D-1}(\mu + \mathrm{alr}_D\,\bar{p}, \Sigma)$.

92 Logistic normal distributions on $\mathcal{C}^{D-1}$: estimation of parameters

To estimate the parameters μ and Σ of a random composition $\bar{w} \sim \mathcal{L}^{D-1}(\mu, \Sigma)$ from a random sample $\bar{w}_1, \dots, \bar{w}_n$ of $\bar{w}$, we estimate by standard procedures the mean vector and the covariance matrix of a multivariate normal distribution from the alr_D-transformed sample $y_1 = \mathrm{alr}_D\,\bar{w}_1, \dots, y_n = \mathrm{alr}_D\,\bar{w}_n$. The maximum likelihood estimates of μ and Σ are given by
$$\hat{\mu}_j = \frac{1}{n}\sum_{i=1}^n y_{ij}, \qquad \hat{\sigma}_{jk} = \frac{1}{n}\sum_{i=1}^n (y_{ij} - \hat{\mu}_j)(y_{ik} - \hat{\mu}_k),$$
for $j, k = 1, \dots, D-1$.

93 Predictive regions

Definition. Let $\bar{w}$ be an aln-distributed random composition on $\mathcal{C}^{D-1}$. If $\hat{\mu}$ and $\hat{\Sigma}$ are the estimates of the unknown parameters of $\bar{w}$ from a random sample of size n, the 1 - α predictive region is defined as
$$\left\{\bar{w} \in \mathcal{C}^{D-1} : (\mathrm{alr}_D\,\bar{w} - \hat{\mu})'\,\hat{\Sigma}^{-1}\,(\mathrm{alr}_D\,\bar{w} - \hat{\mu}) \le r^2\right\},$$
where $r^2$ is a real number such that
$$\mathrm{Prob}\left[F_{D-1,\,n-(D-1)} \le \frac{n(n-(D-1))}{(n^2-1)(D-1)}\,r^2\right] = 1 - \alpha.$$

[Figure: 99% predictive region for the Skye lavas data]

94 Atypicality index

Definition. If a random composition $\bar{w}$ on $\mathcal{C}^{D-1}$ is $\mathcal{L}^{D-1}(\mu, \Sigma)$ distributed, the atypicality index of a composition $\bar{w}^* \in \mathcal{C}^{D-1}$ in relation to the random composition $\bar{w}$ is defined as
$$\mathrm{Prob}\left[\chi^2_{D-1} \le (\mathrm{alr}_D\,\bar{w}^* - \mu)'\,\Sigma^{-1}\,(\mathrm{alr}_D\,\bar{w}^* - \mu)\right].$$

Definition. Let $\bar{w}$ be an aln-distributed random composition on $\mathcal{C}^{D-1}$. If $\hat{\mu}$ and $\hat{\Sigma}$ are the estimates of the unknown parameters of $\bar{w}$ from a random sample $\bar{w}_1, \dots, \bar{w}_n$ of size n, the atypicality index of a composition $\bar{w}^* \in \mathcal{C}^{D-1}$ in relation to the compositional data set $\bar{w}_1, \dots, \bar{w}_n$ is defined as
$$\mathrm{Prob}\left[F_{D-1,\,n-(D-1)} \le k\,(\mathrm{alr}_D\,\bar{w}^* - \hat{\mu})'\,\hat{\Sigma}^{-1}\,(\mathrm{alr}_D\,\bar{w}^* - \hat{\mu})\right], \qquad \text{where } k = \frac{n(n-(D-1))}{(n^2-1)(D-1)}.$$
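A sketch of the sample-based atypicality index, using SciPy's F distribution (function names are ours, not from the slides):

```python
import numpy as np
from scipy import stats

def alr(w):
    lw = np.log(np.asarray(w, dtype=float))
    return lw[:-1] - lw[-1]

def atypicality_index(w_new, W):
    """Atypicality of w_new relative to the data set W under the aln model."""
    Y = np.array([alr(row) for row in np.asarray(W, dtype=float)])
    n, d = Y.shape                              # d = D - 1
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False, bias=True)  # maximum likelihood estimate
    r = alr(w_new) - mu
    q = r @ np.linalg.solve(Sigma, r)
    k = n * (n - d) / ((n**2 - 1) * d)
    return stats.f.cdf(k * q, d, n - d)
```

Values of the index close to 1 flag the new composition as atypical of the sample.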

95 Compositional regression: Arctic lake

Sand, silt, clay composition of 39 sediment samples at different water depths in an Arctic lake:

Num   Sand   Silt   Clay   Depth (m)
S1    …      …      …      …
…
S39   …      …      …      …

a. Is sediment composition dependent on water depth?
b. If so, how can we quantify the extent of the dependence?

96 Compositional regression

Compositions $\bar{w}_i \in \mathcal{C}^{D-1}$ regressing on a real concomitant $t_i$ (i = 1, …, n):
$$\bar{w}_i = \bar{\beta}_0 \oplus (t_i \odot \bar{\beta}_1) \oplus \bar{\varepsilon}_i \qquad (i = 1, \dots, n),$$
where $\bar{\beta}_0$ is a constant, $\bar{\beta}_1$ is the regression coefficient, and $\bar{\varepsilon}_i$ (i = 1, …, n) are the errors.

alr version of the regression model:
$$\mathrm{alr}\,\bar{w}_i = \mathrm{alr}\,\bar{\beta}_0 + t_i\,\mathrm{alr}\,\bar{\beta}_1 + \mathrm{alr}\,\bar{\varepsilon}_i \qquad (i = 1, \dots, n),$$
which can be reparametrized as
$$\mathrm{alr}\,\bar{w}_i = \alpha_0 + t_i\,\alpha_1 + \epsilon_i \qquad (i = 1, \dots, n).$$

97 Compositional regression

$$\mathrm{alr}\,\bar{w}_i = \alpha_0 + t_i\,\alpha_1 + \epsilon_i \qquad (i = 1, \dots, n).$$
The estimates $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are obtained by applying the least squares method. Then $\hat{\beta}_0 = \mathrm{alr}^{-1}\,\hat{\alpha}_0$ and $\hat{\beta}_1 = \mathrm{alr}^{-1}\,\hat{\alpha}_1$. The error (residual) of $\bar{w}_i$ (i = 1, …, n) will be $\bar{e}_i = \bar{w}_i \oplus \hat{\bar{w}}_i^{-1}$, where $\hat{\bar{w}}_i = \hat{\beta}_0 \oplus (t_i \odot \hat{\beta}_1)$.

Sum of squares of errors:
$$\mathrm{SSError} = \sum_{i=1}^n \|\bar{e}_i\|_C^2 = \sum_{i=1}^n \big(d_C(\bar{w}_i, \hat{\bar{w}}_i)\big)^2.$$
Proportion of variability explained by the fitted linear regression model:
$$1 - \frac{\mathrm{SSError}}{n \cdot \mathrm{totvar}_C\{\bar{w}_1, \dots, \bar{w}_n\}}.$$
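A sketch of the fitting procedure: least squares in alr coordinates, then back-transformation of the intercept and slope into the simplex (names are ours):

```python
import numpy as np

def closure(w):
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def alr(w):
    lw = np.log(np.asarray(w, dtype=float))
    return lw[:-1] - lw[-1]

def alr_inv(y):
    return closure(np.exp(np.append(np.asarray(y, dtype=float), 0.0)))

def comp_regression(W, t):
    """Fit alr(w_i) = a0 + t_i * a1 by least squares; return beta0, beta1 in the simplex."""
    Y = np.array([alr(row) for row in np.asarray(W, dtype=float)])
    t = np.asarray(t, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)  # rows: alpha_0 and alpha_1
    return alr_inv(A[0]), alr_inv(A[1])
```

The fitted composition at a given t is then `alr_inv(A[0] + t * A[1])`; for the Arctic lake data of the next slide, t would be log(depth).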

98 Arctic lake

Num   Sand   Silt   Clay   Depth (m)
S1    …      …      …      …
…

alr fitted simple linear regression model:
log(sand/clay) = … + … log(depth) + ε₁;
log(silt/clay) = … + … log(depth) + ε₂.

Fitted regression model in $\mathcal{S}^3$:
$$\mathrm{ccl}_L(\text{sand}, \text{silt}, \text{clay}) = (\dots, \dots, \dots) \oplus \log(\text{depth}) \odot (0.04604, \dots, \dots).$$

Proportion of variability explained by the fitted simple linear regression model: …%

99 Arctic lake

[Figure: the Arctic lake sediment compositions and the fitted compositional regression model]


More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Singular Value Decomposition. 1 Singular Value Decomposition and the Four Fundamental Subspaces

Singular Value Decomposition. 1 Singular Value Decomposition and the Four Fundamental Subspaces Singular Value Decomposition This handout is a review of some basic concepts in linear algebra For a detailed introduction, consult a linear algebra text Linear lgebra and its pplications by Gilbert Strang

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS 121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint Biplots in Practice MICHAEL GREENACRE Proessor o Statistics at the Pompeu Fabra University Chapter 6 Oprint Principal Component Analysis Biplots First published: September 010 ISBN: 978-84-93846-8-6 Supporting

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Dimensionality Reduction and Principal Components

Dimensionality Reduction and Principal Components Dimensionality Reduction and Principal Components Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,..., M} and observations of X

More information

Preprocessing & dimensionality reduction

Preprocessing & dimensionality reduction Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016

More information

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

Methodological Concepts for Source Apportionment

Methodological Concepts for Source Apportionment Methodological Concepts for Source Apportionment Peter Filzmoser Institute of Statistics and Mathematical Methods in Economics Vienna University of Technology UBA Berlin, Germany November 18, 2016 in collaboration

More information

Principal component analysis (PCA) for clustering gene expression data

Principal component analysis (PCA) for clustering gene expression data Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763-774 1 Outline of talk Background and motivation Design of our empirical

More information

NATIONAL BOARD FOR HIGHER MATHEMATICS. M. A. and M.Sc. Scholarship Test. September 24, Time Allowed: 150 Minutes Maximum Marks: 30

NATIONAL BOARD FOR HIGHER MATHEMATICS. M. A. and M.Sc. Scholarship Test. September 24, Time Allowed: 150 Minutes Maximum Marks: 30 NATIONAL BOARD FOR HIGHER MATHEMATICS M. A. and M.Sc. Scholarship Test September 24, 2011 Time Allowed: 150 Minutes Maximum Marks: 30 Please read, carefully, the instructions on the following page 1 INSTRUCTIONS

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Discriminant analysis for compositional data and robust parameter estimation

Discriminant analysis for compositional data and robust parameter estimation Noname manuscript No. (will be inserted by the editor) Discriminant analysis for compositional data and robust parameter estimation Peter Filzmoser Karel Hron Matthias Templ Received: date / Accepted:

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Statistical methods for the analysis of microbiome compositional data in HIV studies

Statistical methods for the analysis of microbiome compositional data in HIV studies 1/ 56 Statistical methods for the analysis of microbiome compositional data in HIV studies Javier Rivera Pinto November 30, 2018 Outline 1 Introduction 2 Compositional data and microbiome analysis 3 Kernel

More information

Principal component analysis

Principal component analysis Principal component analysis Angela Montanari 1 Introduction Principal component analysis (PCA) is one of the most popular multivariate statistical methods. It was first introduced by Pearson (1901) and

More information

Multivariate Gaussian Analysis

Multivariate Gaussian Analysis BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density

More information

Basic Concepts in Matrix Algebra

Basic Concepts in Matrix Algebra Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Linear Algebra. Preliminary Lecture Notes

Linear Algebra. Preliminary Lecture Notes Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date May 9, 29 2 Contents 1 Motivation for the course 5 2 Euclidean n dimensional Space 7 2.1 Definition of n Dimensional Euclidean Space...........

More information

L3: Review of linear algebra and MATLAB

L3: Review of linear algebra and MATLAB L3: Review of linear algebra and MATLAB Vector and matrix notation Vectors Matrices Vector spaces Linear transformations Eigenvalues and eigenvectors MATLAB primer CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

LECTURE NOTE #11 PROF. ALAN YUILLE

LECTURE NOTE #11 PROF. ALAN YUILLE LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform

More information

4.2. ORTHOGONALITY 161

4.2. ORTHOGONALITY 161 4.2. ORTHOGONALITY 161 Definition 4.2.9 An affine space (E, E ) is a Euclidean affine space iff its underlying vector space E is a Euclidean vector space. Given any two points a, b E, we define the distance

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Principal

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information