Pollution Sources Detection via Principal Component Analysis and Rotation


Vanessa Kuentz (1), in collaboration with Marie Chavent (1), Hervé Guégan (2), Brigitte Patouille (1), Jérôme Saracco (1,3)

(1) IMB, Université Bordeaux 1, Talence, France (vanessa.kuentz@math.u-bordeaux1.fr)
(2) ARCANE-CENBG, Gradignan, France
(3) GREThA, Université Bordeaux 4, Pessac, France

Applied Stochastic Models and Data Analysis, Crete, 2007

Outline

1. Factor Analysis: Notations; Recall about PCA; Factor Analysis model; Link between FA and PCA
2. Motivation; Criterion; Remark on rotation in PCA
3. A three steps process; Statistical treatment; Some confirmation of the results

Issue

- Air pollution = small particles + liquid droplets.
- High concentrations of particles are a danger to human health, especially fine particles.
- Developing air quality control strategies requires identifying the pollution sources.
- Receptor modeling: the commonly used technique is Principal Component Analysis.
- This work is part of a scientific project initiated by the French Ministry of Ecology.

Notations

- $X = (x_i^j)_{n,p}$: numerical data matrix, where $n$ objects are described on $p < n$ variables $x^1, \dots, x^p$.
- $\tilde{X} = (\tilde{x}_i^j)_{n,p}$: standardized data matrix, with $\tilde{x}_i^j = \dfrac{x_i^j - \bar{x}^j}{s^j}$, where $\bar{x}^j$ and $s^j$ are the sample mean and standard deviation of $x^j$.
- $R$: sample correlation matrix of $x^1, \dots, x^p$, with $R = \tilde{X}' M \tilde{X}$, where $M = \frac{1}{m} I_n$ (with $m = n$ or $n - 1$).
- We have $R = Z'Z$ with $Z = M^{1/2} \tilde{X}$.
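The standardization and the identity $R = Z'Z$ can be checked numerically. The following is a minimal NumPy sketch on synthetic data, taking $m = n$. The array `X`, the sizes and the seed are illustrative assumptions, not the Anglet measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 61, 4                                   # illustrative sizes (assumption)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # synthetic correlated data

m = n                                          # the slides allow m = n or m = n - 1
x_bar = X.mean(axis=0)                         # sample mean of each variable
s = np.sqrt(((X - x_bar) ** 2).sum(axis=0) / m)  # standard deviation with divisor m
X_tilde = (X - x_bar) / s                      # standardized data matrix

Z = X_tilde / np.sqrt(m)                       # Z = M^{1/2} X_tilde with M = (1/m) I_n
R = Z.T @ Z                                    # sample correlation matrix
```

With either choice of $m$, `R` has a unit diagonal and agrees with `np.corrcoef(X, rowvar=False)`, as long as the same divisor is used for the standard deviations and for $M$.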

Recall about PCA

- Goal: replace the $p$ variables by $q \le p$ uncorrelated components.
- The SVD of $Z$ (of rank $r \le p$) is $Z = U \Lambda^{1/2} V'$, with:
  - $\Lambda$: diagonal matrix $(r, r)$ of the nonnull eigenvalues $\lambda_1, \dots, \lambda_r$ of $Z'Z$ (in decreasing order);
  - $U$: orthonormal matrix $(n, r)$ of eigenvectors of $ZZ'$ associated with the first $r$ eigenvalues;
  - $V$: orthonormal matrix $(p, r)$ of eigenvectors of $Z'Z$ associated with the first $r$ eigenvalues.
- Decomposition of $\tilde{X}$: $\tilde{X} = M^{-1/2} U \Lambda^{1/2} V'$.
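As a rough illustration of this decomposition, the sketch below computes the SVD of $Z$ with `numpy.linalg.svd` and rebuilds $\tilde{X}$ from it. The data are synthetic and $m = n$ is assumed; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 61, 4                                   # illustrative sizes (assumption)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # synthetic data
X_tilde = (X - X.mean(axis=0)) / X.std(axis=0)  # standardization with m = n
Z = X_tilde / np.sqrt(n)                       # Z = M^{1/2} X_tilde

# Thin SVD: Z = U diag(sing) V', singular values in decreasing order
U, sing, Vt = np.linalg.svd(Z, full_matrices=False)
lam = sing ** 2                                # eigenvalues of Z'Z (diagonal of Lambda)
V = Vt.T

# Decomposition of X_tilde: M^{-1/2} U Lambda^{1/2} V'
X_rebuilt = np.sqrt(n) * U @ np.diag(sing) @ Vt
```

Since $Z'Z = R$ has a unit diagonal, the eigenvalues in `lam` sum to $p$.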

Recall about PCA (continued)

- Case $q = r$: in PCA, the equation $\tilde{X} = M^{-1/2} U \Lambda^{1/2} V'$ is written $\tilde{X} = \Psi V'$ with $\Psi = M^{-1/2} U \Lambda^{1/2}$.
- We have $\Psi = \tilde{X} V$ (since $V'V = I_r$).
- The columns of $\Psi$ are the principal components $\psi^k$, $k = 1, \dots, q$.
- Case $q < r$: in practice, the user retains only the first $q < r$ eigenvalues of $\Lambda$, which gives the approximation $\hat{\tilde{X}}_q = \Psi_q V_q'$, where $\Psi_q$ and $V_q$ are the matrices $\Psi$ and $V$ reduced to their first $q$ columns.
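The principal components and the rank-$q$ approximation can be sketched as follows. This is an illustration on synthetic data with $m = n$; the choice $q = 2$ and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 61, 5, 2                             # illustrative sizes (assumption)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # synthetic data
X_tilde = (X - X.mean(axis=0)) / X.std(axis=0)
Z = X_tilde / np.sqrt(n)

U, sing, Vt = np.linalg.svd(Z, full_matrices=False)
lam, V = sing ** 2, Vt.T

Psi = X_tilde @ V                              # principal components (columns psi^k)
cov_Psi = Psi.T @ Psi / n                      # diagonal: components are uncorrelated

X_hat_q = Psi[:, :q] @ V[:, :q].T              # approximation with the first q components
sq_error = np.sum((X_tilde - X_hat_q) ** 2)    # squared Frobenius error
```

Here `cov_Psi` equals the diagonal matrix of eigenvalues, and `sq_error` equals $n$ times the sum of the discarded eigenvalues.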

Factor Analysis model

- FA: underlying common factors + dimension reduction.
- The model is written $\tilde{x} = A f + e$, with dimensions $(p \times 1) = (p \times q)(q \times 1) + (p \times 1)$, where:
  - $\tilde{x} = (\tilde{x}^1, \dots, \tilde{x}^p)'$: random vector of $\mathbb{R}^p$;
  - $A$: loading (or pattern) matrix;
  - $f = (f^1, \dots, f^q)'$: random vector of $q$ common factors;
  - $e = (e^1, \dots, e^p)'$: random vector of $p$ unique factors.
- Each $\tilde{x}^j$ is a linear combination of the common factors plus a unique factor.

Factor Analysis model (continued)

- FA model: various estimation methods
  - principal factor method;
  - maximum likelihood method;
  - principal component method;
  - ...
- Commonly used method: PCA
  - easy implementation;
  - reasonable results;
  - default method in statistical software.

Link between FA and PCA

- Case $q = r$: in FA (with PCA estimation), the equation $\tilde{X} = M^{-1/2} U \Lambda^{1/2} V'$ is written $\tilde{X} = F A'$, with:
  - $F = M^{-1/2} U$: factor scores matrix;
  - $A = V \Lambda^{1/2}$: loading (or pattern) matrix.
- We have $F = \tilde{X} V \Lambda^{-1/2}$ (since $V'V = I_q$).
- The columns of $F$ are the common factors $f^k$, $k = 1, \dots, q$.
- Case $q < r$: in practice, the user retains only the first $q < r$ eigenvalues of $\Lambda$, which gives the approximation $\hat{\tilde{X}}_q = F_q A_q'$, where $F_q$ and $A_q$ are the matrices $F$ and $A$ reduced to their first $q$ columns.

Link between FA and PCA (continued)

- PCA: $\Psi = \tilde{X} V$; FA: $F = \tilde{X} V \Lambda^{-1/2}$; hence $F = \Psi \Lambda^{-1/2}$.
- The factors $f^k$ correspond to the standardized principal components: $f^k = \psi^k / \sqrt{\lambda_k}$, $k = 1, \dots, q$.
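The relations between $\Psi$, $F$ and $A$ can be verified on a small example. The sketch below uses synthetic data with $m = n$ and assumes $Z$ has full column rank so that every $\lambda_k$ is positive; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 61, 5, 2                             # illustrative sizes (assumption)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # synthetic data
X_tilde = (X - X.mean(axis=0)) / X.std(axis=0)
Z = X_tilde / np.sqrt(n)

U, sing, Vt = np.linalg.svd(Z, full_matrices=False)
lam, V = sing ** 2, Vt.T

Psi = X_tilde @ V                              # principal components
F = X_tilde @ V / np.sqrt(lam)                 # F = X_tilde V Lambda^{-1/2} (column k / sqrt(lam_k))
A = V * np.sqrt(lam)                           # A = V Lambda^{1/2} (column k * sqrt(lam_k))

F_q, A_q = F[:, :q], A[:, :q]                  # keep the first q factors
X_hat_q = F_q @ A_q.T                          # approximation of X_tilde
corr_xf = X_tilde.T @ F_q / n                  # correlations between variables and factors
```

In this setting `F` coincides with `Psi / np.sqrt(lam)` and with $M^{-1/2} U$, the factors have unit variance, and `corr_xf` reproduces `A_q`.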

Motivation

- Loading matrix $A_q$: $a_j^k = \mathrm{corr}(x^j, f^k)$, which allows the identification of groups of correlated elements.
- Problem: intermediate values make the association between variables and factors difficult.
- Objective:
  - each column of $A_q$ has values close to 0 or 1;
  - each row of $A_q$ has only one value close to 1.
- Solution:
  - the FA solution is not unique: the factors can be rotated;
  - orthogonal transformation matrix $T$ ($TT' = T'T = I_q$);
  - $\breve{A}_q = A_q T$: correlations of the variables to the rotated factors $\breve{f}^k$;
  - how to determine $T$?
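The non-uniqueness can be illustrated numerically: for any orthogonal $T$, $\breve{A}_q \breve{A}_q' = A_q T T' A_q' = A_q A_q'$, so the rotated loadings reproduce the same fitted structure. The sketch below uses a hypothetical loading matrix and an arbitrary orthogonal matrix obtained from a QR factorization; neither comes from the study.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 6, 3                                    # illustrative sizes (assumption)
A_q = rng.uniform(-1.0, 1.0, size=(p, q))      # hypothetical loading matrix

# An arbitrary orthogonal matrix: Q factor of a random square matrix
T, _ = np.linalg.qr(rng.normal(size=(q, q)))

A_rot = A_q @ T                                # rotated loadings

same_product = np.allclose(A_rot @ A_rot.T, A_q @ A_q.T)
row_ss_before = (A_q ** 2).sum(axis=1)         # row sums of squared loadings
row_ss_after = (A_rot ** 2).sum(axis=1)        # unchanged by the rotation
```

Only the distribution of the squared loadings within each row changes, which is what a rotation criterion exploits.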

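Criterion

- Several criteria exist; the most used is varimax.
- It maximizes the empirical variance of the squared elements of each column of $\breve{A}_q = (\breve{a}_j^\alpha)_{p,q}$:

  $$\sum_{\alpha=1}^{q} \left[ \frac{1}{p} \sum_{j=1}^{p} (\breve{a}_j^\alpha)^4 - \left( \frac{1}{p} \sum_{j=1}^{p} (\breve{a}_j^\alpha)^2 \right)^2 \right]$$

  with $\breve{a}_j^\alpha = a_j t^\alpha$ ($a_j$: $j$th row of $A_q$, $t^\alpha$: $\alpha$th column of $T$), under the constraints $(t^\alpha)' t^\alpha = 1$ and $(t^l)' t^k = 0$ for $k \ne l$.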
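One possible way to maximize this criterion is Kaiser's scheme of successive planar rotations, where each pair of columns is rotated by the angle that maximizes the criterion for that pair. The sketch below implements that scheme for the raw (non-normalized) criterion; it is not necessarily the algorithm used by the authors or by any given statistical package. The function names, the tolerance and the test matrix are assumptions.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over columns of the empirical variance of the squared loadings."""
    return float(np.sum(np.mean(L ** 4, axis=0) - np.mean(L ** 2, axis=0) ** 2))

def varimax(A, max_sweeps=50, tol=1e-10):
    """Raw varimax by successive planar rotations; returns (A @ T, T)."""
    p, q = A.shape
    L = np.array(A, dtype=float)
    T = np.eye(q)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for k in range(q - 1):
            for l in range(k + 1, q):
                x, y = L[:, k], L[:, l]
                u, v = x ** 2 - y ** 2, 2.0 * x * y
                num = 2.0 * np.sum(u * v) - 2.0 * u.sum() * v.sum() / p
                den = np.sum(u ** 2 - v ** 2) - (u.sum() ** 2 - v.sum() ** 2) / p
                phi = 0.25 * np.arctan2(num, den)   # optimal angle for this pair
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])     # planar rotation of columns k, l
                L[:, [k, l]] = L[:, [k, l]] @ G
                T[:, [k, l]] = T[:, [k, l]] @ G
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:                     # no pair needs rotating any more
            break
    return L, T

# Hypothetical loadings with a clear two-group structure, blurred by a 30 degree rotation
A_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6
R_theta = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
A = A_true @ R_theta
A_rot, T = varimax(A)
```

On this toy example the rotation recovers `A_true`, and `varimax_criterion(A_rot)` is larger than `varimax_criterion(A)`. In general the solution is defined only up to sign changes and permutations of the columns.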

Remark on rotation in PCA

- Rotation is possible in PCA as in FA.
- $T$ must be applied to the right matrices in order to keep the non-correlation of the components.
- One must not write

  $$\tilde{X} = \overbrace{M^{-1/2} U \Lambda^{1/2}}^{\Psi_q} \, T T' \, V'$$

  but

  $$\tilde{X} = \overbrace{M^{-1/2} U}^{F_q} \, T T' \, \Lambda^{1/2} V'$$

- The rotation therefore applies to the standardized principal components, that is, to the factors of FA.
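The difference between the two ways of inserting $TT'$ can be seen on a small example: rotating $F_q$ keeps the components uncorrelated, whereas rotating $\Psi_q$ generally does not. The sketch below uses synthetic data, $m = n$, $q = 2$ and an arbitrary rotation angle, all of which are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 61, 5, 2                             # illustrative sizes (assumption)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # synthetic data
X_tilde = (X - X.mean(axis=0)) / X.std(axis=0)

U, sing, Vt = np.linalg.svd(X_tilde / np.sqrt(n), full_matrices=False)
lam = sing ** 2

Psi_q = np.sqrt(n) * U[:, :q] * sing[:q]       # first q principal components
F_q = np.sqrt(n) * U[:, :q]                    # first q factors (standardized components)

theta = np.pi / 6                              # arbitrary rotation angle (assumption)
c, s = np.cos(theta), np.sin(theta)
T = np.array([[c, -s], [s, c]])                # orthogonal: T T' = T' T = I_2

cov_rotated_pcs = (Psi_q @ T).T @ (Psi_q @ T) / n      # T' Lambda_q T: not diagonal
cov_rotated_factors = (F_q @ T).T @ (F_q @ T) / n      # T' T = I_2: still uncorrelated
```

The off-diagonal entry of `cov_rotated_pcs` equals $\cos\theta \sin\theta \, (\lambda_2 - \lambda_1)$, which vanishes only if the two eigenvalues coincide.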

A three steps process

- Part of a scientific project of the French Ministry of Ecology.
- Aim: improve air quality through the identification and quantification of pollution sources.
- Three steps process:
  - collection of fine particles at a French urban site (Anglet): $n = 61$ samples (Dec. 2005);
  - measurement of $p = 16$ chemical species with the PIXE method (Particle Induced X-ray Emission);
  - statistical treatment of $X = (x_i^j)_{n,p}$, where $x_i^j$ is the concentration of the $j$th chemical compound in the $i$th sample.

Statistical treatment

- FA model with PCA estimation.
- Choice of the number of factors: $q = 5$.
- Orthogonal rotation of the factors with the varimax criterion.
- Detection of groups of correlated chemical compounds.
- Identification of air pollution sources.

Statistical treatment (continued)

Table: Loading matrix before rotation $A_5$ (columns: $f^1, \dots, f^5$; rows: Al2O, SiO, P, SO, Cl, K, Ca, Mn, Fe2O, Cu, Zn, Br, Pb, C-Org)

- Lots of intermediate values, hence a difficult association between variables and factors.

Statistical treatment (continued)

Table: Rotated loading matrix $\breve{A}_5$ (columns: $f^1, \dots, f^5$; rows: Al2O, SiO, P, SO, Cl, K, Ca, Mn, Fe2O, Cu, Zn, Br, Pb, C-Org)

Table: Pollution sources

Factor    Possible source
Factor1   Soil dust
Factor2   Combustion
Factor3   Industry
Factor4   Vehicle
Factor5   Sea

- Obvious usefulness of rotation: clear associations.
- Example: Zn and Pb are highly correlated to $f^3$, which points to an industrial pollution source.

Some confirmation of the results

- Rotated factors: columns of $\breve{F}_5 = F_5 T = (\breve{f}_i^k)_{n,5}$.
- Score $\breve{f}_i^k$: relative contribution of the $k$th source to the $i$th sample.
- Comparison of the rotated factors with external parameters:
  - meteorological data (wind, temperature, ...);
  - day/night periodicity;
  - other chemical compounds.
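Comparing a rotated factor with an external parameter can be as simple as a group comparison or a correlation. The sketch below is purely illustrative: the score vector is a synthetic stand-in for one column of $\breve{F}_5$, and the strict day/night alternation of the samples is a hypothetical assumption, not information taken from the study.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 61                                          # number of samples, as in the study
is_day = np.arange(n) % 2 == 0                  # hypothetical day/night alternation

# Synthetic stand-in for one rotated factor score column (not real data)
score = 0.8 * is_day + rng.normal(scale=0.5, size=n)
score = (score - score.mean()) / score.std()    # standardized, like a factor score

mean_day = score[is_day].mean()                 # average contribution during the day
mean_night = score[~is_day].mean()              # average contribution at night
corr = np.corrcoef(score, is_day.astype(float))[0, 1]  # correlation with the indicator
```

A clearly higher `mean_day` than `mean_night`, or equivalently a positive `corr`, would support reading the factor as a daytime source. The same pattern applies to continuous external variables such as temperature or wind direction indicators.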

Some confirmation of the results (continued)

Figure: Evolution of vehicle pollution source

- Stronger contribution during the day than at night: confirmation of the car pollution source.


Some confirmation of the results (continued)

Figure: Evolution of heating pollution and temperatures

- Middle of the period: increase in contribution together with a decrease in temperature, which confirms the heating pollution source.


Some confirmation of the results (continued)

Figure: Map of sampling site and correlations wind/sea pollution

- Strong correlation to N-W wind together with the location of the site toward the Atlantic Ocean: validation of the sea pollution source.

Conclusion

- FA (with PCA estimation) followed by rotation: identification of five sources of particulate emission.
- No prior knowledge of the sources (number, composition?); more complex sampling site.
- Sources quantification: percentage of the total fine dust mass due to each source (JDS, Angers, 2007).

