Inverse regression approach to (robust) non-linear high-to-low dimensional mapping


1 Inverse regression approach to (robust) non-linear high-to-low dimensional mapping
Emeline Perthame, joint work with Florence Forbes
INRIA, team MISTIS, Grenoble
LMNO, Caen, October 27

2 Outline
1. Non-linear mapping problem
2. GLLiM/SLLiM: an inverse regression approach
3. Estimation of parameters
4. Results and conclusion

3 Outline (Part 1: Non-linear mapping problem)

4 A non-linear mapping problem
Prediction of X from Y through a non-linear regression function g, with $Y \in \mathbb{R}^D$, $X \in \mathbb{R}^L$ and $D \gg L$:
$y = (y_1, \dots, y_D)^T \mapsto g(y) = x = (x_1, \dots, x_L)^T$
$E(X \mid Y = y) = g(y)$

5 A non-linear mapping problem
Application: the OMEGA mission on Mars, a spectrometer launched into orbit around Mars (Mars Express - OMEGA, 2004)
Problem: retrieving physical properties from hyperspectral images
- Y: spectrum (D = 184)
- X: composition of the ground (L = 3): proportion of dust, proportion of CO2 ice, proportion of water ice
(Figure: reflectance as a function of wavelength for an observed spectrum)

6 Some approaches
Difficulty: D large, hence the curse of dimensionality
Solutions via dimensionality reduction:
- Reduce the dimension of y before regression, e.g. PCA on y. Risk: poor prediction of x
- Take x into account: PLS, SIR, kernel SIR, PC-based methods. These two-step approaches are not expressed as a single optimization problem
Our approach: inverse regression to reduce dimension

7 Outline (Part 2: GLLiM/SLLiM, an inverse regression approach)

8 Proposed method: an inverse regression strategy
$x \in \mathbb{R}^L$ low-dimensional space, $y \in \mathbb{R}^D$ high-dimensional space; $(y, x)$ are realizations of $(Y, X) \sim p(Y, X; \theta)$, with $\theta$ the parameters
Inverse conditional density $p(Y \mid X; \theta)$:
- Y is a noisy function of X
- modeled via mixtures
- tractable estimation of $\theta$
Forward conditional density $p(X \mid Y; \theta^*)$, with $\theta^* = f(\theta)$:
- high-to-low prediction, e.g. $\hat{X} = E[X \mid Y = y; \theta^*]$

9 Student Locally-linear Mapping (SLLiM)
A piecewise-affine model: introduce a missing variable Z such that Z = k when Y is the image of X by the k-th affine transformation,
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A_k X + b_k + E_k)$
Definition of SLLiM:
$p(Y \mid X, Z = k; \theta) = \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k)$
Affine transformations are local, via a mixture of K generalized Student distributions:
$p(X \mid Z = k; \theta) = \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)$, $\quad p(Z = k; \theta) = \pi_k$
The set of all model parameters is $\theta = \{\pi_k, c_k, \Gamma_k, A_k, b_k, \Sigma_k, \alpha_k,\ k = 1 \dots K\}$

10 Why a Student mixture? Dealing with outliers
Generalized Student distribution for the joint density of (X, Y):
$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \frac{\Gamma(\alpha + M/2)}{|\Sigma|^{1/2}\, \Gamma(\alpha)\, (2\pi\gamma)^{M/2}} \left[1 + \delta(y, \mu, \Sigma)/(2\gamma)\right]^{-(\alpha + M/2)}$
Gaussian scale-mixture representation, using a weight variable U distributed according to a Gamma distribution:
$\mathcal{S}_M(y; \mu, \Sigma, \alpha, \gamma) = \int_0^\infty \mathcal{N}_M(y; \mu, \Sigma/u)\, \mathcal{G}(u; \alpha, \gamma)\, du$
Parameter estimation is tractable by an EM algorithm
(Figure: Gaussian vs. Student densities; the Student has heavier tails)
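A minimal generative sketch of the two previous slides, with made-up toy values (K = 2, L = 1, D = 3, and $\alpha^y_k, \gamma^y_k$ simply set to $\alpha_k$ and 1 for brevity; none of these settings come from the talk): the generalized Student is sampled through its Gamma scale-mixture representation, then one (x, y) pair is drawn from the SLLiM hierarchy Z, then X, then Y.

```python
# Generative sketch of SLLiM with made-up toy parameters (not fitted values).
import numpy as np

rng = np.random.default_rng(0)

def sample_gen_student(mu, Sigma, alpha, gamma, rng):
    # S_M(mu, Sigma, alpha, gamma) = int N(mu, Sigma/u) G(u; alpha, gamma) du:
    # draw the Gamma weight U, then a Gaussian with rescaled covariance.
    u = rng.gamma(shape=alpha, scale=1.0 / gamma)
    return rng.multivariate_normal(mu, Sigma / u)

K, L, D = 2, 1, 3                                      # toy sizes
pi    = np.array([0.5, 0.5])                           # pi_k
c     = [np.array([-1.0]), np.array([2.0])]            # c_k
Gamma = [np.eye(L), 0.5 * np.eye(L)]                   # Gamma_k
A     = [rng.normal(size=(D, L)) for _ in range(K)]    # A_k
b     = [rng.normal(size=D) for _ in range(K)]         # b_k
Sigma = [0.1 * np.eye(D), 0.1 * np.eye(D)]             # Sigma_k
alpha = [2.0, 2.0]                                     # alpha_k (alpha_k^y := alpha_k, gamma_k^y := 1 here)

k = rng.choice(K, p=pi)                                # Z = k
x = sample_gen_student(c[k], Gamma[k], alpha[k], 1.0, rng)            # X | Z = k
y = sample_gen_student(A[k] @ x + b[k], Sigma[k], alpha[k], 1.0, rng) # Y | X, Z = k
print(k, x, y)
```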

11 Low-to-high (inverse) regression
If X and Y are both observed, the parameter vector $\theta$ can be estimated in closed form using an EM inference procedure.
This yields the inverse conditional density, which is a Student mixture parameterized by $\theta$:
$p(Y \mid X; \theta) = \sum_{k=1}^K \frac{\pi_k \mathcal{S}(X; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j \mathcal{S}(X; c_j, \Gamma_j, \alpha_j, 1)}\, \mathcal{S}(Y; A_k X + b_k, \Sigma_k, \alpha^y_k, \gamma^y_k)$
and hence a low-to-high inverse regression function:
$E[Y \mid X = x; \theta] = \sum_{k=1}^K \frac{\pi_k \mathcal{S}(x; c_k, \Gamma_k, \alpha_k, 1)}{\sum_{j=1}^K \pi_j \mathcal{S}(x; c_j, \Gamma_j, \alpha_j, 1)} (A_k x + b_k)$
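A sketch of how this regression function could be evaluated, reusing the toy parameters pi, c, Gamma, A, b, alpha from the sampling sketch above; `gen_student_logpdf` and `predict_y_from_x` are hypothetical helper names, with the density taken from slide 10.

```python
# Sketch: E[Y | X = x] as a Student-mixture-weighted sum of affine maps.
import numpy as np
from scipy.special import gammaln

def gen_student_logpdf(x, mu, Sigma, alpha, gamma):
    M = len(mu)
    diff = x - mu
    delta = diff @ np.linalg.solve(Sigma, diff)        # Mahalanobis term delta(x, mu, Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln(alpha + M / 2) - gammaln(alpha) - 0.5 * logdet
            - (M / 2) * np.log(2 * np.pi * gamma)
            - (alpha + M / 2) * np.log1p(delta / (2 * gamma)))

def predict_y_from_x(x, pi, c, Gamma, A, b, alpha):
    K = len(pi)
    logw = np.array([np.log(pi[k]) + gen_student_logpdf(x, c[k], Gamma[k], alpha[k], 1.0)
                     for k in range(K)])
    w = np.exp(logw - logw.max())                      # stable normalization of the weights
    w /= w.sum()
    return sum(w[k] * (A[k] @ x + b[k]) for k in range(K))

print(predict_y_from_x(np.array([0.5]), pi, c, Gamma, A, b, alpha))
```

The forward predictor of the next slide has exactly the same form, with the starred parameters and the roles of x and y exchanged.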

12 High-to-low (forward) regression
The forward conditional density is a Student mixture as well:
$p(X \mid Y; \theta^*) = \sum_{k=1}^K \frac{\pi^*_k \mathcal{S}(Y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j \mathcal{S}(Y; c^*_j, \Gamma^*_j, \alpha_j, 1)}\, \mathcal{S}(X; A^*_k Y + b^*_k, \Sigma^*_k, \alpha^x_k, \gamma^x_k)$
The forward parameter vector $\theta^*$ has an analytic expression as a function of $\theta$, which yields the high-to-low forward regression function:
$E[X \mid Y = y; \theta^*] = \sum_{k=1}^K \frac{\pi^*_k \mathcal{S}(y; c^*_k, \Gamma^*_k, \alpha_k, 1)}{\sum_{j=1}^K \pi^*_j \mathcal{S}(y; c^*_j, \Gamma^*_j, \alpha_j, 1)} (A^*_k y + b^*_k)$

13 The forward parameter vector $\theta^*$ from $\theta$
$c^*_k = A_k c_k + b_k$
$\Gamma^*_k = \Sigma_k + A_k \Gamma_k A_k^T$
$\Sigma^*_k = (\Gamma_k^{-1} + A_k^T \Sigma_k^{-1} A_k)^{-1}$
$A^*_k = \Sigma^*_k A_k^T \Sigma_k^{-1}$
$b^*_k = \Sigma^*_k (\Gamma_k^{-1} c_k - A_k^T \Sigma_k^{-1} b_k)$
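These formulas translate directly into code. A sketch with plain numpy (toy shapes, not the authors' implementation); the final assertion checks the mapping against the identity $\Gamma_k = \Sigma^*_k + A^*_k \Gamma^*_k (A^*_k)^T$, i.e. both parameterizations imply the same marginal covariance of X.

```python
# Inverse-to-forward parameter mapping, one component at a time.
import numpy as np

def forward_params(c_k, Gamma_k, A_k, b_k, Sigma_k):
    Gamma_inv = np.linalg.inv(Gamma_k)
    Sigma_inv = np.linalg.inv(Sigma_k)
    c_star = A_k @ c_k + b_k                                           # c*_k
    Gamma_star = Sigma_k + A_k @ Gamma_k @ A_k.T                       # Gamma*_k (D x D)
    Sigma_star = np.linalg.inv(Gamma_inv + A_k.T @ Sigma_inv @ A_k)    # Sigma*_k (L x L)
    A_star = Sigma_star @ A_k.T @ Sigma_inv                            # A*_k (L x D)
    b_star = Sigma_star @ (Gamma_inv @ c_k - A_k.T @ Sigma_inv @ b_k)  # b*_k
    return c_star, Gamma_star, A_star, b_star, Sigma_star

rng = np.random.default_rng(1)
L_dim, D_dim = 2, 5
A_k = rng.normal(size=(D_dim, L_dim))
c_s, G_s, A_s, b_s, S_s = forward_params(rng.normal(size=L_dim), np.eye(L_dim),
                                         A_k, rng.normal(size=D_dim),
                                         0.1 * np.eye(D_dim))
# Consistency check: Var(X) computed in both parameterizations must agree,
# i.e. Gamma_k == Sigma*_k + A*_k Gamma*_k A*_k^T.
assert np.allclose(np.eye(L_dim), S_s + A_s @ G_s @ A_s.T)
```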

14 A joint model approach to reduce the number of parameters
Joint model:
$p(X = x, Y = y \mid Z = k) = \mathcal{S}_{L+D}\!\left(\begin{bmatrix} x \\ y \end{bmatrix}; m_k, V_k, \alpha_k, 1\right)$
with $m_k = \begin{bmatrix} c_k \\ A_k c_k + b_k \end{bmatrix}$ and $V_k = \begin{bmatrix} \Gamma_k & \Gamma_k A_k^T \\ A_k \Gamma_k & \Sigma_k + A_k \Gamma_k A_k^T \end{bmatrix}$
This reduces the number of parameters to estimate (per component):
- forward strategy (full $D \times D$ covariance $\Gamma^*_k$, diagonal $\Sigma^*_k$): nb. par. $= \frac{1}{2}D(D+1) + DL + 2L + D$
- inverse strategy (full $L \times L$ covariance $\Gamma_k$, diagonal $\Sigma_k$): nb. par. $= \frac{1}{2}L(L+1) + DL + 2D + L$
e.g. D = 500, L = 2 (see the count sketch below)
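A small sketch evaluating the per-component counts as reconstructed above; the exact bookkeeping (which terms are counted per component, whether $\pi_k$ is included) is an assumption.

```python
# Per-component parameter counts for the two strategies.
def n_params_forward(D, L):
    # full D x D mixture covariance + D-dim mean + affine map + L-dim noise and offset
    return D * (D + 1) // 2 + D * L + 2 * L + D

def n_params_inverse(D, L):
    # full L x L mixture covariance + L-dim mean + affine map + diagonal D x D noise
    return L * (L + 1) // 2 + D * L + 2 * D + L

print(n_params_forward(500, 2))  # 126754
print(n_params_inverse(500, 2))  # 2005
```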

15 Extension to partially observed responses
Incorporate a latent component into the low-dimensional variable:
$X = \begin{bmatrix} T \\ W \end{bmatrix}$, where $T \in \mathbb{R}^{L_t}$ is observed and $W \in \mathbb{R}^{L_w}$ is latent ($L = L_t + L_w$)
Example on the Mars data: lighting? temperature? grain size?
Observed pairs $\{(y_n, t_n), n = 1 \dots N\}$ with $T \in \mathbb{R}^{L_t}$, plus the additional latent variable $W \in \mathbb{R}^{L_w}$
Assuming the independence of T and W given Z:
$p(X = (T, W) \mid Z = k) = \mathcal{S}_L((T, W); c_k, \Gamma_k, \alpha_k, 1)$
with $c_k = \begin{bmatrix} c^t_k \\ 0 \end{bmatrix}$ and $\Gamma_k = \begin{bmatrix} \Gamma^t_k & 0 \\ 0 & I_{L_w} \end{bmatrix}$

16 Extension to partially observed responses
This extends SLLiM to a more general covariance structure. With $A_k = [A^t_k\ A^w_k]$,
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A^t_k T + A^w_k W + b_k + E_k)$
rewrites as
$Y = \sum_{k=1}^K \mathbb{I}(Z = k)(A^t_k T + b_k + E'_k)$
with $\text{Var}(E'_k) = \Sigma_k + A^w_k (A^w_k)^T$
Diagonal $\Sigma_k$ then gives a factor-analysis structure with (at most) $L_w$ factors: a compromise between full $O(D^2)$ and diagonal $O(D)$ covariances
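A sketch of this low-rank-plus-diagonal covariance, with toy sizes echoing the Mars spectrum dimension (values below are illustrative only); it shows the parameter saving, not the estimation.

```python
# Low-rank-plus-diagonal noise covariance: Sigma_k diagonal plus A^w_k (A^w_k)^T
# of rank L_w, so only O(D * L_w) free parameters instead of O(D^2).
import numpy as np

rng = np.random.default_rng(2)
D, L_w = 184, 3                                  # e.g. the Mars spectrum dimension
sigma_diag = rng.uniform(0.1, 1.0, size=D)       # diagonal of Sigma_k (D parameters)
A_w = rng.normal(size=(D, L_w))                  # latent loadings A^w_k (D * L_w parameters)
cov = np.diag(sigma_diag) + A_w @ A_w.T          # full D x D covariance implied by the model
print(cov.shape, D + D * L_w, D * (D + 1) // 2)  # free parameters vs an unconstrained covariance
```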

17 Outline (Part 3: Estimation of parameters)

18 Estimation of $\theta = (c_k, \Gamma_k, A_k, b_k, \Sigma_k, \pi_k, \alpha_k)_{1 \le k \le K}$ by an EM algorithm
E-step: update the posterior distributions
- (E-Z) $p(Z = k \mid t, y, \theta^{(i)})$: SMM-like
- (E-W) $p(W \mid Z = k, t, y, \theta^{(i)})$: probabilistic PCA / factor analysis-like
- (E-U) $E(U \mid Z = k, t, y, \theta^{(i)})$: down-weights extreme/atypical values in the estimators, hence a more robust M-step (see the weight sketch below)
M-step:
- (M-X) $(\pi_k, c_k, \Gamma_k)$: SMM-like
- (M-YX) $(A_k, b_k, \Sigma_k)$: hybrid between linear regression and PPCA/FA, e.g. $\tilde{A}_k = \tilde{Y}_k \tilde{X}_k^T \left(\begin{bmatrix} 0 & 0 \\ 0 & \tilde{S}^w_k \end{bmatrix} + \tilde{X}_k \tilde{X}_k^T\right)^{-1}$
- (M-$\alpha$) $\alpha_k$: not in closed form but standard (specific to the Student case)
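For the (E-U) step, a sketch of the standard Student EM weight, assuming the scale-mixture representation of slide 10: the posterior of U given an observation is Gamma$(\alpha + M/2,\ \gamma + \delta/2)$, so its mean shrinks as the Mahalanobis distance $\delta$ grows, which is what makes the M-step robust to outliers.

```python
# E-U step sketch: posterior mean of the Gamma weight variable, which
# down-weights observations with a large Mahalanobis distance.
import numpy as np

def e_u_weight(y, mu, Sigma, alpha, gamma):
    M = len(mu)
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)   # delta(y, mu, Sigma)
    return (alpha + M / 2) / (gamma + delta / 2)  # E[U | y] under Gamma(alpha + M/2, gamma + delta/2)

mu, Sigma = np.zeros(2), np.eye(2)
print(e_u_weight(np.array([0.1, 0.2]), mu, Sigma, 1.0, 1.0))   # inlier  -> weight ~ 1.95
print(e_u_weight(np.array([10.0, 10.0]), mu, Sigma, 1.0, 1.0)) # outlier -> weight ~ 0.02
```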

19 Outline (Part 4: Results and conclusion)

20 Application with L = D = 1
RATP subway in Paris: measures of air quality at the Châtelet station, line 4, March 2015; N = 341 measurements
Prediction of NO (L = 1) from NO2 (D = 1), illustrating the robustness of SLLiM
(Figure: scatter plot of NO against NO2)

21 Application with L = D = 1: SLLiM compared to GLLiM
(Figure: fitted regression curves of GLLiM and SLLiM on the NO against NO2 data)
Illustration of the robustness of the proposed model

22 Application with L = D = 1: SLLiM compared to GLLiM
(Figure: NRMSE as a function of K for GLLiM, SLLiM, GLLiM-WO and SLLiM-WO, i.e. the fits with outliers removed)
- SLLiM achieves better prediction rates than GLLiM on the complete data
- SLLiM becomes equivalent to GLLiM when outliers are removed

23 Other applications and an augmented version of SLLiM
Application when $D \gg L$: hyperspectral data on Mars (D = 184, L = 2, N = 6983); comparison with other non-linear regression methods
Table: Mars data, average NRMSE with standard deviations in parentheses, for the proportions of CO2 ice and dust over 100 runs.
Method        | Prop. of CO2 ice | Prop. of dust
SLLiM (K=10)  | (0.019)          | (0.020)
GLLiM (K=10)  | (0.023)          | (0.023)
MARS          | (0.016)          | (0.021)
SIR           | (0.025)          | (0.016)
RVM           | (0.021)          | (0.034)

24 Results: application to hyperspectral image analysis
(Figure: estimated maps of the proportion of CO2 ice and of the proportion of dust, for GLLiM, SLLiM, and splines)

25 Conclusion and future work
- A mixture model used for prediction
- Addition of latent variables for partially observed responses
- Selection of K and $L_w$: K fixed, or selected by BIC? $L_w$ selected by BIC?
Thank you for your attention! Any questions?
