ACTIVE SUBSPACES for dimension reduction in parameter studies
PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Applied Mathematics & Statistics, Colorado School of Mines
In collaboration with: RACHEL WARD (UT Austin) and ARMIN EFTEKHARI (UT Austin)
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC
SLIDES: goo.gl/ck6gol
DISCLAIMER: These slides are meant to complement the oral presentation. Use out of context at your own risk.
TAKE-HOMES
- An active subspace is a type of low-dimensional structure in a function of several variables.
- We have tools for identifying and exploiting active subspaces for parameter studies.
- Active subspaces appear in a wide range of physical models.
- Active subspaces are closely related to ridge approximation.
Examples of f(x) across applications:
- Shape optimization in aerospace vehicles (with J. Alonso, T. Lukaczyk)
- Sensitivity analysis in integrated hydrologic models (with R. Maxwell, J. Jefferson, J. Gilbert)
- Uncertainty quantification for hypersonic scramjets (with G. Iaccarino, J. Larsson, M. Emory)
- Sensitivity analysis in HIV modeling (with T. Loudon, S. Pankavich)
- Calibration of an atmospheric reentry vehicle model (with P. Congedo, T. Magin)
- Sensitivity analysis in solar cell models (with B. Zaharatos, M. Campanelli)
- Sensitivity analysis in Ebola transmission models (with P. Diaz, S. Pankavich)
- Lithium ion battery model (with A. Doostan)
- Magnetohydrodynamics generator model (with A. Glaws, T. Wildey, J. Shadid)
- Vehicle design (with C. Othmer, J. Alonso)
Ridge approximation: approximate $f(\mathbf{x}) \approx g(U^\top \mathbf{x})$, where $U^\top : \mathbb{R}^m \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$. Three questions: What is U? What is g? What is the approximation error?

For the approximation error, use the weighted root-mean-squared error,
$$\left\| f(\mathbf{x}) - g(U^\top \mathbf{x}) \right\|_{L^2(\rho)} = \left( \int \left( f(\mathbf{x}) - g(U^\top \mathbf{x}) \right)^2 \rho(\mathbf{x}) \, d\mathbf{x} \right)^{1/2},$$
where $\rho$ is a given weight function.
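As a concrete illustration (not from the slides): when $\rho$ is a density we can sample, the weighted $L^2$ error is just a root-mean-squared error over draws from $\rho$. The sketch below assumes a standard Gaussian $\rho$ and an exact ridge function, so the error for the true subspace is near zero while a wrong subspace gives a visibly larger error.

```python
# A minimal numerical illustration of the weighted L2 error above,
# assuming rho is a standard Gaussian on R^2 (an illustrative choice;
# the slides keep rho general).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 2))            # x ~ rho

a = np.array([0.7, 0.3])
f = lambda X: np.exp(X @ a)                      # f(x) = exp(a^T x), a ridge
u = a / np.linalg.norm(a)                        # true ridge direction
g = lambda y: np.exp(np.linalg.norm(a) * y)      # profile with f = g(u^T x)

rmse = lambda u: np.sqrt(np.mean((f(X) - g(X @ u))**2))
print(rmse(u))                     # ~ 0: correct subspace
print(rmse(np.array([1.0, 0.0])))  # > 0: wrong subspace
```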
What is g? Use the conditional average,
$$\mu(\mathbf{y}) = \int f(U\mathbf{y} + V\mathbf{z}) \, \pi(\mathbf{z} \mid \mathbf{y}) \, d\mathbf{z},$$
where $\mathbf{y} = U^\top\mathbf{x}$ are the subspace coordinates, $V$ spans the complement subspace with coordinates $\mathbf{z}$, and $\pi(\mathbf{z} \mid \mathbf{y})$ is the conditional density. Then $\mu(U^\top\mathbf{x})$ is the best approximation for a fixed U (Pinkus, 2015).
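The conditional average is easy to realize numerically when $\rho$ makes the conditional density tractable. Below is a minimal sketch assuming a standard Gaussian $\rho$ on $\mathbb{R}^m$, for which $\mathbf{z}$ given $\mathbf{y}$ is again standard Gaussian on the complement subspace; the function names are illustrative, not from any library.

```python
# Monte Carlo estimate of mu(y) = E[ f(U y + V z) | y ], assuming rho
# is standard Gaussian so that pi(z | y) = N(0, I) on the complement.
import numpy as np

def conditional_average(f, U, y, n_z=2000, seed=0):
    """U is m x n with orthonormal columns; y has length n."""
    m, n = U.shape
    # Orthonormal basis V for the complement of range(U)
    V = np.linalg.svd(np.eye(m) - U @ U.T)[0][:, :m - n]
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_z, m - n))        # z ~ pi(z | y)
    points = y @ U.T + Z @ V.T                   # rows are U y + V z
    return np.mean([f(x) for x in points])

# Usage: f(x) = x1^2 + x2 with U = e1 gives mu(y) = y^2 + E[x2] = y^2.
f = lambda x: x[0]**2 + x[1]
U = np.array([[1.0], [0.0]])
print(conditional_average(f, U, y=np.array([2.0])))  # ~ 4.0
```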
What is U? Define the error function
$$R(U) = \frac{1}{2} \int \left( f(\mathbf{x}) - \mu(U^\top \mathbf{x}) \right)^2 \rho(\mathbf{x}) \, d\mathbf{x}$$
and minimize it,
$$\underset{U}{\text{minimize}}\; R(U) \quad \text{subject to} \quad U \in G(n, m),$$
where $G(n, m)$ is the Grassmann manifold of n-dimensional subspaces of $\mathbb{R}^m$.
Define the active subspace. Consider a function and its gradient vector,
$$f = f(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^m, \quad \nabla f(\mathbf{x}) \in \mathbb{R}^m, \quad \rho : \mathbb{R}^m \to \mathbb{R}_+.$$
Take the average outer product of the gradient and its eigendecomposition,
$$C = \int \nabla f(\mathbf{x}) \, \nabla f(\mathbf{x})^\top \rho(\mathbf{x}) \, d\mathbf{x} = W \Lambda W^\top.$$
Partition the eigendecomposition,
$$\Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \quad W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}, \quad W_1 \in \mathbb{R}^{m \times n}.$$
Rotate and separate the coordinates,
$$\mathbf{x} = W W^\top \mathbf{x} = W_1 W_1^\top \mathbf{x} + W_2 W_2^\top \mathbf{x} = W_1 \mathbf{y} + W_2 \mathbf{z},$$
where $\mathbf{y} = W_1^\top\mathbf{x}$ are the active variables and $\mathbf{z} = W_2^\top\mathbf{x}$ are the inactive variables.
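A worked special case, added here for intuition (it is not on the slide): if f is exactly a ridge function, the matrix C has rank one and the active subspace is the ridge direction,
$$f(\mathbf{x}) = h(\mathbf{a}^\top \mathbf{x}) \;\implies\; \nabla f(\mathbf{x}) = h'(\mathbf{a}^\top \mathbf{x})\,\mathbf{a} \;\implies\; C = \underbrace{\left( \int h'(\mathbf{a}^\top \mathbf{x})^2 \, \rho(\mathbf{x})\, d\mathbf{x} \right)}_{=\,c} \mathbf{a}\,\mathbf{a}^\top,$$
so $\lambda_1 = c\,\|\mathbf{a}\|^2$ with $\mathbf{w}_1 = \mathbf{a}/\|\mathbf{a}\|$, and $\lambda_2 = \cdots = \lambda_m = 0$: a one-dimensional active subspace with no inactive variation.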
The eigenpairs identify perturbations that change the function more, on average.
LEMMA: $\lambda_i = \int \left( \mathbf{w}_i^\top \nabla f(\mathbf{x}) \right)^2 \rho(\mathbf{x}) \, d\mathbf{x}, \quad i = 1, \dots, m.$
LEMMA: $\int \| \nabla_{\mathbf{y}} f(\mathbf{x}) \|_2^2 \, \rho(\mathbf{x}) \, d\mathbf{x} = \lambda_1 + \cdots + \lambda_n$ and $\int \| \nabla_{\mathbf{z}} f(\mathbf{x}) \|_2^2 \, \rho(\mathbf{x}) \, d\mathbf{x} = \lambda_{n+1} + \cdots + \lambda_m.$
An approximation result:
$$\left\| f(\mathbf{x}) - \mu(W_1^\top \mathbf{x}) \right\|_{L^2(\rho)} \leq C \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2},$$
where $\mu$ is the conditional average, $W_1$ spans the active subspace, $C$ is a Poincaré constant, and $\lambda_{n+1}, \dots, \lambda_m$ are the eigenvalues associated with the inactive subspace.
The active subspace is nearly stationary. Recall
$$R(U) = \frac{1}{2} \int \left( f(\mathbf{x}) - \mu(U^\top \mathbf{x}) \right)^2 \rho(\mathbf{x}) \, d\mathbf{x}.$$
Assume (1) f is Lipschitz continuous with constant $L$ and (2) $\rho$ is a Gaussian density. Then the gradient of R on the Grassmann manifold, evaluated at the active subspace $W_1$, satisfies
$$\| \nabla R(W_1) \|_F \;\leq\; L \, \sqrt{2m} \, (m - n)^2 \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2},$$
where $\| \cdot \|_F$ is the Frobenius norm and $\lambda_{n+1}, \dots, \lambda_m$ are the eigenvalues associated with the inactive subspace. Small inactive eigenvalues mean the active subspace is close to a stationary point of the ridge approximation error.
IDEA: Use the active subspace as the starting point for numerical ridge approximation. Given an initial subspace $U_0$ and samples $\{\mathbf{x}_i, f(\mathbf{x}_i)\}$:
(1) Compute $\mathbf{y}_i = U_0^\top \mathbf{x}_i$.
(2) Fit a polynomial $p_N(\mathbf{y}, \theta)$ with the pairs $\{\mathbf{y}_i, f(\mathbf{x}_i)\}$: $\;\theta^* = \arg\min_\theta \sum_i \left( f(\mathbf{x}_i) - p(\mathbf{y}_i, \theta) \right)^2$.
(3) Minimize the residual over subspaces: $\;U^* = \arg\min_{U \in G(n,m)} \sum_i \left( f(\mathbf{x}_i) - p(U^\top \mathbf{x}_i, \theta^*) \right)^2$.
(4) Set $U_0 = U^*$ and repeat.
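Here is a minimal sketch of this alternating scheme, assuming n = 1 and a univariate polynomial profile; the crude random search over directions stands in for proper Grassmann optimization, and all names are illustrative rather than from any particular library.

```python
# A minimal sketch of the alternating ridge-approximation heuristic
# above, for n = 1 (one active direction) and a polynomial profile.
import numpy as np

def fit_profile(y, f, degree=2):
    """Step (2): least-squares polynomial fit of f against y = u^T x."""
    return np.polyfit(y, f, degree)

def residual(u, X, f, theta):
    """Sum-of-squares ridge residual for a unit vector u."""
    return np.sum((f - np.polyval(theta, X @ u))**2)

def alternating_ridge(X, f, u0, degree=2, iters=10, n_dirs=500, seed=0):
    """Steps (1)-(4): alternate polynomial fits with a crude random
    search over nearby unit vectors (a stand-in for Grassmann descent)."""
    rng = np.random.default_rng(seed)
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        theta = fit_profile(X @ u, f, degree)          # step (2)
        # step (3): sample candidate directions near u, keep the best
        cands = u + 0.3 * rng.standard_normal((n_dirs, X.shape[1]))
        cands /= np.linalg.norm(cands, axis=1, keepdims=True)
        best = min(cands, key=lambda v: residual(v, X, f, theta))
        if residual(best, X, f, theta) < residual(u, X, f, theta):
            u = best                                   # step (4)
    return u, fit_profile(X @ u, f, degree)

# Usage: recover the ridge direction of f(x) = (3*x1 - x2)^2.
X = np.random.default_rng(1).uniform(-1, 1, (200, 2))
f = (3*X[:, 0] - X[:, 1])**2
u, theta = alternating_ridge(X, f, u0=np.array([1.0, 0.0]))
print(u)  # should align (up to sign) with [3, -1]/sqrt(10)
```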
An example where it doesn't work: $f(x_1, x_2) = 5x_1 + \sin(10\pi x_2)$. The gradient is $(5,\; 10\pi\cos(10\pi x_2))^\top$, so
$$C = \begin{bmatrix} 25 & 0 \\ 0 & 50\pi^2 \end{bmatrix},$$
and the high-frequency, low-amplitude sine term dominates the gradient: the active subspace is U = [0; 1] and the inactive subspace is U = [1; 0], even though the best one-dimensional ridge approximation uses the $x_1$ direction.
[Figure: ridge approximation error as a function of the subspace angle from 0 to π, with the active subspace U = [0; 1] and the inactive subspace U = [1; 0] marked.]
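A quick numerical check of this example, assuming x uniform on [-1, 1]² (a choice the slide does not state):

```python
# Estimate C = E[ grad f grad f^T ] by Monte Carlo for
# f = 5*x1 + sin(10*pi*x2) and inspect its eigenpairs.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200_000, 2))

G = np.column_stack([np.full(len(X), 5.0),                # df/dx1
                     10*np.pi*np.cos(10*np.pi*X[:, 1])])  # df/dx2
C = (G.T @ G) / len(X)

evals, evecs = np.linalg.eigh(C)   # ascending eigenvalues
print(np.round(C, 1))              # ~ diag(25, 50*pi^2) ~ diag(25, 493.5)
print(evals)                       # 25 << 493.5
print(evecs[:, -1])                # top eigenvector ~ [0, 1]
```

The eigenvalues say $x_2$ is the "active" direction, but the best one-variable ridge approximation of f uses $x_1$: gradients measure variability, not approximability.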
An example where it works: the DRAG COEFFICIENT as a function of 18 shape parameters, with $\rho$ uniform on a hypercube, computed with the SU2 CFD solver using an adjoint solver for gradients. Recall:
$$C = \int \nabla f(\mathbf{x}) \, \nabla f(\mathbf{x})^\top \rho(\mathbf{x}) \, d\mathbf{x} = W \Lambda W^\top.$$
[Figure: residual as a function of the alternating iteration for different starting subspaces (RANDOM, IDENTITY, ACTIVE SUBSPACE), shown for increasing subspace dimension and increasing total polynomial degree.]
TAKE-HOMES (recap)
- An active subspace is a type of low-dimensional structure in a function of several variables.
- We have tools for identifying and exploiting active subspaces for parameter studies.
- Active subspaces appear in a wide range of physical models.
- Active subspaces are closely related to ridge approximation.
QUESTIONS?
- How do active subspaces relate to [insert method]?
- How do I compute active subspaces?
- What if I don't have gradients?
- What kinds of models does this work on?
Active Subspaces, SIAM (2015). PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Colorado School of Mines
BACKUP SLIDES
Related formulations:
- Sufficient dimension reduction in regression: assume $y = f(\mathbf{x}) + \varepsilon$ and $P(y \mid \mathbf{x}) = P(y \mid A^\top \mathbf{x})$; given pairs $(y_i, \mathbf{x}_i)$, find $A$.
- Projection pursuit regression, neural nets, ridge functions: $\underset{A, \theta}{\text{minimize}} \int \left( f(\mathbf{x}) - g(A^\top \mathbf{x}, \theta) \right)^2 d\mathbf{x}$.
- Fisher information theory: $\int \nabla^2 \log L(\mathbf{x}, \theta) \, \rho(\mathbf{x}) \, d\mathbf{x}$.
- Principal components / regression, Karhunen-Loève: $\int \mathbf{x} \mathbf{x}^\top \rho(\mathbf{x}) \, d\mathbf{x}$.
References and related work
Sufficient dimension reduction:
- Cook. Regression Graphics (1998/2009)
- K.C. Li. Sliced inverse regression for dimension reduction (1991)
- K.C. Li. On principal Hessian directions for data visualization and dimension reduction (1992)
In approximation theory:
- Fornasier, Schnass, Vybiral. Learning functions of few arbitrary linear parameters in high dimensions. FoCM (2012)
- Mayer, Ullrich, Vybiral. Entropy and sampling numbers of classes of ridge functions. Constructive Approximation (2014)
In UQ:
- Tipireddy, Ghanem. Basis adaptation in homogeneous chaos spaces. JCP (2014)
- Stoyanov, Webster. A gradient-based sampling approach for dimension reduction of partial differential equations with stochastic coefficients. IJ4UQ (2015)
- Lei, Yang, Lei, Zheng, Lin, Baker. Constructing surrogate models of complex systems with enhanced sparsity: quantifying the influence of conformational uncertainty in biomolecular solvation. SIAM MMS (2015)
In engineering applications:
- Abdel-Khalik, Bang, Wang. Overview of hybrid subspace methods for uncertainty quantification, sensitivity analysis. Annals of Nuclear Energy (2013)
- Berguin, Rancourt, Mavris. A method for high-dimensional design space exploration of expensive functions with access to gradient information. AIAA
- Russi. Uncertainty Quantification with Experimental Data and Complex System Models. Ph.D. thesis (2010)
In statistics/machine learning: projection pursuit regression, neural nets
Discover the active subspace with random sampling. Draw samples $\mathbf{x}_j$, compute $f_j = f(\mathbf{x}_j)$ and $\nabla f_j = \nabla f(\mathbf{x}_j)$, and approximate with Monte Carlo,
$$C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla f_j \nabla f_j^\top = \hat{W} \hat{\Lambda} \hat{W}^\top.$$
This is equivalent to the SVD of the scaled matrix of gradient samples,
$$\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla f_1 & \cdots & \nabla f_N \end{bmatrix} = \hat{W} \sqrt{\hat{\Lambda}} \, \hat{V}^\top.$$
This was called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data and Complex System Models.
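A minimal sketch of this recipe, assuming a user-supplied gradient routine and (for illustration only) a standard Gaussian sampling density; the gap-based choice of n is also an illustrative assumption, not a fixed rule.

```python
# Estimate the active subspace from sampled gradients via the SVD
# identity above.
import numpy as np

def active_subspace(grad_f, m, N=1000, n=None, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, m))              # x_j ~ rho
    G = np.array([grad_f(x) for x in X])         # row j is grad f(x_j)^T
    # SVD of (1/sqrt(N)) [grad f_1 ... grad f_N] = W sqrt(Lambda) V^T
    W, s, _ = np.linalg.svd(G.T / np.sqrt(N), full_matrices=False)
    evals = s**2                                 # estimated eigenvalues
    if n is None:                                # pick n at the largest gap
        n = int(np.argmax(evals[:-1] / (evals[1:] + 1e-300))) + 1
    return W[:, :n], evals

# Usage: a 5-variable function with a one-dimensional active subspace.
a = np.array([1.0, 2.0, 0.0, 0.0, -1.0])
grad_f = lambda x: 3.0 * (a @ x)**2 * a          # f(x) = (a^T x)^3
W1, evals = active_subspace(grad_f, m=5)
print(evals)                                     # one dominant eigenvalue
print(W1.ravel())                                # ~ a/||a|| up to sign
```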
How many gradient samples? With $L^2$ a bound on the squared gradient norm, $m$ the dimension, and $\varepsilon$ the relative accuracy,
$$N = \frac{L^2}{\lambda_k \, \varepsilon^2} \log(m) \;\implies\; |\hat{\lambda}_k - \lambda_k| \leq \varepsilon \, \lambda_k$$
with high probability, using Gittens and Tropp (2011). Similarly,
$$N = \frac{L^2}{\lambda_1 \, \varepsilon^2} \log(m) \;\implies\; \mathrm{dist}(W_1, \hat{W}_1) \leq \frac{4 \, \lambda_1 \, \varepsilon}{\lambda_n - \lambda_{n+1}}$$
with high probability, where $\lambda_n - \lambda_{n+1}$ is the spectral gap; see Gittens and Tropp (2011), Golub and Van Loan (1996), Stewart (1973).
Let's be abundantly clear about the problem we are trying to solve. Compare three related problems:
- Low-rank approximation of the collection of gradients: $\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla f_1 & \cdots & \nabla f_N \end{bmatrix} \approx \hat{W}_1 \sqrt{\hat{\Lambda}_1} \, \hat{V}_1^\top$.
- Low-dimensional linear approximation of the gradient: $\nabla f(\mathbf{x}) \approx \hat{W}_1 \, \mathbf{a}(\mathbf{x})$.
- Approximating a function of many variables by a function of a few linear combinations of the variables: $f(\mathbf{x}) \approx g(\hat{W}_1^\top \mathbf{x})$.