New results for the functional ANOVA and Sobol indices


Generalized Sobol Indices
New results for the functional ANOVA and Sobol indices
Art B. Owen, Stanford University

2. Ilya Meerovich Sobol
At MCM 2001, Salzburg. Known for Sobol sequences and Sobol indices. Every time I read one of his papers, I wish I'd read it earlier.

3. Simplified Saint-Venant flood model
Overflow in meters (Lamboni, Iooss, Popelin, Gamboa, 2012) at a dyke:

    S = Z_v + H − H_d − C_b,  where  H = ( Q / (B K_s √((Z_m − Z_v)/L)) )^{3/5}  (max annual river height)

    Q    Maximal annual flow      m^3/s      Gumbel(1013, 558), [500, 3000]
    K_s  Strickler coefficient    m^{1/3}/s  N(30, 8), [15, ∞)
    Z_v  River downstream level   m          Triangle(49, 50, 51)
    Z_m  River upstream level     m          Triangle(54, 55, 56)
    H_d  Dyke height              m          U[7, 9]
    C_b  Bank level               m          Triangle(55, 55.5, 56)
    L    Length of river stretch  m          Triangle(4990, 5000, 5010)
    B    River width              m          Triangle(295, 300, 305)

Reduced from a Navier-Stokes model.

4. The cost model

    C_p = 1{S > 0 m} (flood cost)
        + 1{S ≤ 0 m} (1 − e^{−1000 m^4 / S^4}) (dyke maintenance)
        + min(H_d/m + 1, 8) (investment cost, from construction)

in millions of Euros. Quibble: a discontinuity at S = 0 would be better.

5. Variable importance
Which of these variables is most important? How important is any given subset of variables taken together? How should one define/measure variable importance?
Similar problems come up in black box models for aerospace, semiconductor manufacturing, climate modeling, etc.

6. Why do they care?
The ultimate goal may be to optimize something (a mean, variance, max or min) by choosing levels for those variables under your control. Understanding a function is an intermediate goal, not an ultimate goal. Yet variable importance is widely applied.
Similar question: why compute R^2? Sobol indices yield many analogues of R^2. Quantitative measures of importance aid discussions.

7. Quasi-Monte Carlo
QMC sometimes gives accurate estimates for high dimensional integrals, despite the curse of dimensionality (which never actually said that all high dimensional problems are insoluble). When it happens, we usually find that the integrand was dominated by low order interactions among the variables. That understanding motivates trying to lower the effective dimension of our integrands (analogous to variance reduction).

8. ANOVA
Fisher (1923) to Hoeffding (1948) to Sobol (1967) to Efron & Stein (1981), then back to Sobol (1990/3). Recall

    X_{ij} = X̄_{..} + (X̄_{i.} − X̄_{..}) + (X̄_{.j} − X̄_{..}) + (X_{ij} − X̄_{i.} − X̄_{.j} + X̄_{..})

Take a grand average X̄_{..}. Subtract it out and average over j, getting X̄_{i.} − X̄_{..}. Similarly get X̄_{.j} − X̄_{..}. Generally: subtract away all lower order effects and average out the extra variables.

9. Extensions of ANOVA
From functions on {1,...,I} × {1,...,J} to {1,...,I} × {1,...,J} × ... × {1,...,Z}. To functions in L^2[0,1]^d. To L^2 functions of d arbitrary independent inputs. Also d = ∞ works, via martingales.

10. ANOVA for L^2[0,1]^d
Goes back to Hoeffding (1948) for U-statistics (skillful reading may be required) and Efron & Stein (1981) for the jackknife.

    f(x) = f_∅ + Σ_{j=1}^d f_{j}(x_j) + Σ_{j<k} f_{j,k}(x_j, x_k) + ... + f_{1,2,...,d}(x_1,...,x_d)
         = f_∅ + Σ_{r=1}^d Σ_{1 ≤ j_1 < j_2 < ... < j_r ≤ d} f_{j_1,j_2,...,j_r}(x_{j_1}, x_{j_2}, ..., x_{j_r})

More simply: f(x) = Σ_u f_u(x), summing over all u ⊆ D = {1, 2, ..., d}.

11. Notation
For u ⊆ D = {1,...,d}:
|u| = card(u)
−u = u^c = {1, 2, ..., d} − u
v ⊊ u means v is a strict subset of u
If u = {j_1, j_2, ..., j_{|u|}} then x_u = (x_{j_1}, ..., x_{j_{|u|}}) and dx_u = ∏_{j∈u} dx_j.

12. Recursive definition
For u ⊆ {1,...,d}, f_u(x) depends on x_j only for j ∈ u.

    Overall mean:   µ = f_∅(x) = ∫ f(x) dx
    Main effect j:  f_{j}(x) = ∫ ( f(x) − f_∅(x) ) dx_{−j}
    Interaction u:  f_u(x) = ∫ ( f(x) − Σ_{v⊊u} f_v(x) ) dx_{−u} = ∫ f(x) dx_{−u} − Σ_{v⊊u} f_v(x)

Dependence: f_u(x) is a function of x that happens to depend only on x_u. So f_u(x) + f_v(x) makes sense, because they are defined on the same domain.

13. ANOVA properties

    j ∈ u  ⇒  ∫_0^1 f_u(x) dx_j = 0
    u ≠ v  ⇒  ∫ f_u(x) f_v(x) dx = 0, and even ∫ f_u(x) g_v(x) dx = 0

Variances:

    σ² = Var(f) = ∫ ( f(x) − µ )² dx = Σ_{u⊆{1,...,d}} σ²_u,  where
    σ²_u = σ²_u(f) = ∫ f_u(x)² dx for u ≠ ∅, and σ²_∅ = 0.
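As an illustration (not from the slides), the recursive definition and the variance identity can be checked numerically on a tensor-product grid, where grid averages stand in for the integrals; the demo integrand, grid size, and d = 2 are arbitrary choices for the sketch.

```python
import numpy as np

# Midpoint tensor grid on [0,1]^2; averages over grid points approximate
# the integrals in the recursive ANOVA definition.
m = 400
t = (np.arange(m) + 0.5) / m
x1, x2 = np.meshgrid(t, t, indexing="ij")
F = x1 * x2 + np.sin(2 * np.pi * x1)       # an arbitrary demo integrand

mu = F.mean()                               # f_empty, the overall mean
f1 = F.mean(axis=1) - mu                    # main effect of x1
f2 = F.mean(axis=0) - mu                    # main effect of x2
f12 = F - mu - f1[:, None] - f2[None, :]    # two-factor interaction

sigma2 = ((F - mu) ** 2).mean()
parts = (f1 ** 2).mean() + (f2 ** 2).mean() + (f12 ** 2).mean()
print(sigma2, parts)                        # the two agree to rounding error
```

On a finite grid the decomposition is an exact discrete ANOVA, so the total variance matches the sum of component variances up to floating point error.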

14. Sobol's decomposition
Let φ_0, φ_1, φ_2, ... be a complete orthonormal basis of L^2[0,1] with φ_0(x) ≡ 1. Sobol (1969) expanded f(x) in a tensor product basis (Haar wavelets). He grouped the terms into 2^d subsets depending on which inputs are active. Sobol thus has a synthesis, not an analysis, for this decomposition. Thanks to A. Chouldechova for translation.

15. How important is x_u? Variable importance
Larger σ²_u means that f_u(x) contributes more. We may also want to count σ²_v for sets v related to u.

Sobol's importance measures:

    τ²_u = Σ_{v⊆u} σ²_v          (v contained in u)
    τ̄²_u = Σ_{v∩u≠∅} σ²_v        (v touches u, so interactions count)

Identity: τ̄²_u = σ² − τ²_{−u}. Normalized versions: τ²_u/σ² and τ̄²_u/σ².

16. More derived importance measures
Superset importance:

    Υ²_u = Σ_{v⊇u} σ²_v          Liu & O (2006)

Small Υ²_u means deleting f_u and f_v for v ⊇ u (to stay hierarchical) makes little difference. Relevant to Hooker (2004)'s simplifications of black box functions.

Mean dimension:

    Σ_{u⊆D} |u| σ²_u / σ²        Liu & O (2006)

Measures the dimensionality of f. Higher dimensionality makes for harder numerical handling. Many quadrature problems have mean dimension near 1.

17. Estimation of τ²_u and τ̄²_u
Naive approach for τ²_u:
1) Sample x_i ∈ [0,1]^d and get y_i = f(x_i) for i = 1,...,n.
2) Form statistical machine learning estimates f̂_v(x) for all necessary v.
3) Put σ̂²_v = ∫ f̂_v(x)² dx.
4) Sum: τ̂²_u = Σ_{v⊆u} σ̂²_v.
This is expensive and has many biases. Sobol has a much better way.

18. Fixing methods: hybrid points
Evaluate f at two points: repeat some components, use independent draws for the others. For x, z ∈ [0,1]^d, the hybrid point y = x_u:z_{−u} has

    y_j = x_j for j ∈ u,   y_j = z_j for j ∉ u.

We glue together part of x and part of z to form y = x_u:z_{−u}. Sobol (1990/3) used the identities:

    τ²_u = Cov( f(x), f(x_u:z_{−u}) )
    τ̄²_u = (1/2) E( ( f(x) − f(x_{−u}:z_u) )² )

19. Identity for τ²_u

    ∫∫ f(x) f(x_u:z_{−u}) dx dz
        = Σ_{v⊆D} ∫∫ f_v(x) f_v(x_u:z_{−u}) dx dz    (orthogonality)
        = Σ_{v⊆u} ∫∫ f_v(x) f_v(x_u:z_{−u}) dx dz    (line integrals)
        = µ² + Σ_{∅≠v⊆u} σ²_v
        = µ² + τ²_u.

Bias adjustment:

    τ̂²_u = (1/n) Σ_{i=1}^n f(x_i) f(x_{i,u}:z_{i,−u}) − ( (1/n) Σ_{i=1}^n f(x_i) )²

20. Even better

    τ²_u = ∫∫ f(x) ( f(x_u:z_{−u}) − f(z) ) dx dz
    τ̂²_u = (1/n) Σ_{i=1}^n f(x_i) ( f(x_{i,u}:z_{i,−u}) − f(z_i) )

This avoids subtracting µ̂². It is unbiased: E(τ̂²_u) = τ²_u. Kucherenko, Feil, Shah, Mauntz (2011), Mauntz (2002), Saltelli (2002).

Improved statistical efficiency:

    τ̂²_u = (1/n) Σ_{i=1}^n f(x_i) f(x_{i,u}:z_{i,−u}) − ( (1/n) Σ_{i=1}^n ( f(x_i) + f(x_{i,u}:z_{i,−u}) ) / 2 )²

Janon, Klein, Lagnoux, Nodet & Prieur (2012). Efficient in a class of estimators that does not include the unbiased one above.
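A minimal Monte Carlo sketch of the unbiased pick-freeze estimator above (not from the slides): the additive test function, sample size, and seed are arbitrary demo choices for which the true index is known in closed form.

```python
import numpy as np

def tau_sq_lower(f, u, d, n, rng):
    """Unbiased pick-freeze estimate of the lower Sobol index:
    tau^2_u = (1/n) sum_i f(x_i) * (f(x_{i,u}:z_{i,-u}) - f(z_i))."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    y = z.copy()
    y[:, u] = x[:, u]              # hybrid point x_u : z_{-u}
    return float(np.mean(f(x) * (f(y) - f(z))))

# Additive demo function f(x) = sum_j sqrt(12)(x_j - 1/2): each main
# effect has variance 1 and all interactions vanish, so tau^2_u = |u|.
f = lambda x: np.sqrt(12.0) * (x - 0.5).sum(axis=1)
rng = np.random.default_rng(0)
est = tau_sq_lower(f, [0, 1], d=5, n=200_000, rng=rng)
print(est)                          # close to 2.0
```

Because each summand already subtracts f(z_i), no separate µ̂² correction is needed.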

21. For τ̄²_u

    (1/2) ∫∫ ( f(x) − f(x_u:z_{−u}) )² dx dz
        = (1/2) ( σ² + µ² − 2(τ²_u + µ²) + σ² + µ² )
        = σ² − τ²_u
        = τ̄²_{−u}.

Sobol's estimates are like tomography: integrals reveal internal structure.

22. τ̄²_{j} for the flood model
A table of normalized indices τ̄²/σ² for each input Q, K_s, Z_v, Z_m, H_d, C_b, L, B against the outputs Height H, Overflow S, and Cost C_p, from n = 100,000 runs. (The numeric entries did not survive transcription; the input distributions are as on slide 3.)

23. For mean dimension

    Σ_{j=1}^d τ̄²_{j} = Σ_{j=1}^d Σ_{v∋j} σ²_v = Σ_v |v| σ²_v

Estimator from Liu & O (2006). Generalizes to Σ_v |v|^k σ²_v for k ≥ 1.
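A hedged numerical sketch of this identity (not from the slides): estimate each τ̄²_{j} by the square identity τ̄²_{j} = ½E[(f(x) − f with x_j replaced by z_j)²], sum over j, and divide by σ². The multiplicative test function, its coefficient 0.5, d, n and the seed are demo assumptions.

```python
import numpy as np

def mean_dimension(f, d, n, rng):
    """Estimate sum_j taubar^2_{j} / sigma^2 = sum_v |v| sigma^2_v / sigma^2,
    using taubar^2_{j} = 0.5 * E[(f(x) - f(x with x_j set to z_j))^2]."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    fx = f(x)
    sigma2 = fx.var()
    total = 0.0
    for j in range(d):
        y = x.copy()
        y[:, j] = z[:, j]                  # change only coordinate j
        total += 0.5 * np.mean((fx - f(y)) ** 2)
    return total / sigma2

# Multiplicative demo function prod_j (1 + 0.5*sqrt(12)(x_j - 1/2)) is
# nearly, but not exactly, additive: its mean dimension is a bit above 1.
f = lambda x: np.prod(1.0 + 0.5 * np.sqrt(12.0) * (x - 0.5), axis=1)
rng = np.random.default_rng(1)
md = mean_dimension(f, d=3, n=100_000, rng=rng)
print(md)                                   # a little more than 1
```

For this product function the exact mean dimension is d τ²(1+τ²)^{d−1} / ((1+τ²)^d − 1) with τ² = 0.25, about 1.23, which the estimate should reproduce.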

24. Example
Kuo, Schwab, Sloan (2012) consider quadrature for f(x) = ∏_{j=1}^d (1 + x_j^α / j!), 0 < α ≤ 1. For α = 1 and d = 500, R = 50 replicated estimates of Σ_v |v| σ²_v / σ² using n = 10,000 had mean close to 1 with a small standard deviation (the numeric values did not survive transcription).
Upshot: f(x) is nearly additive, though it is hard to quantify near perfect additivity. (The difficulty seems to be in forming the ratio.)

25. For superset importance Υ²_u

    Υ²_u = Σ_{v⊇u} σ²_v = (1/2^{|u|}) ∫∫ ( Σ_{v⊆u} (−1)^{|u|−|v|} f(x_v:z_{−v}) )² dx dz

A mean of a square of differences is better than a difference of means of squares. From Liu & O (2006). Generalizes the τ̄²_u formula from 2 terms to 2^{|u|} terms.
As a design: use n repeats of a 2^{|u|} × 1^{d−|u|} factorial randomly embedded in the unit cube. Does best in comparisons by Fruth, Roustant, Kuhnt (2012).
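A sketch of this Liu & Owen style sum-of-squares estimator (not from the slides), applied to a product test function whose superset importance is known in closed form; the τ_j values, n and seed are demo assumptions.

```python
import numpy as np
from itertools import chain, combinations

def upsilon_sq(f, u, d, n, rng):
    """Square estimator of superset importance:
    Upsilon^2_u = 2^{-|u|} E[( sum_{v subset u} (-1)^{|u|-|v|} f(x_v:z_{-v}) )^2]."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    subsets = chain.from_iterable(combinations(u, r) for r in range(len(u) + 1))
    total = np.zeros(n)
    for v in subsets:
        y = z.copy()
        y[:, list(v)] = x[:, list(v)]      # hybrid point x_v : z_{-v}
        total += (-1.0) ** (len(u) - len(v)) * f(y)
    return float(np.mean(total ** 2) / 2 ** len(u))

# Product function prod_j (1 + tau_j sqrt(12)(x_j - 1/2)) with mu_j = 1:
# sigma^2_w = prod_{j in w} tau_j^2, so Upsilon^2_u sums over supersets of u.
tau2 = np.array([0.25, 0.25, 0.04])        # tau_j^2, arbitrary demo values
f = lambda x: np.prod(1.0 + np.sqrt(12.0 * tau2) * (x - 0.5), axis=1)
rng = np.random.default_rng(2)
est = upsilon_sq(f, u=(0, 1), d=3, n=200_000, rng=rng)
true = tau2[0] * tau2[1] * (1.0 + tau2[2])  # sigma^2_{01} + sigma^2_{012}
print(est, true)
```

Being a mean of squares, the estimate is nonnegative by construction, which matters when the true Υ²_u is near zero.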

26. Generalized Sobol indices
What can be attained via fixing methods? Define the Sobol matrix Θ ∈ R^{2^d × 2^d} by

    Θ_{u,v} = ∫∫ f(x_u:z_{−u}) f(x_v:z_{−v}) dx dz.

A generalized Sobol index takes coefficients Ω ∈ R^{2^d × 2^d} and forms

    Σ_{u⊆D} Σ_{v⊆D} Ω_{u,v} Θ_{u,v} = tr(Ω^T Θ).

Redundant (but useful): we have a 2^{2d} dimensional space of estimators for a 2^d dimensional space of estimands δ_∅ µ² + Σ_{|u|>0} δ_u σ²_u.

27. NXOR

    XOR(u, v) = (u ∪ v) − (u ∩ v)                       (exclusive OR)
    NXOR(u, v) = XOR(u, v)^c = (u ∩ v) ∪ (u^c ∩ v^c)    (not exclusive OR, often written XNOR)

Then

    Θ_{u,v} = ∫∫ f(x_u:z_{−u}) f(x_v:z_{−v}) dx dz = µ² + τ²_{NXOR(u,v)}
    Θ̂_{u,v} = (1/n) Σ_{i=1}^n f(x_{i,u}:z_{i,−u}) f(x_{i,v}:z_{i,−v}).

Use tr(Ω^T Θ̂).
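A small sketch (not from the slides) of NXOR on index subsets, together with a Monte Carlo check of the identity Θ_{u,v} = µ² + τ²_{NXOR(u,v)} on a product test function; the τ_j values, the particular u, v, n and seed are demo assumptions.

```python
import numpy as np

def nxor(u, v, d):
    """NXOR(u, v) = (u ∩ v) ∪ (u^c ∩ v^c) over the ground set {0,...,d-1}."""
    full = set(range(d))
    u, v = set(u), set(v)
    return (u & v) | ((full - u) & (full - v))

def theta_hat(f, u, v, d, n, rng):
    """Monte Carlo estimate of Theta_{u,v} = E[f(x_u:z_{-u}) f(x_v:z_{-v})]."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    yu, yv = z.copy(), z.copy()
    yu[:, list(u)] = x[:, list(u)]
    yv[:, list(v)] = x[:, list(v)]
    return float(np.mean(f(yu) * f(yv)))

# Product function with mu_j = 1: sigma^2_w = prod_{j in w} tau_j^2, so
# tau^2_w = prod_{j in w}(1 + tau_j^2) - 1 and Theta_{u,v} = 1 + tau^2_{NXOR(u,v)}.
tau2 = np.array([0.25, 0.16, 0.09, 0.04])
f = lambda x: np.prod(1.0 + np.sqrt(12.0 * tau2) * (x - 0.5), axis=1)
d, u, v = 4, {0, 1}, {1, 2}
rng = np.random.default_rng(3)
th = theta_hat(f, u, v, d, n=200_000, rng=rng)
w = nxor(u, v, d)                           # shared coordinates: {1, 3}
true = float(np.prod(1.0 + tau2[list(w)]))  # mu^2 + tau^2_w for this f
print(sorted(w), th, true)
```

The two hybrid points agree exactly on the coordinates in NXOR(u, v), which is why that set governs the expected product.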

28. Special GSIs
1) Mean squares: Ω = λλ^T gives ∫∫ ( Σ_u λ_u f(x_u:z_{−u}) )² dx dz. Nonnegative.
2) Bilinear (rank one): Ω = λγ^T gives ∫∫ ( Σ_u λ_u f(x_u:z_{−u}) ) ( Σ_v γ_v f(x_v:z_{−v}) ) dx dz. Fast.
3) Simple: ∫∫ ( Σ_u λ_u f(x_u:z_{−u}) ) f(z) dx dz. Only uses one row/column of Θ.
4) Contrast: Σ_{u,v} Ω_{u,v} = 0. Free of µ².
N.B.: here a contrast can also be a sum of squares.

29. Cost of a GSI
C(Ω) counts the number of function evaluations per (x, z) pair. Recipe:
1) Count the rows u that are needed for some f(x_u:z_{−u}),
2) add the columns (where u appears as a needed v),
3) subtract any doubly counted subsets.
We can have tr(Ω_1^T Θ) = tr(Ω_2^T Θ) but C(Ω_1) < C(Ω_2).

30. Squares
For a square (or a sum of squares), tr(Ω^T Θ) ≥ 0. Also E( tr(Ω^T Θ̂) ) = tr(Ω^T Θ). Therefore tr(Ω^T Θ) = 0 implies Pr( tr(Ω^T Θ̂) = 0 ) = 1.
There are GSIs with sum of squares estimators for τ̄²_u and Υ²_u. No sum of squares exists for τ²_u: one can show that the coefficient of σ²_D, namely Σ_u Ω_{uu} = tr(Ω), equals Σ_u λ²_u > 0 for a square, yet τ²_u = Σ_{v⊆u} σ²_v has no σ²_D term when |u| < d. The same thing happens in ANOVA tables: every mean square has a contribution from the measurement error.

31. Targeting one variance component

    σ²_{1,2,3} = τ²_{1,2,3} − τ²_{1,2} − τ²_{1,3} − τ²_{2,3} + τ²_{1} + τ²_{2} + τ²_{3}

A simple (contrast) GSI:

    ∫∫ f(x) Σ_{u⊆{1,2,3}} λ_u f(x_u:z_{−u}) dx dz,  with  λ_u = (−1)^{3−|u|}.

Cost is 8 + 1 = 9 function evaluations.

32. A bilinear GSI
Cost is 6 (no duplication). For u ⊆ w_1 and v ⊆ w_2 with w_1 ∪ w_2 = w = {1, 2, 3}, use the entries Θ_{u, v∪w^c} = µ² + τ²_{NXOR(u, v∪w^c)} with Ω = λγ^T. The coefficients of the τ²_u are those arising from

    [ (123) − (13) − (12) + (1) ] [ (23) − (3) − (2) + ∅ ].

33. Simple vs. bilinear for d = 5
Simple:

    ∫∫ f(x) ( f(x_1,x_2,x_3,z_4,z_5) − f(x_1,x_2,z_3,z_4,z_5) − f(x_1,z_2,x_3,z_4,z_5) − f(z_1,x_2,x_3,z_4,z_5)
              + f(x_1,z_2,z_3,z_4,z_5) + f(z_1,x_2,z_3,z_4,z_5) + f(z_1,z_2,x_3,z_4,z_5) − f(z) ) dx dz

Versus bilinear:

    ∫∫ ( f(z) − f(x_1,z_2,z_3,z_4,z_5) ) ( f(z_1,z_2,z_3,x_4,x_5) − f(z_1,x_2,z_3,x_4,x_5)
                                           − f(z_1,z_2,x_3,x_4,x_5) + f(z_1,x_2,x_3,x_4,x_5) ) dx dz

N.B. The bilinear version is invariant under f → f + c.

34. More generally
Simple estimator at cost 2^{|w|} + 1, for |w| < d:

    σ²_w = Σ_{u⊆w} (−1)^{|w|−|u|} Θ_{u,D}

Bilinear, for w_1 ⊆ w and w_2 = w − w_1:

    σ²_w = Σ_{u_1⊆w_1} Σ_{u_2⊆w_2} (−1)^{|u_1|+|u_2|} Θ_{u_1, u_2∪w^c}

Bilinear cost is 2^{|w_1|} + 2^{|w_2|} ≥ 2^{|w|/2+1}, so at a balanced split C_bilinear ≈ 2√(C_simple).
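A Monte Carlo sketch of the simple estimator of σ²_w above (not from the slides), sharing one (x, z) sample across all 2^{|w|} hybrid points; the product test function, w, n and seed are demo assumptions.

```python
import numpy as np
from itertools import chain, combinations

def sigma_sq_simple(f, w, d, n, rng):
    """Simple estimator: sigma^2_w = sum_{u subset w} (-1)^{|w|-|u|} Theta_{u,D},
    estimating each Theta_{u,D} = E[f(x_u:z_{-u}) f(x)] from one shared sample."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    fx = f(x)                               # column D: x_D:z_{} is just x
    total = 0.0
    subsets = chain.from_iterable(combinations(w, r) for r in range(len(w) + 1))
    for u in subsets:
        y = z.copy()
        y[:, list(u)] = x[:, list(u)]       # hybrid point x_u : z_{-u}
        total += (-1.0) ** (len(w) - len(u)) * np.mean(fx * f(y))
    return total

# Product function with mu_j = 1: sigma^2_w = prod_{j in w} tau_j^2.
tau2 = np.array([0.25, 0.16, 0.09, 0.04])
f = lambda x: np.prod(1.0 + np.sqrt(12.0 * tau2) * (x - 0.5), axis=1)
rng = np.random.default_rng(5)
est = sigma_sq_simple(f, w=(0, 1), d=4, n=400_000, rng=rng)
print(est, tau2[0] * tau2[1])               # true value 0.04
```

Sharing the same (x, z) pairs across the alternating sum lets common noise cancel, which helps the difference of correlated terms.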

35. Superset importance
Let w be a nonempty subset of D with d ≥ 1, and let f ∈ L^2[0,1]^d. Choose w_1 ⊆ w and put w_2 = w − w_1. Then

    Υ²_w = Σ_{u_1⊆w_1} Σ_{u_2⊆w_2} (−1)^{|u_1|+|u_2|} Θ_{w^c∪u_1, w^c∪u_2};

compare

    σ²_w = Σ_{u_1⊆w_1} Σ_{u_2⊆w_2} (−1)^{|u_1|+|u_2|} Θ_{u_1, w^c∪u_2}.

Lower cost than a square estimator, but probably much higher variance.

36. Bilinear, with O(d) evaluations
Suppose λ_u = 0 unless |u| ∈ {0, 1, d−1, d}, and similarly for γ_v. Then the rule

    Σ_u Σ_v λ_u γ_v ∫∫ f(x_u:z_{−u}) f(x_v:z_{−v}) dx dz

takes O(d) computation, not O(d²).

37. O(d) pairs, with k ≠ j
For j ≠ k, let j stand for {j} and −j for {j}^c, etc. All the XORs:

    XOR  |  ∅        j        k        −j       −k       D
    ∅    |  ∅        j        k        −j       −k       D
    j    |  j        ∅        {j,k}    D        −{j,k}   −j
    k    |  k        {j,k}    ∅        −{j,k}   D        −k
    −j   |  −j       D        −{j,k}   ∅        {j,k}    j
    −k   |  −k       −{j,k}   D        {j,k}    ∅        k
    D    |  D        −j       −k       j        k        ∅

38. All the NXORs

    NXOR |  ∅        j        k        −j       −k       D
    ∅    |  D        −j       −k       j        k        ∅
    j    |  −j       D        −{j,k}   ∅        {j,k}    j
    k    |  −k       −{j,k}   D        {j,k}    ∅        k
    −j   |  j        ∅        {j,k}    D        −{j,k}   −j
    −k   |  k        {j,k}    ∅        −{j,k}   D        −k
    D    |  ∅        j        k        −j       −k       D

For u and v with |u|, |v| ∈ {0, 1, d−1, d}, we can estimate the corresponding τ²_{NXOR(u,v)} with O(d) cost per (x, z) pair. Saltelli (2002) already noticed this (or at least most of it).

39. What we can get
After some algebra we can get unbiased estimates of

    Σ_u |u| σ²_u,    Σ_u |u|² σ²_u,    Σ_{|u|=1} σ²_u,    Σ_{|u|=2} σ²_u

at cost 2d + 2. (Some parts can be gotten at C = d + 1.)

40. Initial and final segments
Suppose that x_1, x_2, ..., x_d are used in that order, e.g. time steps in a Markov chain.

    First j variables: {1, 2, ..., j} = (0, j], 1 ≤ j ≤ d, with (0, j] = ∅ for j = 0.
    Last d − j variables: {j+1, ..., d} = (j, d], 0 ≤ j ≤ d−1, with (j, d] = ∅ for j = d.

There are 2d + 1 of these subsets.

41. Enumeration
WLOG j < k.

    NXOR   |  ∅            (0,j]         (0,k]         (j,d]         (k,d]         D
    ∅      |  D            (j,d]         (k,d]         (0,j]         (0,k]         ∅
    (0,j]  |  (j,d]        D             (0,j]∪(k,d]   ∅             (j,k]         (0,j]
    (0,k]  |  (k,d]        (0,j]∪(k,d]   D             (j,k]         ∅             (0,k]
    (j,d]  |  (0,j]        ∅             (j,k]         D             (0,j]∪(k,d]   (j,d]
    (k,d]  |  (0,k]        (j,k]         ∅             (0,j]∪(k,d]   D             (k,d]
    D      |  ∅            (0,j]         (0,k]         (j,d]         (k,d]         D

42. Effect of recent variables
First and last elements of u: ⌊u⌋ = min{j : j ∈ u}, ⌈u⌉ = max{j : j ∈ u}. Recency weighted variance components:

    Σ_{j=1}^{d−1} ( Θ_{D,(j,d]} − Θ_{D,∅} ) = Σ_{u⊆D} (⌊u⌋ − 1) σ²_u,  and
    Σ_{j=1}^{d−1} ( Θ_{D,(0,j]} − Θ_{D,∅} ) = Σ_{u⊆D} (d − ⌈u⌉) σ²_u.

Another measure of how fast f(·) forgets its initial conditions. Weighting by ⌊u⌋(d − ⌈u⌉ + 1) is also possible.

43. Test functions
Product function:

    f(x) = ∏_{j=1}^d ( µ_j + τ_j g_j(x_j) ),  with  ∫ g_j = 0,  ∫ g_j² = 1,  ∫ g_j⁴ < ∞
    σ²_u = ∏_{j∈u} τ²_j ∏_{j∉u} µ²_j,   e.g. g(x) = √12 (x − 1/2).

Min function:

    f(x) = min_{1≤j≤d} x_j,  with  τ²_u = |u| / ( (d+1)² (2d − |u| + 2) ).

Liu and O. (2006).
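A spot-check of the min-function closed form (not from the slides), using the unbiased pick-freeze estimator for several subsets u; d = 4, the sample size and the seed are demo choices.

```python
import numpy as np

def tau_sq_lower(f, u, d, n, rng):
    """Unbiased pick-freeze estimate of the lower Sobol index tau^2_u."""
    x = rng.random((n, d))
    z = rng.random((n, d))
    y = z.copy()
    y[:, list(u)] = x[:, list(u)]           # hybrid point x_u : z_{-u}
    return float(np.mean(f(x) * (f(y) - f(z))))

d = 4
f = lambda x: x.min(axis=1)
rng = np.random.default_rng(4)
results = []
for u in ([0], [0, 1], [0, 1, 2]):
    est = tau_sq_lower(f, u, d, n=400_000, rng=rng)
    true = len(u) / ((d + 1) ** 2 * (2 * d - len(u) + 2))
    results.append((est, true))
    print(u, round(est, 5), round(true, 5))
```

As a sanity check on the formula itself: at |u| = d it gives d / ((d+1)²(d+2)), the variance of the minimum of d independent uniforms (a Beta(1, d) variable).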

44. Examples
σ²_{1,2,3} for the product function: simple and bilinear give numerically the same estimate, so bilinear is better because of its lower cost. For min(x) and d = 6, the bilinear estimator was about 5 times as efficient as the simple one, based on n = 1,000,000 (x, z) pairs.
Υ²_{1,2,3,4} for the product function with d = 8, µ_j = 1 and τ = (4, 4, 3, 3, 2, 2, 1, 1)/4: the square estimator beats the bilinear one on both Υ²_{1,2,3,4} and Υ²_{5,6,7,8} (the numeric efficiency table did not survive transcription). It is hard to beat a sum of squares when the true effect is small.

45. Lower index τ²_u
No sum of squares is available.

    Contrast:               (1/n) Σ_{i=1}^n f(x_i) ( f(x_{i,u}:z_{i,−u}) − f(z_i) )
    Simple (bias adjusted): (1/n) Σ_{i=1}^n f(x_i) f(x_{i,u}:z_{i,−u}) − ( (1/(2n)) Σ_{i=1}^n ( f(x_i) + f(x_{i,u}:z_{i,−u}) ) )²

The contrast has an advantage on small τ²_u. The simple estimator sometimes beats it on large ones.

46. GSIs so far
Just use 2 input points, x and z. What about 3: x, y, z?

47. Here it pays to use 3 vectors
x, y, z ∈ [0,1]^d. For small τ²_u:

    (1/n) Σ_{i=1}^n f(x_i) ( f(x_{i,u}:y_{i,−u}) − f(y_i) )                            (Mauntz-Saltelli)
    (1/n) Σ_{i=1}^n ( f(x_i) − µ ) ( f(x_{i,u}:y_{i,−u}) − f(y_i) )                    (Oracle centered)
    (1/n) Σ_{i=1}^n ( f(x_i) − µ ) ( f(x_{i,u}:y_{i,−u}) − µ )                         (Double oracle)
    (1/n) Σ_{i=1}^n ( f(x_i) − f(z_{i,u}:x_{i,−u}) ) ( f(x_{i,u}:y_{i,−u}) − f(y_i) )  (Use 3 vectors)

where (x_i, y_i, z_i) are iid U[0,1]^{3d} for i = 1,...,n. Simulations: on small effects the new estimator beats both oracles. The double oracle wins on large effects.
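A sketch of the three-vector estimator above (not from the slides); each of the four expected cross terms contributes µ² except the first, so the estimator is unbiased for τ²_u. The additive test function, n and seed are demo assumptions.

```python
import numpy as np

def tau_sq_three(f, u, d, n, rng):
    """Three-vector estimator of tau^2_u:
    (1/n) sum_i (f(x_i) - f(z_{i,u}:x_{i,-u})) * (f(x_{i,u}:y_{i,-u}) - f(y_i))."""
    x = rng.random((n, d))
    y = rng.random((n, d))
    z = rng.random((n, d))
    a = x.copy()
    a[:, list(u)] = z[:, list(u)]           # z_u : x_{-u}
    b = y.copy()
    b[:, list(u)] = x[:, list(u)]           # x_u : y_{-u}
    return float(np.mean((f(x) - f(a)) * (f(b) - f(y))))

# Additive demo function: each main effect has variance 1, so tau^2_{j} = 1.
f = lambda x: np.sqrt(12.0) * (x - 0.5).sum(axis=1)
rng = np.random.default_rng(6)
est = tau_sq_three(f, u=[0], d=6, n=200_000, rng=rng)
print(est)                                  # close to 1.0
```

Both factors are centered by construction, which is what reduces variance on small effects without needing the oracle value of µ.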

48. Conclusions
Sums of squares are very good. Bilinear estimators λ^T Θ γ work well, especially when 1^T γ = 1^T λ = 0.
Further work:
1) Pursue variance inequalities.
2) Replace plain MC by quasi-Monte Carlo.
3) Find nice confidence intervals for ratios of means over U-statistics.
4) Variance reductions.

49. Thanks
Bertrand Iooss, for answering questions about Saint-Venant. Sergei Kucherenko, for extensive discussions. National Science Foundation of the U.S., DMS.

50. Optimal estimates
Let η² = Σ_u δ_u σ²_u. We would like E(η̂²) = η² and Var(η̂²) × cost = minimum, using variance components theory.
Unfortunately: Var(η̂²) depends on 4th moments.
Fortunately: there is a theory of MINimum Quadratic Norm UNbiased Estimation (MINQUE), C. R. Rao (1970s).
Unfortunately: it does not appear to be available for crossed random effects.
Fortunately: the computed case gives us more options, e.g., quadrature.


More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

CHAPTER 3 Further properties of splines and B-splines

CHAPTER 3 Further properties of splines and B-splines CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions

More information

4th Preparation Sheet - Solutions

4th Preparation Sheet - Solutions Prof. Dr. Rainer Dahlhaus Probability Theory Summer term 017 4th Preparation Sheet - Solutions Remark: Throughout the exercise sheet we use the two equivalent definitions of separability of a metric space

More information

Isomorphisms between pattern classes

Isomorphisms between pattern classes Journal of Combinatorics olume 0, Number 0, 1 8, 0000 Isomorphisms between pattern classes M. H. Albert, M. D. Atkinson and Anders Claesson Isomorphisms φ : A B between pattern classes are considered.

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

RECYCLING PHYSICAL RANDOM NUMBERS. Art B. Owen. Technical Report No September 2009

RECYCLING PHYSICAL RANDOM NUMBERS. Art B. Owen. Technical Report No September 2009 RECYCLING PHYSICAL RANDOM NUMBERS By Art B. Owen Technical Report No. 2009-11 September 2009 Department of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065 RECYCLING PHYSICAL RANDOM NUMBERS

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u u 2 2 = u 2. Geometric Meaning:

Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u u 2 2 = u 2. Geometric Meaning: Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u 2 1 + u 2 2 = u 2. Geometric Meaning: u v = u v cos θ. u θ v 1 Reason: The opposite side is given by u v. u v 2 =

More information

Part III. Quasi Monte Carlo methods 146/349

Part III. Quasi Monte Carlo methods 146/349 Part III Quasi Monte Carlo methods 46/349 Outline Quasi Monte Carlo methods 47/349 Quasi Monte Carlo methods Let y = (y,...,y N ) be a vector of independent uniform random variables in Γ = [0,] N, u(y)

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

More information

Tutorials in Optimization. Richard Socher

Tutorials in Optimization. Richard Socher Tutorials in Optimization Richard Socher July 20, 2008 CONTENTS 1 Contents 1 Linear Algebra: Bilinear Form - A Simple Optimization Problem 2 1.1 Definitions........................................ 2 1.2

More information

Asymptotics of minimax stochastic programs

Asymptotics of minimax stochastic programs Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE

SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE BETSY STOVALL Abstract. This result sharpens the bilinear to linear deduction of Lee and Vargas for extension estimates on the hyperbolic paraboloid

More information

MATH Linear Algebra

MATH Linear Algebra MATH 4 - Linear Algebra One of the key driving forces for the development of linear algebra was the profound idea (going back to 7th century and the work of Pierre Fermat and Rene Descartes) that geometry

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Bias and variance (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 49 Our plan today We saw in last lecture that model scoring methods seem to be trading off two different

More information

LAGRANGE MULTIPLIERS

LAGRANGE MULTIPLIERS LAGRANGE MULTIPLIERS MATH 195, SECTION 59 (VIPUL NAIK) Corresponding material in the book: Section 14.8 What students should definitely get: The Lagrange multiplier condition (one constraint, two constraints

More information

CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions)

CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions) CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions) 1. (Confidence Intervals, CLT) Let X 1,..., X n be iid with unknown mean θ and known variance σ 2. Assume

More information

Functions of Several Variables

Functions of Several Variables Jim Lambers MAT 419/519 Summer Session 2011-12 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Functions of Several Variables We now generalize the results from the previous section,

More information

Polynomial chaos expansions for sensitivity analysis

Polynomial chaos expansions for sensitivity analysis c DEPARTMENT OF CIVIL, ENVIRONMENTAL AND GEOMATIC ENGINEERING CHAIR OF RISK, SAFETY & UNCERTAINTY QUANTIFICATION Polynomial chaos expansions for sensitivity analysis B. Sudret Chair of Risk, Safety & Uncertainty

More information

Analysis II - few selective results

Analysis II - few selective results Analysis II - few selective results Michael Ruzhansky December 15, 2008 1 Analysis on the real line 1.1 Chapter: Functions continuous on a closed interval 1.1.1 Intermediate Value Theorem (IVT) Theorem

More information

Linear Models Review

Linear Models Review Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign

More information

A Few Notes on Fisher Information (WIP)

A Few Notes on Fisher Information (WIP) A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,

More information

Discrete Latent Variable Models

Discrete Latent Variable Models Discrete Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 14 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 14 1 / 29 Summary Major themes in the course

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

Poincaré inequalities revisited for dimension reduction

Poincaré inequalities revisited for dimension reduction Poincaré inequalities revisited for dimension reduction Olivier Roustant*, Franck Barthe** and Bertrand Iooss**, *** * Mines Saint-Étienne ** Institut de Mathématiques de Toulouse *** EDF, Chatou March

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Evaluating the Performance of Estimators (Section 7.3)

Evaluating the Performance of Estimators (Section 7.3) Evaluating the Performance of Estimators (Section 7.3) Example: Suppose we observe X 1,..., X n iid N(θ, σ 2 0 ), with σ2 0 known, and wish to estimate θ. Two possible estimators are: ˆθ = X sample mean

More information

1 Structures 2. 2 Framework of Riemann surfaces Basic configuration Holomorphic functions... 3

1 Structures 2. 2 Framework of Riemann surfaces Basic configuration Holomorphic functions... 3 Compact course notes Riemann surfaces Fall 2011 Professor: S. Lvovski transcribed by: J. Lazovskis Independent University of Moscow December 23, 2011 Contents 1 Structures 2 2 Framework of Riemann surfaces

More information

Stat 451 Lecture Notes Numerical Integration

Stat 451 Lecture Notes Numerical Integration Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29

More information

Penalty Methods for Bivariate Smoothing and Chicago Land Values

Penalty Methods for Bivariate Smoothing and Chicago Land Values Penalty Methods for Bivariate Smoothing and Chicago Land Values Roger Koenker University of Illinois, Urbana-Champaign Ivan Mizera University of Alberta, Edmonton Northwestern University: October 2001

More information

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count

More information

Eigenvectors and Hermitian Operators

Eigenvectors and Hermitian Operators 7 71 Eigenvalues and Eigenvectors Basic Definitions Let L be a linear operator on some given vector space V A scalar λ and a nonzero vector v are referred to, respectively, as an eigenvalue and corresponding

More information

Exercises on chapter 0

Exercises on chapter 0 Exercises on chapter 0 1. A partially ordered set (poset) is a set X together with a relation such that (a) x x for all x X; (b) x y and y x implies that x = y for all x, y X; (c) x y and y z implies that

More information

Lecture 6 Proof for JL Lemma and Linear Dimensionality Reduction

Lecture 6 Proof for JL Lemma and Linear Dimensionality Reduction COMS 4995: Unsupervised Learning (Summer 18) June 7, 018 Lecture 6 Proof for JL Lemma and Linear imensionality Reduction Instructor: Nakul Verma Scribes: Ziyuan Zhong, Kirsten Blancato This lecture gives

More information

Numerical integration and importance sampling

Numerical integration and importance sampling 2 Numerical integration and importance sampling 2.1 Quadrature Consider the numerical evaluation of the integral I(a, b) = b a dx f(x) Rectangle rule: on small interval, construct interpolating function

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Gaussian Processes for Computer Experiments

Gaussian Processes for Computer Experiments Gaussian Processes for Computer Experiments Jeremy Oakley School of Mathematics and Statistics, University of Sheffield www.jeremy-oakley.staff.shef.ac.uk 1 / 43 Computer models Computer model represented

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Math 2 Variable Manipulation Part 7 Absolute Value & Inequalities

Math 2 Variable Manipulation Part 7 Absolute Value & Inequalities Math 2 Variable Manipulation Part 7 Absolute Value & Inequalities 1 MATH 1 REVIEW SOLVING AN ABSOLUTE VALUE EQUATION Absolute value is a measure of distance; how far a number is from zero. In practice,

More information