Concepts in Global Sensitivity Analysis IMA UQ Short Course, June 23, 2015


Concepts in Global Sensitivity Analysis
IMA UQ Short Course, June 23, 2015

A good reference is Global Sensitivity Analysis: The Primer, Saltelli et al. (2008).

Paul Constantine, Colorado School of Mines
inside.mines.edu/~pconstan
activesubspaces.org
@DrPaulynomial

WARNING: These slides are meant to complement the oral presentation in the short course. Use out of context at your own risk.

http://www.sfu.ca/~ssurjano/index.html

Von Neumann, John, and Herman H. Goldstine. "Numerical inverting of matrices of high order." Bulletin of the American Mathematical Society 53.11 (1947): 1021-1099.

What kinds of science/engineering models do you care about? Do you have a simulation that you trust? What are the inputs and outputs? How would you characterize the uncertainty in the inputs? In other words, what do you know about the unknown inputs? What question are you trying to answer with your model?

f(x)

x: finite-dimensional vector, independent components, centered and scaled to remove units.

[Figure: a baseline response and two perturbed responses plotted against time on [0, 2].]

Which perturbation shows the largest change? The two perturbations give 2-norm differences of 20.5 and 31.6, infinity-norm differences of 2.0 and 1.8, and both give a difference of 0.0 at the final time.

f: scalar-valued, smooth, no noise!

Sensitivity analysis seeks to identify the most important parameters. What are the most important parameters in your model? What are the least important parameters? What does it mean for a parameter to be important?

$\frac{\partial f}{\partial x_i}(x)$: derivatives measure local sensitivity. But we want something global.

Some Global Sensitivity Metrics 1. Morris elementary effects 2. Sobol sensitivity indices 3. Mean (squared) derivatives 4. Active subspaces

Morris Elementary Effects (like bad approximations to average derivatives)

[Figure: a $p$-level grid in the $(x_1, x_2)$ plane.] Step size $h \in \{\, 2n/(p-1),\; n = 1, \dots, p-1 \,\}$.

Elementary effect:
$\mathrm{EE}_{ij}(h) = \dfrac{f(x_j + h\, e_i) - f(x_j)}{h}$

Sensitivity indices:
$\mu_i(h) = \frac{1}{N} \sum_{j=1}^{N} \mathrm{EE}_{ij}(h), \qquad \mu_i^{*}(h) = \frac{1}{N} \sum_{j=1}^{N} \big| \mathrm{EE}_{ij}(h) \big|$
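
A minimal numerical sketch of these formulas (not from the slides), assuming numpy and a user-supplied function f on [-1, 1]^m; the name morris_indices and the random one-at-a-time design are illustrative simplifications of the usual Morris grid design.

    import numpy as np

    def morris_indices(f, m, N=100, h=1e-1, seed=0):
        # Estimate mu_i and mu_i^* from N random base points x_j in [-1, 1]^m.
        rng = np.random.default_rng(seed)
        EE = np.zeros((N, m))
        for j in range(N):
            x = rng.uniform(-1.0, 1.0, size=m)
            fx = f(x)
            for i in range(m):
                xp = x.copy()
                xp[i] += h                      # one-at-a-time perturbation
                EE[j, i] = (f(xp) - fx) / h     # elementary effect EE_ij(h)
        mu = EE.mean(axis=0)                    # mean elementary effect
        mu_star = np.abs(EE).mean(axis=0)       # mean absolute elementary effect
        return mu, mu_star

    # Example: the bivariate function used later in the slides.
    f = lambda x: np.exp(0.7 * x[0] + 0.3 * x[1])
    print(morris_indices(f, m=2))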

Variance-based decompositions

$f(x) = f_0 + \sum_{i=1}^{m} f_i(x_i) + \sum_{i=1}^{m} \sum_{j>i} f_{i,j}(x_i, x_j) + \cdots + f_{1,\dots,m}(x_1, \dots, x_m)$

(terms: a constant, functions of one variable, functions of two variables, functions of 3, 4, ... variables, and a function of all m variables)

Variance-based decompositions

The terms are orthogonal functions:
$f_0 = \mathrm{E}[f]$
$f_i = \mathrm{E}[f \mid x_i] - f_0$
$f_{i,j} = \mathrm{E}[f \mid x_i, x_j] - f_i - f_j - f_0$
$\vdots$
$f_{1,\dots,m} = f(x) - (\text{everything else})$

Decomposition of variance:
$\mathrm{Var}[f] = \sum_i \mathrm{Var}[f_i] + \sum_{i,j} \mathrm{Var}[f_{i,j}] + \cdots + \mathrm{Var}[f_{1,\dots,m}]$

Sobol indices

First-order sensitivity index: $S_i = \dfrac{\mathrm{Var}[f_i]}{\mathrm{Var}[f]}$

Interaction effects: $S_{i_1,\dots,i_k} = \dfrac{\mathrm{Var}[f_{i_1,\dots,i_k}]}{\mathrm{Var}[f]}$

Total effects: $S_{T_1} = S_1 + S_{1,2} + S_{1,3} + S_{1,2,3}$ (e.g., sum everything with a 1)

(PAUL: Mention the relationship to polynomial chaos.)
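
As a rough illustration of how the first-order indices can be estimated in practice, here is a Monte Carlo sketch using a pick-and-freeze (Saltelli-style) estimator; it assumes numpy, independent inputs uniform on [-1, 1]^m, and the hypothetical helper name first_order_sobol, none of which come from the slides.

    import numpy as np

    def first_order_sobol(f, m, N=10000, seed=0):
        # Monte Carlo (pick-and-freeze) estimate of the first-order Sobol
        # indices S_i for independent inputs uniform on [-1, 1]^m.
        rng = np.random.default_rng(seed)
        A = rng.uniform(-1.0, 1.0, size=(N, m))
        B = rng.uniform(-1.0, 1.0, size=(N, m))
        fA = np.apply_along_axis(f, 1, A)
        fB = np.apply_along_axis(f, 1, B)
        var_f = np.var(np.concatenate([fA, fB]))
        S = np.zeros(m)
        for i in range(m):
            ABi = A.copy()
            ABi[:, i] = B[:, i]                 # freeze x_i from B, vary the rest
            fABi = np.apply_along_axis(f, 1, ABi)
            S[i] = np.mean(fB * (fABi - fA)) / var_f
        return S

    f = lambda x: np.exp(0.7 * x[0] + 0.3 * x[1])
    print(first_order_sobol(f, m=2))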

Mean (squared) derivatives

$\mathrm{E}\!\left[ \frac{\partial f}{\partial x_i} \right] \qquad \mathrm{E}\!\left[ \left( \frac{\partial f}{\partial x_i} \right)^{2} \right]$

Kucherenko, et al., DGSM, RESS (2008)
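
A minimal sketch of estimating these derivative-based metrics by Monte Carlo, assuming numpy, a user-supplied gradient grad_f, and inputs uniform on [-1, 1]^m; the helper name mean_squared_derivatives is illustrative.

    import numpy as np

    def mean_squared_derivatives(grad_f, m, N=10000, seed=0):
        # Monte Carlo estimate of E[df/dx_i] and E[(df/dx_i)^2] over
        # inputs uniform on [-1, 1]^m, given the gradient grad_f(x) in R^m.
        rng = np.random.default_rng(seed)
        X = rng.uniform(-1.0, 1.0, size=(N, m))
        G = np.array([grad_f(x) for x in X])    # N x m matrix of gradients
        return G.mean(axis=0), (G ** 2).mean(axis=0)

    f = lambda x: np.exp(0.7 * x[0] + 0.3 * x[1])
    grad_f = lambda x: np.array([0.7, 0.3]) * f(x)
    print(mean_squared_derivatives(grad_f, m=2))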

Let's play! Think of an interesting bivariate function.

Estimating with Monte Carlo is loud.

[Figure: Monte Carlo error versus number of samples on log-log axes, roughly 10^2 to 10^6 samples.]
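
A quick numerical illustration (not from the slides) of the slow, roughly $N^{-1/2}$ decay behind that plot, assuming numpy and the bivariate exponential used later in the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.exp(0.7 * x[0] + 0.3 * x[1])
    exact = np.sinh(0.7) / 0.7 * np.sinh(0.3) / 0.3   # closed-form mean over [-1, 1]^2
    for N in [10**2, 10**3, 10**4, 10**5]:
        est = np.mean([f(x) for x in rng.uniform(-1, 1, size=(N, 2))])
        print(N, abs(est - exact))                    # error shrinks like N^(-1/2)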

What is it good for? Sensitivity metrics can be hard to interpret if not zero. May provide or confirm understanding. Lots of ideas for using them as weights for anisotropic approximation schemes. Would like to use them to reduce the dimension.

AUDIENCE POLL How many dimensions is high dimensions?

APPROXIMATION: $f(x) \approx \tilde{f}(x)$
INTEGRATION: $\int f(x) \, dx$
OPTIMIZATION: $\underset{x}{\mathrm{minimize}} \; f(x)$

Dimension | 10 points / dimension | 1 second / evaluation
1         | 10                    | 10 s
2         | 100                   | ~ 1.6 min
3         | 1,000                 | ~ 16 min
4         | 10,000                | ~ 2.7 hours
5         | 100,000               | ~ 1.1 days
6         | 1,000,000             | ~ 1.6 weeks
20        | 1e20                  | ~ 3 trillion years (240x the age of the universe)
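
The table is simple arithmetic; a short Python sketch (not from the slides) reproduces it.

    import numpy as np

    # Tensor-grid cost: 10 points per dimension, 1 second per evaluation.
    for d in [1, 2, 3, 4, 5, 6, 20]:
        seconds = 10.0 ** d
        years = seconds / (365.25 * 24 * 3600)
        print(f"dimension {d:2d}: {seconds:.3g} evaluations, {years:.3g} years")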

Reduced order models (same cost table as above)

Better designs (same cost table as above)

Dimension reduction (same cost table as above)

$f(x_1, x_2) = \exp(0.7 x_1 + 0.3 x_2)$

[Figure: the direction of change, along (0.7, 0.3), and the orthogonal flat direction.]

$27 Coupon code: BKSL15 bookstore.siam.org/sl02/

DEFINE the active subspace.

Consider a function, its gradient vector, and a probability density on the inputs:
$f = f(x), \quad x \in \mathbf{R}^m, \quad \nabla f(x) \in \mathbf{R}^m, \quad \rho : \mathbf{R}^m \to \mathbf{R}_+$

The average outer product of the gradient and its eigendecomposition:
$C = \int (\nabla_x f)(\nabla_x f)^T \, \rho \, dx = W \Lambda W^T$

Partition the eigendecomposition:
$\Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \quad W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}, \quad W_1 \in \mathbf{R}^{m \times n}$

Rotate and separate the coordinates:
$x = W W^T x = W_1 W_1^T x + W_2 W_2^T x = W_1 y + W_2 z$
where $y = W_1^T x$ are the active variables and $z = W_2^T x$ are the inactive variables.
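
As a concrete check of the definition (not from the slides), the sketch below builds C by tensor-product Gauss-Legendre quadrature for the earlier bivariate example with a uniform density on [-1, 1]^2 and eigendecomposes it; the rank-one structure and the eigenvector direction (0.7, 0.3) fall out immediately.

    import numpy as np

    pts, wts = np.polynomial.legendre.leggauss(20)
    wts = wts / 2.0                              # normalize to the uniform density on [-1, 1]
    a = np.array([0.7, 0.3])
    C = np.zeros((2, 2))
    for x1, w1 in zip(pts, wts):
        for x2, w2 in zip(pts, wts):
            g = a * np.exp(a @ np.array([x1, x2]))   # gradient of exp(a^T x)
            C += w1 * w2 * np.outer(g, g)
    lam, W = np.linalg.eigh(C)
    print(lam[::-1])        # one nonzero eigenvalue: C has rank one
    print(W[:, ::-1])       # first eigenvector is proportional to [0.7, 0.3]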

$\int \nabla_x f \, (\nabla_x f)^T \, \rho \, dx \quad$ VS. $\quad \int x \, x^T \, \rho \, dx$

The eigenvectors indicate perturbations that change the function more, on average.

LEMMA 1: $\lambda_i = \int \big( (\nabla_x f)^T w_i \big)^2 \rho \, dx, \quad i = 1, \dots, m$

LEMMA 2: $\int (\nabla_y f)^T (\nabla_y f) \, \rho \, dx = \lambda_1 + \cdots + \lambda_n, \qquad \int (\nabla_z f)^T (\nabla_z f) \, \rho \, dx = \lambda_{n+1} + \cdots + \lambda_m$

DISCOVER the active subspace with random sampling.

Draw samples: $x_j \sim \rho$. Compute: $f_j = f(x_j)$ and $\nabla_x f_j = \nabla_x f(x_j)$.

Approximate with Monte Carlo:
$C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla_x f_j \, \nabla_x f_j^T = \hat{W} \hat{\Lambda} \hat{W}^T$

Equivalent to the SVD of samples of the gradient:
$\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla_x f_1 & \cdots & \nabla_x f_N \end{bmatrix} = \hat{W} \sqrt{\hat{\Lambda}} \, \hat{V}^T$

Called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data in Complex System Models.
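
A minimal sketch of this sampling procedure (not from the slides), assuming numpy and an analytic gradient for the toy function f(x) = exp(a^T x) in m = 10 dimensions; the SVD of the scaled gradient matrix recovers one dominant eigenvalue and an eigenvector aligned with a.

    import numpy as np

    rng = np.random.default_rng(0)
    m, N = 10, 200
    a = rng.normal(size=m)
    grad_f = lambda x: a * np.exp(a @ x)         # gradient of exp(a^T x)
    X = rng.uniform(-1.0, 1.0, size=(N, m))      # x_j ~ rho (uniform on [-1, 1]^m)
    G = np.array([grad_f(x) for x in X])         # N x m matrix of sampled gradients
    U, s, Vt = np.linalg.svd(G.T / np.sqrt(N))   # columns of U estimate W-hat
    lam_hat = s ** 2                             # eigenvalue estimates
    print(lam_hat[:3])                           # one dominant eigenvalue
    print(np.abs(U[:, 0] @ a) / np.linalg.norm(a))  # ~1: first eigenvector aligns with a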

Let's be abundantly clear about the problem we are trying to solve.

Low-rank approximation of the collection of gradients:
$\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla_x f_1 & \cdots & \nabla_x f_N \end{bmatrix} \approx \hat{W}_1 \sqrt{\hat{\Lambda}_1} \, \hat{V}_1^T$

Low-dimensional linear approximation of the gradient:
$\nabla f(x) \approx \hat{W}_1 \, a(x)$

Approximate a function of many variables by a function of a few linear combinations of the variables:
$f(x) \approx g\big( \hat{W}_1^T x \big)$

Questions about the approximation $f(x) \approx g\big( \hat{W}_1^T x \big)$: What is the approximation error? What is the effect of the approximate eigenvectors? How do you construct $g$?

[ Show them the animation! ]

EXPLOIT active subspaces for response surfaces with conditional averaging.

Define the conditional expectation:
$g(y) = \int f(W_1 y + W_2 z) \, \rho(z \mid y) \, dz, \qquad f(x) \approx g(W_1^T x)$

THEOREM:
$\left( \int \big( f(x) - g(W_1^T x) \big)^2 \rho \, dx \right)^{1/2} \le C_P \, (\lambda_{n+1} + \cdots + \lambda_m)^{1/2}$

Define the Monte Carlo approximation:
$\hat{g}(y) = \frac{1}{N} \sum_{i=1}^{N} f(W_1 y + W_2 z_i), \qquad z_i \sim \rho(z \mid y)$

THEOREM:
$\left( \int \big( f(x) - \hat{g}(W_1^T x) \big)^2 \rho \, dx \right)^{1/2} \le C_P \, \big( 1 + N^{-1/2} \big) (\lambda_{n+1} + \cdots + \lambda_m)^{1/2}$
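
A minimal sketch of the Monte Carlo conditional average (not from the slides), assuming numpy, that the density is a standard Gaussian so that z given y is again standard Gaussian, and the illustrative helper name g_hat.

    import numpy as np

    def g_hat(f, W1, y, N=20, seed=0):
        # Monte Carlo estimate of the conditional average g(y).
        # Assumes rho is standard Gaussian, so z | y is standard Gaussian on R^(m-n).
        rng = np.random.default_rng(seed)
        m, n = W1.shape
        # Orthonormal basis W2 for the complement of the columns of W1,
        # from the SVD of the projector I - W1 W1^T.
        U = np.linalg.svd(np.eye(m) - W1 @ W1.T)[0]
        W2 = U[:, : m - n]
        Z = rng.standard_normal((N, m - n))      # z_i ~ rho(z | y)
        return np.mean([f(W1 @ y + W2 @ z) for z in Z])

    # Example: rank-one function with a known one-dimensional active subspace.
    a = np.array([1.0, 0.5, 0.2, 0.1, 0.05])
    f = lambda x: np.exp(a @ x)
    W1 = (a / np.linalg.norm(a)).reshape(-1, 1)
    print(g_hat(f, W1, y=np.array([0.3])))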

EXPLOIT active subspaces for response surfaces with conditional averaging.

Define the subspace error: $\varepsilon = \mathrm{dist}(W_1, \hat{W}_1)$

THEOREM:
$\left( \int \big( f(x) - g(\hat{W}_1^T x) \big)^2 \rho \, dx \right)^{1/2} \le C_P \left( \varepsilon \, (\lambda_1 + \cdots + \lambda_n)^{1/2} + (\lambda_{n+1} + \cdots + \lambda_m)^{1/2} \right)$

with $\varepsilon$ the subspace error, $\lambda_1, \dots, \lambda_n$ the eigenvalues for the active variables, and $\lambda_{n+1}, \dots, \lambda_m$ the eigenvalues for the inactive variables.

THE BIG IDEA
1. Choose points in the domain of g.
2. Estimate conditional averages at each point.
3. Construct the approximation in n < m dimensions.
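
A self-contained end-to-end sketch of these three steps (not from the slides), assuming numpy, a standard Gaussian density, the toy function f(x) = exp(a^T x) with its exact one-dimensional active subspace, and a cubic polynomial for the final response surface.

    import numpy as np

    rng = np.random.default_rng(0)
    a = np.array([1.0, 0.5, 0.2, 0.1, 0.05])
    f = lambda x: np.exp(a @ x)
    m, n = a.size, 1
    W1 = (a / np.linalg.norm(a)).reshape(m, n)           # exact active direction
    U = np.linalg.svd(np.eye(m) - W1 @ W1.T)[0]
    W2 = U[:, : m - n]                                   # basis for the inactive subspace

    # 1. Choose points in the domain of g.
    y_grid = np.linspace(-2.0, 2.0, 9)

    # 2. Estimate conditional averages at each point (z | y is standard Gaussian).
    def g_hat(y, N=20):
        Z = rng.standard_normal((N, m - n))
        return np.mean([f(W1[:, 0] * y + W2 @ z) for z in Z])
    g_vals = np.array([g_hat(y) for y in y_grid])

    # 3. Construct the approximation in n < m dimensions (cubic polynomial in y).
    coeffs = np.polyfit(y_grid, g_vals, deg=3)
    surrogate = lambda x: np.polyval(coeffs, W1[:, 0] @ x)

    x_test = rng.standard_normal(m)
    print(surrogate(x_test), f(x_test))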

There's an active subspace in this parameterized PDE.

Two-d Poisson with 100-term Karhunen-Loève coefficients:
$\nabla \cdot (a \nabla u) = 1, \; x \in D; \qquad u = 0, \; x \in \Gamma_1; \qquad n \cdot a \nabla u = 0, \; x \in \Gamma_2$

DIMENSION REDUCTION: 100 to 1

[Figures: estimated eigenvalues (Est, BI) versus index, decaying over several orders of magnitude; estimated subspace distance (Est, BI) versus subspace dimension.]

DIMENSION REDUCTION: 100 to 1

There's an active subspace in this parameterized PDE. Two-d Poisson with 100-term Karhunen-Loève coefficients:
$\nabla \cdot (a \nabla u) = 1, \; x \in D; \qquad u = 0, \; x \in \Gamma_1; \qquad n \cdot a \nabla u = 0, \; x \in \Gamma_2$

[Figure: the quantity of interest (on the order of 1e-3) plotted against the active variable over roughly [-3, 3].]

Active subspaces can be sensitivity metrics.

[Figure: components of the first eigenvector versus index (1 to 100) for a short correlation length (β = 0.01) and a long correlation length (β = 1).]

Questions? How do the active subspaces relate to the coordinate-based sensitivity metrics? How does this relate to PCA/POD? How many gradient samples do I need? How new is all this? Paul Constantine Colorado School of Mines activesubspaces.org @DrPaulynomial