Uncertainty Quantification for Machine Learning and Statistical Models

David J. Stracuzzi
Joint work with: Max Chen, Michael Darling, Stephen Dauphin, Matt Peterson, and Chris Young

Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2017-2966C

Inverse Problems

$D = M(T)$, $T = M^{-1}(D)$

- M has a known form
- Determining $M^{-1}$ is ill-posed (unstable, non-unique)
- Example: determine Earth's subsurface density (T) from gravitational field measurements (D)
- M is based on Newton's law, $F = Gm/r^2$
- D is almost certainly insufficient to parameterize $M^{-1}$
- Regularize and optimize to get a "best" solution

Uncertainty comes from:
- D
- Regularization + optimization (definition of "best")
- Model parameters
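To make "unstable" and "regularize" concrete, here is a minimal sketch, not taken from the slides. It assumes a hypothetical, nearly singular 2x2 linear forward model `M` and uses Tikhonov (ridge) regularization as one possible definition of "best"; the matrix, the data values, and the weight `lam` are all illustrative assumptions.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ [x, y] = [r1, r2] by Cramer's rule."""
    det = a * d - b * c
    return (r1 * d - b * r2) / det, (a * r2 - c * r1) / det

def invert_naive(M, D):
    """Direct inversion T = M^{-1} D, with no regularization."""
    (a, b), (c, d) = M
    return solve_2x2(a, b, c, d, D[0], D[1])

def invert_tikhonov(M, D, lam):
    """Minimise ||M T - D||^2 + lam * ||T||^2 via (M^T M + lam I) T = M^T D."""
    (a, b), (c, d) = M
    off_diag = a * b + c * d
    return solve_2x2(a * a + c * c + lam, off_diag,
                     off_diag, b * b + d * d + lam,
                     a * D[0] + c * D[1], b * D[0] + d * D[1])

# Hypothetical, nearly singular forward model; the true T is (1, 1).
M = [[1.0, 1.0], [1.0, 1.0001]]
D_clean = [2.0, 2.0001]
D_noisy = [2.0, 2.0002]   # one measurement perturbed by 1e-4

naive_clean = invert_naive(M, D_clean)
naive_noisy = invert_naive(M, D_noisy)        # jumps far from (1, 1)
reg_clean = invert_tikhonov(M, D_clean, 1e-3)
reg_noisy = invert_tikhonov(M, D_noisy, 1e-3)  # stays close to reg_clean

print("naive:      ", naive_clean, naive_noisy)
print("regularized:", reg_clean, reg_noisy)
```

The regularized solution is stable but slightly biased, which illustrates why the choice of regularizer is itself listed as a source of uncertainty.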

Machine Learning and Statistical Modeling

$D = M(T)$, $T = M^{-1}(D)$

- M has an unknown form
- Substitute statistics for domain knowledge

Uncertainty comes from:
- D
- Regularization + optimization (definition of "best")
- Model form (and parameters)
- Inference

Lack of domain knowledge impacts selection of model structure, regularization methods, and definition of "best".

Example: Seismic Onset Detection

Given: waveform data containing both noise and seismic signals
Produce: signal onset time
- Precision is critical to downstream processing
Known: wave propagation theory
Unknown:
- Point of origin
- Event size or type
- Composition/density of medium (rock)

Simplified: signal is 3D. Ground truth does not exist.

Seismic Detection Models

Model: ARMA(p, q)

$$x_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i x_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$

Basic approach:
- Model the noise ($M_1$) and the signal plus noise ($M_2$) separately
- Optimize model parameters $\theta$, $\varphi$ via maximum likelihood
- Use the Akaike information criterion (AIC) to select p, q
- The point at which the two models meet is the best-guess signal onset
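The sketch below illustrates the idea under strong simplifying assumptions and is not the authors' implementation. `simulate_arma` follows the ARMA(p, q) recursion to build a synthetic waveform. `pick_onset` then swaps in a simpler technique: each segment is treated as white Gaussian noise (effectively ARMA(0, 0)) instead of a fitted ARMA model, and the split point is chosen by minimizing a two-segment AIC with constant terms dropped. All function names, coefficients, and the `margin` parameter are hypothetical.

```python
import math
import random

def simulate_arma(n, c, phi, theta, sigma, rng):
    """Draw n samples of x_t = c + e_t + sum_i phi_i x_{t-i} + sum_j theta_j e_{t-j}."""
    x, e = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        ar = sum(p * x[t - i - 1] for i, p in enumerate(phi) if t - i - 1 >= 0)
        ma = sum(q * e[t - j - 1] for j, q in enumerate(theta) if t - j - 1 >= 0)
        x.append(c + e_t + ar + ma)
        e.append(e_t)
    return x

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)

def pick_onset(x, margin=10):
    """Return the split index k that minimises a two-segment Gaussian AIC.

    Simplification: each segment is modelled as white noise, not a fitted ARMA.
    """
    n = len(x)
    best_k, best_aic = margin, math.inf
    for k in range(margin, n - margin):
        aic = k * math.log(variance(x[:k])) + (n - k) * math.log(variance(x[k:]))
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k

rng = random.Random(0)
noise = simulate_arma(200, 0.0, [0.5], [0.3], 1.0, rng)    # stand-in for M1
signal = simulate_arma(200, 0.0, [0.5], [0.3], 5.0, rng)   # stand-in for M2
waveform = noise + signal                                  # true onset at index 200
onset = pick_onset(waveform)
print("estimated onset index:", onset)
```

A fuller version would fit ARMA models to each side of every candidate split and select p and q by AIC, as the slide describes.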

Seismic Detection Uncertainty

Method 1: Parametric bootstrap
- Construct a sampling distribution for the input waveform
- Sample $M_1$ and $M_2$ to create new waveforms
- Fit a new model to each sample and record $k$

Method 2: Model sampling
- $M_1 = \mathrm{ARMA}(p_1, q_1)$; $M_2 = \mathrm{ARMA}(p_2, q_2)$
- $\langle p_1, q_1, p_2, q_2 \rangle$ determines $k$ and a likelihood
- Sample from $\langle p_1, q_1, p_2, q_2 \rangle$ and fit the data
- Use the likelihoods to construct a probability distribution over $k$

It is not clear that these two methods produce similar distributions.
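Method 1 can be sketched as follows, taking $k$ to be the onset index (the slides do not define it explicitly). This is an illustration under assumptions, not the authors' code: the noise and signal-plus-noise models are plain Gaussians rather than ARMA processes, the onset picker is a two-segment AIC minimizer, and every name and parameter value is hypothetical.

```python
import math
import random
from collections import Counter

def pick_onset(x, margin=10):
    """Split index minimising a two-segment Gaussian AIC (prefix sums keep it O(n))."""
    n = len(x)
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_var(a, b):
        m = b - a
        mean = (s1[b] - s1[a]) / m
        return max((s2[b] - s2[a]) / m - mean * mean, 1e-12)

    best_k, best_aic = margin, math.inf
    for k in range(margin, n - margin + 1):
        aic = k * math.log(seg_var(0, k)) + (n - k) * math.log(seg_var(k, n))
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k

def fit_gaussian(segment):
    mean = sum(segment) / len(segment)
    return mean, math.sqrt(sum((v - mean) ** 2 for v in segment) / len(segment))

def parametric_bootstrap(x, n_boot=200, seed=0):
    """Resample waveforms from the fitted models and re-pick the onset each time."""
    rng = random.Random(seed)
    k_hat = pick_onset(x)
    (m1, sd1), (m2, sd2) = fit_gaussian(x[:k_hat]), fit_gaussian(x[k_hat:])
    counts = Counter()
    for _ in range(n_boot):
        wave = [rng.gauss(m1, sd1) for _ in range(k_hat)]
        wave += [rng.gauss(m2, sd2) for _ in range(len(x) - k_hat)]
        counts[pick_onset(wave)] += 1
    # Empirical distribution over onset index k.
    return k_hat, {k: c / n_boot for k, c in sorted(counts.items())}

data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(150)]   # noise only
x += [data_rng.gauss(0.0, 5.0) for _ in range(150)]  # signal plus noise
k_hat, onset_dist = parametric_bootstrap(x)
print("point estimate:", k_hat)
print("distribution over k:", onset_dist)
```

The result is a full distribution over onset indices rather than a single pick, which is the quantity the downstream analyses would consume.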

Uncertainty Quantification for Statistical Models

[Diagram labels: inverse uncertainty quantification problem; classic machine learning problem; unobservable inferred state; approximate inference; model induction / calibration; input; regularization; indirect observations]

solution uncertainty = inference errors + model-form uncertainty + regularization effects + measurement errors

Key question: How much do sensor observations really tell us about the world?

- Stats / ML research communities do not typically frame questions this way
- Most work focuses on building a better statistical model
- What the model says is emphasized over how well it says it

Impact on Downstream Analyses

Several analyses depend on onset time:
- Location (hypocenter)
- Event type (natural or man-made) and size
- Subsurface tomography
- Earth model

Provide additional information to human analysts:
- Distribution over onset times (vs. confidence interval)
- Relative reliability of data points and sensors
- Rely more on current data, less on historical data and modeling assumptions
- Improve sensitivity?

Near term:
- Model selection (improve fit and uncertainty)
- Validate against synthetic and known events
- Compare uncertainty results against analyst picks

Example 2: Multimodal Image Analysis

Given: co-registered imagery from multiple sources
Produce: segmentation with associated class probabilities and uncertainties, or detection of specific objects

Motivation:
- Trend toward automated image analysis
- Trend toward layered analysis and combining information sources
- Open question: How reliable are the results?
- Open question: Which data is useful?

Currently:
- As many analytic methods as there are tasks
- Some ignore uncertainty
- Others evaluate it in their own uniquely sane way

[Figure panels: optical image; lidar image]

Approach

1. Merge data sources: Pixel = R + G + B + Height
2. Sample the data to create a new image (non-parametric bootstrap)
3. Probabilistically cluster the pixels (Gaussian mixture or nonparametric mixture)
4. Record the class probabilities for each pixel: keep a histogram for each class at each pixel
5. Bootstrap: repeat steps 2, 3, and 4 many times; the histograms represent the posterior probability distributions
6. Future work: translate unsupervised classes into semantic labels
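Steps 2 through 5 can be sketched in a few lines. This is a toy illustration under assumptions, not the authors' pipeline: each pixel is a single scalar instead of an (R, G, B, Height) vector, the mixture has exactly two Gaussian components fitted by a hand-written EM loop, and components are matched across bootstrap replicates by sorting on their means. All names and parameter values are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, n_iter=40):
    """Fit a two-component 1-D Gaussian mixture by EM; components sorted by mean."""
    n = len(xs)
    mean = sum(xs) / n
    spread = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    w, mu, var = [0.5, 0.5], [min(xs), max(xs)], [spread, spread]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-9)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    order = sorted(range(2), key=lambda j: mu[j])
    return [(w[j], mu[j], var[j]) for j in order]

def class_probs(x, model):
    p = [w * normal_pdf(x, mu, var) for w, mu, var in model]
    total = sum(p) or 1e-300
    return [pj / total for pj in p]

def bootstrap_histograms(pixels, n_boot=25, n_bins=10, seed=0):
    """hist[i][c][b] counts replicates in which pixel i had P(class c) in bin b."""
    rng = random.Random(seed)
    hist = [[[0] * n_bins for _ in range(2)] for _ in pixels]
    for _ in range(n_boot):
        sample = rng.choices(pixels, k=len(pixels))   # step 2: resample with replacement
        model = fit_gmm_1d(sample)                    # step 3: probabilistic clustering
        for i, x in enumerate(pixels):                # step 4: record class probabilities
            for c, p in enumerate(class_probs(x, model)):
                hist[i][c][min(int(p * n_bins), n_bins - 1)] += 1
    return hist

data_rng = random.Random(1)
pixels = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
pixels += [data_rng.gauss(10.0, 1.0) for _ in range(30)]
hist = bootstrap_histograms(pixels)
print("pixel 0, class 0 histogram:", hist[0][0])
```

Sorting components by mean is only a crude fix for label switching between replicates; with more classes or overlapping clusters a proper matching step would be needed.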

Multimodal Image Analysis

[Figure panels: source imagery; most likely class; probability of most likely class; class entropy; uncertainty. Top row: optical image only. Bottom row: combined optical and lidar imagery.]

Closer Look at the Uncertainty

[Figure panels: optical only; combined imagery]

Value of Information

- Evaluate data sufficiency
- How much does a given data source contribute to an analytic result?
- Which data do we really need?
- Which data do we wish we had?

[Figure panels: source; max likelihood probability map; delta probability; delta uncertainty]
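One plausible reading of the "delta" maps is a per-pixel difference between results computed with and without a given data source. The sketch below assumes that reading, which the slides do not spell out: delta probability is the change in the probability of the most likely class, and delta uncertainty is the reduction in Shannon entropy of the class distribution. The function names and the two-pixel example are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def value_of_information(before, after):
    """Per-pixel change when a data source is added.

    before, after: lists of per-pixel class-probability lists.
    Returns (delta_probability, delta_uncertainty); positive values mean the
    added source raised confidence or lowered entropy, respectively.
    """
    delta_prob, delta_unc = [], []
    for p_before, p_after in zip(before, after):
        delta_prob.append(max(p_after) - max(p_before))
        delta_unc.append(entropy(p_before) - entropy(p_after))
    return delta_prob, delta_unc

# Two hypothetical pixels: lidar resolves the first, adds nothing to the second.
optical_only = [[0.5, 0.5], [0.9, 0.1]]
combined = [[0.9, 0.1], [0.9, 0.1]]
delta_prob, delta_unc = value_of_information(optical_only, combined)
print("delta probability:", delta_prob)
print("delta uncertainty (bits):", delta_unc)
```

Pixels where both deltas are near zero are candidates for "data we did not really need" for that region.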

Resource Tasking Under Uncertainty

Relax the assumption that desired information is actually obtained from a scheduled collect.

[Diagram labels: target / environmental conditions; sensor performance characteristics; uncertainty representations; quality/validation/compliance; distribution of acquired information (quality, 0% to 100%); stochastic optimization; risk; additional value; costs; optimum]

Goal: Maximize utility of available data and resources

Approach: Quantify impact of data on detection quality
- Use uncertainty as basis for value
- Define value as improved label, geospatial, or temporal discrimination
- Incorporate information values into optimization objective functions
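As a toy illustration of putting information value into an objective function, the sketch below chooses a set of collects under a cost budget by maximizing expected value, where each collect only succeeds with some probability. It is an assumption-laden example, not the method from the slides: the brute-force search, the sensor names, and every cost, probability, and value are made up for illustration.

```python
from itertools import combinations

def best_schedule(collects, budget, use_success_prob=True):
    """Pick the subset of collects with the highest (expected) value within budget.

    collects: list of (name, cost, p_success, value) tuples.
    With use_success_prob=False every collect is assumed to succeed.
    """
    best_names, best_value = [], 0.0
    for r in range(1, len(collects) + 1):
        for subset in combinations(collects, r):
            if sum(cost for _, cost, _, _ in subset) > budget:
                continue
            value = sum((p if use_success_prob else 1.0) * v for _, _, p, v in subset)
            if value > best_value:
                best_names, best_value = [name for name, *_ in subset], value
    return best_names, best_value

# Hypothetical collects: (name, cost, probability of success, information value).
collects = [
    ("optical", 3, 0.60, 10.0),   # high value, but often lost to cloud cover
    ("lidar", 4, 0.90, 8.0),
    ("radar", 2, 0.95, 4.0),
]
naive_plan, _ = best_schedule(collects, budget=6, use_success_prob=False)
robust_plan, expected_value = best_schedule(collects, budget=6)
print("assuming every collect succeeds:", naive_plan)
print("accounting for collection risk: ", robust_plan, expected_value)
```

In this example the plan changes once collection risk is included, which is the effect of relaxing the assumption that a scheduled collect always delivers.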

Summary

- Uncertainty analysis does not build a better model; it indicates how well a given model captures the data
- Research is to bridge the theory-application gap
  - Identifying which information is useful
  - Digging it out from deep within algorithms
  - Propagating uncertainty through layers of analysis

Important questions:
- What's the relationship between uncertainties generated by
  - Measurement errors and data sampling (bootstrap)
  - Model selection and induction processes (model sampling)
  - Inference (MCMC)
  and how do we combine them? Or should we?
- What issues arise when we propagate uncertainties from one statistical inverse problem to another?

Uncertainty Quotes

"We demand rigidly defined areas of doubt and uncertainty!"
  Douglas Adams, The Hitchhiker's Guide to the Galaxy

"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
  Uncertain origin

"Uncertainty is an uncomfortable position. But certainty is an absurd one."
  Voltaire

"Ignorance more frequently begets confidence than does knowledge."
  Charles Darwin

"As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."
  Albert Einstein

POC: David Stracuzzi, djstrac@sandia.gov