Expected Value of Partial Perfect Information


Mike Giles (1), Takashi Goda (2), Howard Thom (3), Wei Fang (1), Zhenru Wang (1)
(1) Mathematical Institute, University of Oxford
(2) School of Engineering, University of Tokyo
(3) School of Social and Community Medicine, University of Bristol

MCM International Conference on Monte Carlo Methods, July 4, 2017

Outline

- EVPI and EVPPI
- MLMC for nested simulation
- numerical analysis
- some numerical results
- using MCMC inputs
- conclusions

Decision making under uncertainty

The motivation comes from decision-making in funding medical research. Models of the cost-effectiveness of medical treatments depend on various parameters. The parameter values are not known precisely, and are instead modelled as random variables coming from some prior distribution. The research question is whether it is worth conducting some medical research to eliminate uncertainty in some of the parameters.

Similar applications arise in oil reservoir recovery: is it worth sinking an additional exploratory oil well to learn more about the oilfield?

Decision making under uncertainty

Given no knowledge of the independent uncertain parameters X, Y, the best treatment out of some finite set D corresponds to

$$ \max_{d \in D} E[f_d(X,Y)], $$

while with perfect knowledge we would have

$$ E\Big[ \max_{d \in D} f_d(X,Y) \Big]. $$

However, if X is known but not Y, this is reduced to

$$ E\Big[ \max_{d \in D} E[f_d(X,Y) \mid X] \Big], $$

giving a nested simulation problem.

EVPI & EVPPI

EVPI, the expected value of perfect information, is the difference

$$ \mathrm{EVPI} = E\Big[ \max_{d \in D} f_d(X,Y) \Big] - \max_{d \in D} E[f_d(X,Y)], $$

which can be estimated with $O(\varepsilon^{-2})$ complexity by standard methods, assuming an O(1) cost per sample $f_d(X,Y)$.

EVPPI, the expected value of partial perfect information, is the difference

$$ \mathrm{EVPPI} = E\Big[ \max_{d \in D} E[f_d(X,Y) \mid X] \Big] - \max_{d \in D} E[f_d(X,Y)]. $$

In practice, we choose to estimate

$$ \mathrm{EVPI} - \mathrm{EVPPI} = E\Big[ \max_{d \in D} f_d(X,Y) \Big] - E\Big[ \max_{d \in D} E[f_d(X,Y) \mid X] \Big]. $$
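For concreteness, here is a minimal sketch of estimating these quantities by plain nested Monte Carlo, the brute-force approach that the MLMC treatment below improves on. The toy net-benefit function f_d, the distributions, and all sample sizes are illustrative assumptions, not the real decision model.

```python
# Minimal nested Monte Carlo sketch for EVPI, EVPPI and EVPI - EVPPI
# (illustrative toy model, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
D = 3  # number of candidate treatments


def f(d, x, y):
    """Toy net benefit of treatment d given parameters (x, y) -- illustrative only."""
    return -(x - d) ** 2 + 0.5 * y * d


def nested_mc(n_outer=5_000, n_inner=200):
    X = rng.normal(size=n_outer)
    Y = rng.normal(size=n_outer)
    vals = np.array([f(d, X, Y) for d in range(D)])   # shape (D, n_outer)

    best_fixed = vals.mean(axis=1).max()              # max_d E[f_d(X,Y)]
    perfect = vals.max(axis=0).mean()                 # E[max_d f_d(X,Y)]

    # E[max_d E[f_d(X,Y)|X]]: inner average over Y for each outer sample of X
    partial_vals = np.empty(n_outer)
    for i, x in enumerate(X):
        Yi = rng.normal(size=n_inner)                 # inner samples, shared by all d
        partial_vals[i] = max(f(d, x, Yi).mean() for d in range(D))
    partial = partial_vals.mean()

    evpi = perfect - best_fixed
    evppi = partial - best_fixed
    return evpi, evppi, evpi - evppi


print(nested_mc())
```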

MLMC treatment

Based on work by Oxford colleagues (Bujok, Hambly, Reisinger, 2015), Takashi Goda (arXiv, April 2016) proposed an efficient MLMC estimator using $2^l$ samples on level $l$ for the conditional expectation. For a given sample X, define

$$ Z_l = \tfrac{1}{2} \Big( \max_{d} \overline{f}_d^{(a)} + \max_{d} \overline{f}_d^{(b)} \Big) - \max_{d} \overline{f}_d, $$

where
- $\overline{f}_d^{(a)}$ is an average of $f_d(X,Y)$ over $2^{l-1}$ independent samples for Y;
- $\overline{f}_d^{(b)}$ is an average over a second independent set of $2^{l-1}$ samples;
- $\overline{f}_d$ is an average over the combined set of $2^l$ inner samples.
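A sketch of this level-l correction for a single outer sample, using the same illustrative toy model as in the previous sketch; the split into two half-averages and the average over the combined set follows the definition above.

```python
# Sketch of the level-l antithetic correction Z_l for one outer sample x
# (illustrative toy model, valid for l >= 1).
import numpy as np

rng = np.random.default_rng(1)
D = 3  # number of candidate treatments


def f(d, x, y):
    """Toy net benefit of treatment d (illustrative)."""
    return -(x - d) ** 2 + 0.5 * y * d


def Z_level(x, l):
    """0.5*(max_d fbar_a + max_d fbar_b) - max_d fbar, using 2**l inner samples."""
    n = 2 ** l
    Y = rng.normal(size=n)                                            # inner samples, shared by all d
    fa = np.array([f(d, x, Y[: n // 2]).mean() for d in range(D)])    # first half-average
    fb = np.array([f(d, x, Y[n // 2 :]).mean() for d in range(D)])    # second half-average
    fc = np.array([f(d, x, Y).mean() for d in range(D)])              # combined-set average
    return 0.5 * (fa.max() + fb.max()) - fc.max()


print(Z_level(x=0.3, l=5))
```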

MLMC treatment

The expected value of this estimator is

$$ E[Z_l] = E\big[ \max_d \overline{f}_{d,2^{l-1}} \big] - E\big[ \max_d \overline{f}_{d,2^{l}} \big], $$

where $\overline{f}_{d,2^l}$ is an average of $2^l$ inner samples, and hence

$$ \sum_{l=1}^{L} E[Z_l] = E\big[ \max_d f_d \big] - E\big[ \max_d \overline{f}_{d,2^{L}} \big] \;\longrightarrow\; E\big[ \max_d f_d \big] - E\big[ \max_d E[f_d(X,Y) \mid X] \big] $$

as $L \to \infty$, giving us the desired estimate.
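Putting the pieces together, a sketch of the telescoping sum: each $E[Z_l]$ is estimated from its own independent set of outer samples of X, and the sum approximates EVPI − EVPPI for large L. The geometric sample allocation is purely illustrative; a practical implementation would choose the number of outer samples per level from estimated variances, as in the standard MLMC theorem.

```python
# Sketch of the full MLMC estimate of EVPI - EVPPI, summing independently
# estimated corrections E[Z_l] over levels l = 1..L.  The toy model and the
# sample allocation are illustrative assumptions; f and Z_level are repeated
# from the previous sketch so that this block runs standalone.
import numpy as np

rng = np.random.default_rng(2)
D = 3


def f(d, x, y):
    return -(x - d) ** 2 + 0.5 * y * d


def Z_level(x, l):
    n = 2 ** l
    Y = rng.normal(size=n)
    fa = np.array([f(d, x, Y[: n // 2]).mean() for d in range(D)])
    fb = np.array([f(d, x, Y[n // 2 :]).mean() for d in range(D)])
    fc = np.array([f(d, x, Y).mean() for d in range(D)])
    return 0.5 * (fa.max() + fb.max()) - fc.max()


def mlmc_evpi_minus_evppi(L=8, N1=20_000):
    total = 0.0
    for l in range(1, L + 1):
        N_l = max(200, N1 // 2 ** (l - 1))     # crude geometric allocation of outer samples
        X = rng.normal(size=N_l)
        total += np.mean([Z_level(x, l) for x in X])
    return total


print(mlmc_evpi_minus_evppi())
```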

MLMC treatment

How good is the estimator? With reference to the standard MLMC theorem, $\gamma = 1$, but what are $\alpha$ and $\beta$? Define

$$ F_d(X) = E[f_d(X,Y) \mid X], \qquad d_{\mathrm{opt}}(X) = \arg\max_d F_d(X), $$

so $d_{\mathrm{opt}}(x)$ is piecewise constant, with a lower-dimensional manifold K on which it is not uniquely defined.

Note that for any $d$, $\tfrac{1}{2}\big( \overline{f}_d^{(a)} + \overline{f}_d^{(b)} \big) - \overline{f}_d = 0$, so $Z_l = 0$ if the same $d$ maximises each term in $Z_l$.

Numerical analysis

Heuristic analysis:
- $\overline{f}_d^{(a)} - \overline{f}_d^{(b)} = O(2^{-l/2})$, due to the CLT
- $O(2^{-l/2})$ probability of X being within distance $O(2^{-l/2})$ of K
- under this condition, $Z_l = O(2^{-l/2})$
- hence $E[Z_l] = O(2^{-l})$ and $E[Z_l^2] = O(2^{-3l/2})$, so $\alpha = 1$, $\beta = 3/2$.

It is possible to make this rigorous given some assumptions.

Numerical analysis

Assumptions:

- $E[\,|f_d(X,Y)|^p\,]$ is finite for all $p \geq 2$.
  Comment: helps to bound the difference between $\overline{f}_d$ and $F_d(X)$.
- There exists a constant $c_0 > 0$ such that for all $0 < \epsilon < 1$, $P\big( \min_{y \in K} \|X - y\| \leq \epsilon \big) \leq c_0\, \epsilon$.
  Comment: bounds the probability of X being close to K.
- There exist constants $c_1, c_2 > 0$ such that if $X \notin K$, then $\max_d F_d(X) - \max_{d \neq d_{\mathrm{opt}}(X)} F_d(X) > \min\big( c_1,\; c_2 \min_{y \in K} \|X - y\| \big)$.
  Comment: ensures linear separation of the optimal $F_d$ away from K.

Numerical analysis

Building on the heuristic analysis, and past analyses (Giles & Szpruch, 2015), we obtain the following theorem:

Theorem. If the Assumptions are satisfied, and $\overline{f}_d^{(a)}, \overline{f}_d^{(b)}, \overline{f}_d$ are as defined previously for level $l$, with $2^l$ inner samples being used for $\overline{f}_d$, then for any $\delta > 0$,

$$ E\Big[ \tfrac{1}{2}\big( \max_d \overline{f}_d^{(a)} + \max_d \overline{f}_d^{(b)} \big) - \max_d \overline{f}_d \Big] = o\big(2^{-(1-\delta)l}\big) $$

and

$$ E\Big[ \Big( \tfrac{1}{2}\big( \max_d \overline{f}_d^{(a)} + \max_d \overline{f}_d^{(b)} \big) - \max_d \overline{f}_d \Big)^2 \Big] = o\big(2^{-(3/2-\delta)l}\big). $$

Supporting theory

Lemma. If $X_i$ are iid scalar samples with zero mean, and $\overline{X}_N$ is the average of $N$ samples, then for $p \geq 2$, if $E[\,|X|^p\,]$ is finite there exists a constant $C_p$, depending only on $p$, such that

$$ E[\,|\overline{X}_N|^p\,] \leq C_p\, N^{-p/2}\, E[\,|X|^p\,], \qquad P[\,|\overline{X}_N| > c\,] \leq C_p\, E[\,|X|^p\,] / (c^2 N)^{p/2}. $$

Proof. Use the discrete Burkholder-Davis-Gundy inequality applied to $\sum_i X_i$, and then the Markov inequality.
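One way to fill in the steps of this proof sketch (the Jensen step is my own addition, and constants are absorbed into $C_p$): apply Burkholder-Davis-Gundy to the martingale $S_N = \sum_i X_i$, then Jensen's inequality (since $t \mapsto t^{p/2}$ is convex for $p \geq 2$), then Markov's inequality.

```latex
\begin{align*}
E\big[|\overline{X}_N|^p\big]
  &= N^{-p}\, E\Big[\Big|\sum_{i=1}^N X_i\Big|^p\Big]
   \le C_p\, N^{-p}\, E\Big[\Big(\sum_{i=1}^N X_i^2\Big)^{p/2}\Big]
   && \text{(BDG)} \\
  &\le C_p\, N^{-p}\, N^{p/2-1} \sum_{i=1}^N E\big[|X_i|^p\big]
   = C_p\, N^{-p/2}\, E\big[|X|^p\big]
   && \text{(Jensen)} \\
P\big[|\overline{X}_N| > c\big]
  &\le c^{-p}\, E\big[|\overline{X}_N|^p\big]
   \le C_p\, E\big[|X|^p\big]\, (c^2 N)^{-p/2}
   && \text{(Markov)}
\end{align*}
```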

Supporting theory

Applying the result to the EVPPI problem, we get:

Lemma. If all moments of $f_d$ are finite, then for $p \geq 2$, there exists a constant $C_p$, depending only on $p$, such that

$$ E[\,|\overline{f}_d - F_d|^p\,] \leq C_p\, 2^{-pl/2}\, E[\,|f_d - F_d|^p\,], \qquad P[\,|\overline{f}_d - F_d| > c\,] \leq C_p\, E[\,|f_d - F_d|^p\,] / (c^2 2^l)^{p/2}. $$

Proof. Conditional on X, we have

$$ E[\,|\overline{f}_d - F_d|^p \mid X\,] \leq C_p\, 2^{-pl/2}\, E[\,|f_d - F_d|^p \mid X\,], $$

then taking an outer expectation w.r.t. X gives the desired result. A similar argument works for $P[\,|\overline{f}_d - F_d| > c\,] = E[\,\mathbf{1}_{|\overline{f}_d - F_d| > c}\,]$.

Numerical results

Real application: cost-effectiveness of Novel Oral AntiCoagulants in Atrial Fibrillation. A choice of 6 different treatments, and a total of 99 input random variables:
- 34 Normal
- 8 uniform
- 15 Beta
- 3 sets of MCMC inputs, approximated by multivariate Normals of dimension 7, 7 and 28, corresponding to an additional 42 Normals

Numerical results

[Figure: MLMC diagnostics, log2 variance and log2 mean of the level corrections plotted against level l, with fitted rates of approximately beta = 1.5 and alpha = 1.0.]

Numerical results

[Figure: number of samples per level and computational cost of standard MC versus MLMC, on log scales, for several accuracy levels.]

MCMC inputs

A key new challenge is that some of the random inputs come from a Bayesian posterior distribution, with samples generated by MCMC methods. Schematically, inputs $x_1 \sim N(0,1)$ and $x_2 \sim N(0,1)$, together with $(y_1, y_2)$ from one MCMC sampler and $(z_1, z_2)$ from another, all feed into the output $f(X,Y,Z)$.

The standard procedure (?) would be to generate MCMC samples on demand, and then use them immediately. But how can this be extended to MLMC?

MCMC within MLMC

An alternative idea is to first pre-generate and store a large collection of MCMC samples:

$$ \{ Y_1, Y_2, Y_3, \ldots, Y_N \}. $$

Then, random samples for Y can be obtained from this collection by uniformly distributed selection. This avoids the problem of strong correlation between successive MCMC samples, and also works fine for the MLMC calculation.

This approach also leads to ideas on QMC for empirical datasets, and on dataset thinning, reducing the number of samples which are stored.
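A minimal sketch of the "pre-generate and resample" idea. The target density, the random-walk Metropolis sampler, and all parameters here are illustrative stand-ins; the real application uses problem-specific Bayesian posteriors.

```python
# Pre-generate a pool of MCMC samples once, then draw inner samples for the
# MLMC estimator by uniform selection from the pool (illustrative sketch).
import numpy as np

rng = np.random.default_rng(3)


def metropolis_pool(log_density, n_samples, step=0.5, burn_in=1_000):
    """Pre-generate a pool of (serially correlated) MCMC samples from a 1-D target."""
    pool = np.empty(n_samples)
    y = 0.0
    logp = log_density(y)
    for i in range(-burn_in, n_samples):
        prop = y + step * rng.normal()          # random-walk proposal
        logp_prop = log_density(prop)
        if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject
            y, logp = prop, logp_prop
        if i >= 0:
            pool[i] = y
    return pool


# Pool of posterior samples for Y (here a standard normal stand-in target)
pool = metropolis_pool(lambda y: -0.5 * y ** 2, n_samples=100_000)


def draw_inner_samples(n):
    """Uniform selection from the stored pool breaks the chain's serial correlation."""
    return rng.choice(pool, size=n, replace=True)


print(draw_inner_samples(8))
```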

Conclusions

- nested simulation is an important new direction for MLMC research
- splitting a large set of samples into 2 or more subsets is key to a good coupling
- complete numerical analysis proves $O(\varepsilon^{-2})$ complexity
- numerical tests show good benefits in some cases, but in others QMC is more effective; MLQMC might be best?
- MCMC inputs can be introduced through random selection from a large pre-generated empirical dataset

Webpages:
http://people.maths.ox.ac.uk/gilesm/slides.html
http://people.maths.ox.ac.uk/gilesm/mlmc_community.html