Statistical Estimation: Data & Non-data Information

Statistical Estimation: Data & Non-data Information
Roger J-B Wets, University of California, Davis
with M. Casey @ Raytheon, G. Pflug @ U. Vienna, X. Dong @ EpiRisk, G-M You @ EpiRisk

A little background
Decision making under uncertainty:
\[ \min\; f_{01}(x) + E\{Q(\xi, x)\} \quad\text{so that } x \in C_1 \subset \mathbb{R}^n, \]
\[ Q(\xi, x) = \inf\,\{\, f_{02}(\xi, y) : y \in C_2(\xi, x)\,\}, \qquad C_2(\xi, x) = \{\, y \in \mathbb{R}^d_+ : W_\xi\, y = d_\xi - T_\xi x \,\}, \]
i.e., stochastic programming problems. Issue: the realizations of the data, $(d_\xi, T_\xi, W_\xi) = \xi \in \mathbb{R}^N$, $N \ge 2$, come as a limited sample (43 samples), which never reaches the asymptotic range.
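To make the setting concrete, here is a minimal numerical sketch (not from the talk) of such a two-stage problem: a hypothetical scalar first stage with $f_{01}(x) = 0.5x$, sampled realizations $\xi$, and the recourse cost $Q(\xi, x)$ computed as a small linear program. All problem data ($W$, $T$, $d_\xi$, $f_{02}$) are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog, minimize_scalar

rng = np.random.default_rng(0)
xis = rng.exponential(scale=1.0, size=43)   # 43 sampled realizations, as on the slide

def Q(xi, x):
    # Recourse problem: inf f02(xi, y) over y in C2(xi, x), with hypothetical
    # W = [1, -1], T = 1, d(xi) = xi, and f02(y) = y1 + 1.5*y2.
    res = linprog(c=[1.0, 1.5], A_eq=[[1.0, -1.0]], b_eq=[xi - x],
                  bounds=[(0, None), (0, None)])
    return res.fun

def total_cost(x):
    # f01(x) + E^nu{Q(xi, x)} with f01(x) = 0.5*x (hypothetical)
    return 0.5 * x + np.mean([Q(xi, x) for xi in xis])

best = minimize_scalar(total_cost, bounds=(0.0, 5.0), method="bounded")
print(f"x* = {best.x:.3f}, cost = {best.fun:.3f}")
```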

Formulation
Find $F^{\text{est}}$, an estimate of the distribution of a random variable $X$, given all the information available about this random phenomenon, i.e., such that
\[ \forall x:\; F^{\text{est}}(x) \approx F^{\text{true}}(x) = \text{prob}[X \le x]. \]

Information
- Observations (data): $x^1, x^2, \dots, x^\nu$
- Non-data facts: density or discrete distribution, bounds on expectation, moments, shape (unimodal, decreasing), parametric class
- Non-data modeling assumptions: the above + density is smooth, (un)bounded support, ...

Applications:
- Estimating (cumulative) distribution functions
- Estimating coefficients of time series
- Estimating coefficients of stochastic differential equations (SDEs)
- Estimating financial curves (zero-curves)
- Dealing with lack of data: few observations
- Estimating density functions: $h^{\text{est}}$

Kernel choice
Information = observations: $x^1, x^2, \dots, x^\nu$. Optimal bandwidth = kernel support?
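As a reminder of what is at stake in the bandwidth question, a short sketch using scipy's gaussian_kde (a scalar bw_method is used directly as the bandwidth factor); the sample and the factors tried are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.exponential(size=200)          # observations x^1, ..., x^nu
grid = np.linspace(0.0, 6.0, 121)

for factor in (0.1, 0.5, 2.0):              # under- to over-smoothing
    kde = gaussian_kde(sample, bw_method=factor)
    # the peak of the estimate flattens as the bandwidth grows
    print(factor, round(float(kde(grid).max()), 3))
```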

Kernel estimates

Statistical Estimation from an optimization viewpoint

Criterion: Maximum Likelihood
Find $h \in H$ = class of functions on $\mathbb{R}$ that maximizes the probability of observing $x^1, x^2, \dots, x^\nu$; equivalently,
\[ \max \prod_{l=1}^{\nu} h(x^l) \;\Longleftrightarrow\; \max \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) = \max\, E^\nu\{\ln h(x)\} = \max \int \ln h(x)\, P^\nu(dx). \]

This leads to an $\infty$-dimensional optimization problem:
\[ \max\; E^\nu\{\ln h(x)\} = \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) \quad\text{so that } \int h(x)\,dx = 1,\; h(x) \ge 0\;\forall x \in \mathbb{R},\; h \in A^\nu \subset H, \]
where $A^\nu$ collects the non-data information constraints and $H = C^2(\mathbb{R})$, $L^p(S)$, $H^1(S)$, $S \subset \mathbb{R}$.

Non-data information constraints
- support: $S = [\alpha, \beta]$, $S = [\alpha, \infty)$, ...
- bounds on moments: $a_\iota \le \int x^\iota h(x)\,dx \le b_\iota$
- unimodal: $h(x) = e^{-Q(x)}$, $Q$ convex
- decreasing: $h'(x) \le 0$, $\forall x \in S$
- 'smoothness': $h \in H^0$, $\int \frac{h'(x)^2}{h(x)}\,dx \le \eta$ (Fisher information)
- 'Bayesian': $\|h - h^a\| \le \delta$ or $\int \theta(h - h^a)$, total variation, ...
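A minimal discretized sketch of the program above, under assumptions of my own: $h$ is represented by its values on a grid, the normalization uses the trapezoid rule, and $A^\nu$ is taken to be the monotonicity (decreasing) constraint from the list.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = rng.exponential(size=50)             # observations
grid = np.linspace(0.0, 8.0, 61)
dx = grid[1] - grid[0]

def integral(h):                            # trapezoid rule for int h dx
    return dx * (h.sum() - 0.5 * (h[0] + h[-1]))

def neg_loglik(h):                          # -(1/nu) sum_l ln h(x^l)
    hx = np.interp(data, grid, h)           # h(x^l) by linear interpolation
    return -np.mean(np.log(np.maximum(hx, 1e-12)))

cons = [{"type": "eq",   "fun": lambda h: integral(h) - 1.0},   # int h = 1
        {"type": "ineq", "fun": lambda h: h[:-1] - h[1:]}]      # h decreasing
res = minimize(neg_loglik, x0=np.full(grid.size, 1.0 / 8.0),
               bounds=[(0.0, None)] * grid.size,                # h >= 0
               constraints=cons, method="SLSQP", options={"maxiter": 300})
print(res.success, round(integral(res.x), 4))
```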

Questions (M-estimators)
- Consistency: as $\nu \to \infty$, does $h^{\text{est}}_\nu \to h^{\text{true}}$? Hypo-convergence: a law of large numbers.
- Convergence rate: how large does $\nu$ have to be so that $h^{\text{est}}_\nu \approx h^{\text{true}}$? Quantitative hypo-convergence: $\text{dist}(h^{\text{est}}, h^{\text{true}})$.
- Solving the $\infty$-dimensional optimization problem? Finite-dimensional approximations + again, convergence issues.

When is (P) near (P$^a$)?
\[ (P)\quad \max f_0(x) \;\text{ s.t. }\; f_i(x) \le 0,\; i \in I_f,\quad x \in X_f \subset E \]
\[ (P^a)\quad \max g_0(x) \;\text{ s.t. }\; g_i(x) \le 0,\; i \in I_g,\quad x \in X_g \subset E \]
Example:
\[ (P)\quad \max f_0(x) \;\text{ s.t. }\; f_i(x) \le 0,\; i = 1, \dots, m \]
\[ (P^a)\quad \max f_0(x) - \pi \sum_{i=1}^{m} \max[0, f_i(x)]^2 \]
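A tiny numeric illustration (hypothetical problem, not from the talk) of the penalty approximation: as the weight $\pi$ grows, the maximizer of $(P^a)$ approaches the constrained maximizer of $(P)$.

```python
from scipy.optimize import minimize_scalar

f0 = lambda x: -x**2                        # hypothetical objective, to be maximized
f1 = lambda x: 1.0 - x                      # constraint f1(x) <= 0, i.e. x >= 1

for pi in (1.0, 10.0, 100.0):
    # maximize f0(x) - pi * max[0, f1(x)]^2 (minimize its negative)
    res = minimize_scalar(lambda x: -(f0(x) - pi * max(0.0, f1(x))**2),
                          bounds=(-5.0, 5.0), method="bounded")
    print(pi, round(res.x, 4))
```

Here the penalized maximizer works out to $\pi/(1+\pi)$, so the printed values approach the constrained argmax $x^* = 1$.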

From: when is (P) near (P$^a$)
\[ (P)\quad \max f(x), \;\text{ where } f(x) = f_0(x) \text{ when } f_i(x) \le 0,\; i \in I_f,\; x \in X_f \subset E;\;\; = -\infty \text{ otherwise} \]
\[ (P^a)\quad \max g(x), \;\text{ where } g(x) = g_0(x) \text{ when } g_i(x) \le 0,\; i \in I_g,\; x \in X_g \subset E;\;\; = -\infty \text{ otherwise} \]
To: when is $f$ near $g$?

Example (with penalty)

Hypographs of $f$ & $f^a$

hypo $f^a$ near hypo $f$ $\;\Longrightarrow\;$ argmax$(P^a) \approx$ argmax$(P)$

Hypo-convergence:
\[ f = \text{h-lim}_\nu\, f^\nu \;(f^\nu \to_h f):\quad x^\nu \in \arg\max f^\nu,\; x^\nu \to x \text{ (cluster)} \;\Longrightarrow\; x \in \arg\max f. \]
aw-topology for usc functions (Attouch-Wets, 1992): $\exists\, dl$ such that $dl(f^\nu, f) \to 0 \iff f^\nu \to_h f$, and $dl(f^\nu, f)$ small forces $\text{dist}(\arg\max f^\nu, \arg\max f)$ small (quantified on the next slides).
With $F^\nu(x) = \int f(\xi, x)\, P^\nu(d\xi)$ and $F(x) = \int f(\xi, x)\, P(d\xi)$:
\[ P^\nu \to P \;\;\&\;\; f\text{-tight} \;\Longrightarrow\; F^\nu \to_h F. \]

Quantitative hypo-convergence
$\theta : \mathbb{R}_+ \to [0, 1)$ continuous, strictly increasing; for example $\theta(a) = (2/\pi)\arctan a$.
\[ dl_\rho(C, D) = \sup_{\|x\| \le \rho} \big|\, d(x, C) - d(x, D) \,\big|, \qquad dl(C, D) = \int_0^\infty e^{-\rho}\, dl_\rho(C, D)\, d\rho, \]
a metric on the space of closed sets (aw-topology);
\[ dl(f, g) = dl(\text{hypo}\, f, \text{hypo}\, g), \]
a metric on the space of usc functions (aw-topology).
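The $\rho$-distance is easy to approximate on a grid; a sketch for two intervals on the line (the sets and grid size are my own choices):

```python
import numpy as np

def dist_to_interval(x, lo, hi):            # d(x, [lo, hi]), pointwise
    return np.maximum.reduce([lo - x, x - hi, np.zeros_like(x)])

def dl_rho(rho, C, D, n=2001):              # sup over the rho-ball, on a grid
    x = np.linspace(-rho, rho, n)
    return float(np.max(np.abs(dist_to_interval(x, *C)
                               - dist_to_interval(x, *D))))

C, D = (0.0, 1.0), (0.2, 1.0)
print(dl_rho(5.0, C, D))                     # 0.2: the gap at the left endpoints
```

$dl(C, D)$ itself can then be approximated by quadrature of $e^{-\rho}\, dl_\rho(C, D)$ over $\rho$.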

Quantitative hypo-convergence II
\[ dl(f^\nu, f) \to 0 \;\Longleftrightarrow\; f^\nu \to f \;\text{aw-hypo}. \]
Theorem. For $f, g : X \to \overline{\mathbb{R}}$ usc, concave, and $\rho_0$ such that $\arg\max f \cap \rho_0 B \neq \emptyset$, $\sup f < \rho_0$, and likewise for $g$: for all $\rho > \rho_0$, $\varepsilon > 0$,
\[ dl_\rho\big(\varepsilon\text{-}\arg\max f,\; \varepsilon\text{-}\arg\max g\big) \;\le\; \big(1 + 4\rho\,\varepsilon^{-1}\big)\, dl_\rho(f, g). \]

Opt-Formulation (recall)
\[ \max\; E^\nu\{\ln h(x)\} = \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) \quad\text{so that } \int h(x)\,dx = 1,\; h(x) \ge 0\;\forall x \in \mathbb{R},\; h \in A^\nu \subset H, \]
with $A^\nu$ the non-data information constraints and $H = C^2(\mathbb{R})$, $L^p(S)$, $H^1(S)$, $S \subset \mathbb{R}$.

Convergence rate
\[ L^\nu(h, x) = \ln h(x) \;\text{ if } h \in A^\nu \subset H,\; \int h(x)\,dx = 1,\; h \ge 0; \qquad = -\infty \;\text{ otherwise}, \]
and $L = L^\nu$ except with $A = \text{aw-lim}\, A^\nu$ (aw $\iff dl$). Then
\[ \int L^\nu(\cdot, x)\, dP^\nu = E^\nu L^\nu \;\xrightarrow{\;\text{aw-hypo}\;}\; EL = \int L(\cdot, x)\, dP. \]
Continuity and compactness hold when $H$ is a Hilbert space embedded in an RKHS $\mathcal{H}$: the constraint set is $H$-bounded and $\mathcal{H}$-closed, $\mathcal{H}$ a linear subspace on which each evaluation $h \mapsto h(x)$ is continuous. Relies on the Hilbert-Schmidt Embedding Theorem.

Numerical Procedures
1. $h^\nu = \sum_{k=1}^{q} u_k^\nu\, i_k(\cdot)$: Fourier coefficients, wavelets, kernel bases
2. $h = \exp(s(\cdot))$, with $s(\cdot)$ a constrained cubic (or quadratic) spline

Basis: Fourier coefficients
\[ \max\; \frac{1}{\nu} \sum_{l=1}^{\nu} \ln\Big( u_0 + 2 \sum_{k=1}^{q} u_k \cos(w_k x^l / \beta) \Big) \]
so that
\[ u_0 + 2 \sum_{k=1}^{q} \frac{\sin w_k}{w_k}\, u_k = \beta^{-1}, \qquad u_0 + 2 \sum_{k=1}^{q} u_k \cos(w_k x / \beta) \ge 0 \;\;\forall x \in [0, \beta], \]
\[ u_k \in \mathbb{R},\; k = 0, \dots, q, \qquad w_k \in \mathbb{R}_+,\; k = 1, \dots, q, \]
\[ \sum_{k=1}^{q} \Big[ \cos\Big(\frac{w_k x}{\beta}\Big) - \cos\Big(\frac{w_k x'}{\beta}\Big) \Big]\, u_k \ge 0, \quad 0 \le x \le x' \le \beta, \]
i.e., $h$ is decreasing.
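A sketch of this cosine-basis program, with fixed hypothetical frequencies $w_k = (k - \tfrac12)\pi$ and the nonnegativity and decreasing conditions enforced on a finite grid (a standard relaxation of the $\forall x$ constraints):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.exponential(size=100)
beta = float(data.max()) * 1.2               # support [0, beta]
q = 5
w = (np.arange(1, q + 1) - 0.5) * np.pi      # fixed frequencies (hypothetical)
grid = np.linspace(0.0, beta, 200)           # where h >= 0 / decreasing is checked

def h(u, x):                                 # h(x) = u0 + 2 sum_k u_k cos(w_k x / beta)
    return u[0] + 2.0 * np.cos(np.outer(x, w) / beta) @ u[1:]

def negll(u):
    return -np.mean(np.log(np.maximum(h(u, data), 1e-12)))

cons = [{"type": "eq",                       # normalization: int_0^beta h = 1
         "fun": lambda u: u[0] + 2.0 * np.dot(np.sin(w) / w, u[1:]) - 1.0 / beta},
        {"type": "ineq", "fun": lambda u: h(u, grid)},                        # h >= 0
        {"type": "ineq", "fun": lambda u: h(u, grid[:-1]) - h(u, grid[1:])}]  # decreasing
res = minimize(negll, x0=np.r_[1.0 / beta, np.zeros(q)],
               method="SLSQP", constraints=cons)
print(res.success, res.x.round(3))
```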

Test Case: exponential
\[ h^{\text{true}}(x) = \lambda e^{-\lambda x} \;\text{ if } x \ge 0; \quad = 0 \;\text{ if } x < 0 \qquad (\lambda = 1). \]
Estimators compared: empirical estimate; kernel estimate from R-stat; unconstrained with support $[0, \infty)$; unconstrained with adaptive wave length; constrained ($h$ decreasing); parametric, i.e., exp-$\lambda$ class $h^\lambda$.

200 observations: exponential. [Figure: empirical, param.-class, R-stat kernel, unconstrained, + wave length, + decreasing]

$h^{\text{est}}$: empirical (five observations)

$h^{\text{est}}$: kernel

$h^{\text{est}}$: unconstrained

$h^{\text{est}}$: ... with adaptive wave number

$h^{\text{est}}$: with decreasing constraint

5 observations: [Figure: empirical, R-stat kernel, unconstrained, decreasing]

1 observation: [Figure: R-stat kernel, unconstrained, parametric, decreasing]

Exponential (quadratic) spline

\[ h(x) = e^{s(x)}, \qquad s(x) = s_0 + v_0 x + \int_0^x dr \int_0^r dt\, z(t), \quad z(t) \equiv z_k \;\text{ on } (x_k, x_{k+1}], \]
so that $s(x) = s_0 + v_0 x + \sum_{j=1}^{k} a_{kj}\, z_j$ when $x \in (x_k, x_{k+1}]$. Then
\[ \max\; E^\nu\{\ln h(x)\} \;\Longleftrightarrow\; \min\; -\frac{1}{\nu}\sum_{l=1}^{\nu} s(x^l) \quad\text{so that } \int e^{s(x)}\,dx \le 1 \;\; (h \ge 0 \text{ holds automatically}), \]
with $z_k \in [-\eta, \eta]$: 'constrained' spline; unimodality is imposed through a further linear constraint on $s$.
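A grid-based sketch of this exponential-spline idea (my own simplification): $s$ lives on a grid, the objective is the empirical mean of $s$ at the data, the normalization is $\int e^s\,dx \le 1$, and the bound on $z = s''$ becomes a bound on second differences.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = rng.lognormal(size=25)                # matches the nu = 25 test case below
grid = np.linspace(0.01, 10.0, 60)
dx = grid[1] - grid[0]
eta = 5.0                                    # curvature bound |s''| <= eta (hypothetical)

def objective(s):                            # min -(1/nu) sum_l s(x^l)
    return -np.mean(np.interp(data, grid, s))

d2 = lambda s: (s[2:] - 2.0 * s[1:-1] + s[:-2]) / dx**2    # discrete s''
cons = [{"type": "ineq", "fun": lambda s: 1.0 - dx * np.exp(s).sum()},  # int e^s <= 1
        {"type": "ineq", "fun": lambda s: eta - d2(s)},    # s'' <= eta
        {"type": "ineq", "fun": lambda s: eta + d2(s)}]    # s'' >= -eta
res = minimize(objective, x0=np.full(grid.size, np.log(0.09)),
               method="SLSQP", constraints=cons, options={"maxiter": 500})
h = np.exp(res.x)
print(res.success, round(dx * h.sum(), 4))
```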

Test Case: log-normal, $\nu = 25$
\[ h^{\text{true}}(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2 / 2\sigma^2} \;\text{ for } x > 0; \quad = 0 \;\text{ for } x \le 0. \]
Estimators compared: empirical estimate; kernel estimate from R-stat; unconstrained with support $[0, \infty)$; constrained ($h$ unimodal).

25 observations (msqe = mean square error): [Figure] empirical; R-stat kernel, msqe $> 0.0795$; unconstrained, msqe $= 0.0419$; unimodal constraint, msqe $= 0.0190$.

Test Case: $N(10,1)$, $\nu = 5$
\[ h^{\text{true}}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-10)^2/2}, \quad x \in \mathbb{R}. \]
Estimators compared: kernel estimate from R-stat; unconstrained with support $(-\infty, \infty)$; light unimodality constraint on $h$; stricter unimodality constraint on $h$ ($h$ unimodal).

5 observations: $N(10,1)$. [Figure: R-stat kernel, without unimodal, unimodal]

Higher Dimensions
2-dim (& dim > 2): $h(x, y) = \exp(z(x, y))$ with
\[ z(x, y) = z_0 + v_0^x x + \int_0^x d\sigma \int_0^\sigma ds\, a^x(s) + \int_0^y d\tau \int_0^\tau dr\, a^y(r) + \int_0^x ds \int_0^y dr\, a^{x,y}(s, r). \]
Set $a^x(t) = a_k$ on $(s_{k-1}, s_k)$, $a^y(t) = a_l$ on $(r_{l-1}, r_l)$, and $a^{x,y}(t) = a_{k,l}$ on $(s_{k-1}, s_k) \times (r_{l-1}, r_l)$.
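A numerical sketch of this 2-D construction: with $a^x$, $a^y$, $a^{x,y}$ piecewise constant on a knot grid, $z(x, y)$ is accumulated by discretizing the stated integrals (knots, coefficient values, and the constants $z_0$, $v_0^x$ are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(5)
knots = np.linspace(0.0, 1.0, 6)            # knots s_0..s_5 = r_0..r_5
a_x = rng.normal(size=5)                    # a_k on (s_{k-1}, s_k]
a_y = rng.normal(size=5)                    # a_l on (r_{l-1}, r_l]
a_xy = rng.normal(size=(5, 5))              # a_{k,l} on the product cells

t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]
cell = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, 4)

Ax = np.cumsum(a_x[cell]) * dt              # int_0^x a_x(s) ds
AAx = np.cumsum(Ax) * dt                    # int_0^x d(sigma) int_0^sigma a_x
Ay = np.cumsum(a_y[cell]) * dt
AAy = np.cumsum(Ay) * dt
# mixed term: int_0^x ds int_0^y dr a_xy(s, r), via a 2-D cumulative sum
Axy = np.cumsum(np.cumsum(a_xy[np.ix_(cell, cell)], axis=0), axis=1) * dt**2

z0, v0x = 0.0, 0.1
Z = z0 + v0x * t[:, None] + AAx[:, None] + AAy[None, :] + Axy
H = np.exp(Z)                               # h(x, y) = exp(z(x, y)) on the grid
print(H.shape, round(float(H.max()), 3))
```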

20 samples & unimodal

15 samples