Statistical Estimation: Data & Non-data Information

Statistical Estimation: Data & Non-data Information
Roger J-B Wets, University of California, Davis
with M. Casey @ Raytheon, G. Pflug @ U. Vienna, X. Dong @ EpiRisk, G-M You @ EpiRisk

A little background
Decision making under uncertainty:
\[ \min\; f_{01}(x) + E\{Q(\xi, x)\} \quad\text{so that } x \in C_1 \subset \mathbb{R}^n, \]
\[ Q(\xi, x) = \inf\,\{\, f_{02}(\xi, y) : y \in C_2(\xi, x)\,\}, \qquad C_2(\xi, x) = \{\, y \in \mathbb{R}^d_+ : W_\xi\, y = d_\xi - T_\xi x \,\}, \]
i.e., stochastic programming problems. Issue: the realizations of the data, $(d_\xi, T_\xi, W_\xi) = \xi \in \mathbb{R}^N$, $N \ge 2$, come as a limited sample (43 samples), which never reaches the asymptotic range.
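To make the setting concrete, here is a minimal numerical sketch (not from the talk) of such a two-stage problem: a hypothetical scalar first stage with $f_{01}(x) = 0.5x$, sampled realizations $\xi$, and the recourse cost $Q(\xi, x)$ computed as a small linear program. All problem data ($W$, $T$, $d_\xi$, $f_{02}$) are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog, minimize_scalar

rng = np.random.default_rng(0)
xis = rng.exponential(scale=1.0, size=43)   # 43 sampled realizations, as on the slide

def Q(xi, x):
    # Recourse problem: inf f02(xi, y) over y in C2(xi, x), with hypothetical
    # W = [1, -1], T = 1, d(xi) = xi, and f02(y) = y1 + 1.5*y2.
    res = linprog(c=[1.0, 1.5], A_eq=[[1.0, -1.0]], b_eq=[xi - x],
                  bounds=[(0, None), (0, None)])
    return res.fun

def total_cost(x):
    # f01(x) + E^nu{Q(xi, x)} with f01(x) = 0.5*x (hypothetical)
    return 0.5 * x + np.mean([Q(xi, x) for xi in xis])

best = minimize_scalar(total_cost, bounds=(0.0, 5.0), method="bounded")
print(f"x* = {best.x:.3f}, cost = {best.fun:.3f}")
```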

Formulation
Find $F^{\text{est}}$, an estimate of the distribution of a random variable $X$, given all the information available about this random phenomenon, i.e., such that
\[ \forall x:\; F^{\text{est}}(x) \approx F^{\text{true}}(x) = \text{prob}[X \le x]. \]

Information
- Observations (data): $x^1, x^2, \dots, x^\nu$
- Non-data facts: density or discrete distribution, bounds on expectation, moments, shape (unimodal, decreasing), parametric class
- Non-data modeling assumptions: the above + density is smooth, (un)bounded support, ...

Applications:
- Estimating (cumulative) distribution functions
- Estimating coefficients of time series
- Estimating coefficients of stochastic differential equations (SDEs)
- Estimating financial curves (zero-curves)
- Dealing with lack of data: few observations
- Estimating density functions: $h^{\text{est}}$

Kernel choice
Information = observations: $x^1, x^2, \dots, x^\nu$. Optimal bandwidth = kernel support?
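As a reminder of what is at stake in the bandwidth question, a short sketch using scipy's gaussian_kde (a scalar bw_method is used directly as the bandwidth factor); the sample and the factors tried are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.exponential(size=200)          # observations x^1, ..., x^nu
grid = np.linspace(0.0, 6.0, 121)

for factor in (0.1, 0.5, 2.0):              # under- to over-smoothing
    kde = gaussian_kde(sample, bw_method=factor)
    # the peak of the estimate flattens as the bandwidth grows
    print(factor, round(float(kde(grid).max()), 3))
```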

Kernel estimates

Statistical Estimation from an optimization viewpoint

Criterion: Maximum Likelihood
Find $h \in H$ = class of functions on $\mathbb{R}$ that maximizes the probability of observing $x^1, x^2, \dots, x^\nu$; equivalently,
\[ \max \prod_{l=1}^{\nu} h(x^l) \;\Longleftrightarrow\; \max \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) = \max\, E^\nu\{\ln h(x)\} = \max \int \ln h(x)\, P^\nu(dx). \]

This leads to an $\infty$-dimensional optimization problem:
\[ \max\; E^\nu\{\ln h(x)\} = \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) \quad\text{so that } \int h(x)\,dx = 1,\; h(x) \ge 0\;\forall x \in \mathbb{R},\; h \in A^\nu \subset H, \]
where $A^\nu$ collects the non-data information constraints and $H = C^2(\mathbb{R})$, $L^p(S)$, $H^1(S)$, $S \subset \mathbb{R}$.

Non-data information constraints
- support: $S = [\alpha, \beta]$, $S = [\alpha, \infty)$, ...
- bounds on moments: $a_\iota \le \int x^\iota h(x)\,dx \le b_\iota$
- unimodal: $h(x) = e^{-Q(x)}$, $Q$ convex
- decreasing: $h'(x) \le 0$, $\forall x \in S$
- 'smoothness': $h \in H^0$, $\int \frac{h'(x)^2}{h(x)}\,dx \le \eta$ (Fisher information)
- 'Bayesian': $\|h - h^a\| \le \delta$ or $\int \theta(h - h^a)$, total variation, ...
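A minimal discretized sketch of the program above, under assumptions of my own: $h$ is represented by its values on a grid, the normalization uses the trapezoid rule, and $A^\nu$ is taken to be the monotonicity (decreasing) constraint from the list.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = rng.exponential(size=50)             # observations
grid = np.linspace(0.0, 8.0, 61)
dx = grid[1] - grid[0]

def integral(h):                            # trapezoid rule for int h dx
    return dx * (h.sum() - 0.5 * (h[0] + h[-1]))

def neg_loglik(h):                          # -(1/nu) sum_l ln h(x^l)
    hx = np.interp(data, grid, h)           # h(x^l) by linear interpolation
    return -np.mean(np.log(np.maximum(hx, 1e-12)))

cons = [{"type": "eq",   "fun": lambda h: integral(h) - 1.0},   # int h = 1
        {"type": "ineq", "fun": lambda h: h[:-1] - h[1:]}]      # h decreasing
res = minimize(neg_loglik, x0=np.full(grid.size, 1.0 / 8.0),
               bounds=[(0.0, None)] * grid.size,                # h >= 0
               constraints=cons, method="SLSQP", options={"maxiter": 300})
print(res.success, round(integral(res.x), 4))
```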

Questions (M-estimators)
- Consistency: as $\nu \to \infty$, does $h^{\text{est}}_\nu \to h^{\text{true}}$? Hypo-convergence: a law of large numbers.
- Convergence rate: how large does $\nu$ have to be so that $h^{\text{est}}_\nu \approx h^{\text{true}}$? Quantitative hypo-convergence: $\text{dist}(h^{\text{est}}, h^{\text{true}})$.
- Solving the $\infty$-dimensional optimization problem? Finite-dimensional approximations + again, convergence issues.

When is (P) near (P$^a$)?
\[ (P)\quad \max f_0(x) \;\text{ s.t. }\; f_i(x) \le 0,\; i \in I_f,\quad x \in X_f \subset E \]
\[ (P^a)\quad \max g_0(x) \;\text{ s.t. }\; g_i(x) \le 0,\; i \in I_g,\quad x \in X_g \subset E \]
Example:
\[ (P)\quad \max f_0(x) \;\text{ s.t. }\; f_i(x) \le 0,\; i = 1, \dots, m \]
\[ (P^a)\quad \max f_0(x) - \pi \sum_{i=1}^{m} \max[0, f_i(x)]^2 \]
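A tiny numeric illustration (hypothetical problem, not from the talk) of the penalty approximation: as the weight $\pi$ grows, the maximizer of $(P^a)$ approaches the constrained maximizer of $(P)$.

```python
from scipy.optimize import minimize_scalar

f0 = lambda x: -x**2                        # hypothetical objective, to be maximized
f1 = lambda x: 1.0 - x                      # constraint f1(x) <= 0, i.e. x >= 1

for pi in (1.0, 10.0, 100.0):
    # maximize f0(x) - pi * max[0, f1(x)]^2 (minimize its negative)
    res = minimize_scalar(lambda x: -(f0(x) - pi * max(0.0, f1(x))**2),
                          bounds=(-5.0, 5.0), method="bounded")
    print(pi, round(res.x, 4))
```

Here the penalized maximizer works out to $\pi/(1+\pi)$, so the printed values approach the constrained argmax $x^* = 1$.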

From: when is (P) near (P$^a$)
\[ (P)\quad \max f(x), \;\text{ where } f(x) = f_0(x) \text{ when } f_i(x) \le 0,\; i \in I_f,\; x \in X_f \subset E;\;\; = -\infty \text{ otherwise} \]
\[ (P^a)\quad \max g(x), \;\text{ where } g(x) = g_0(x) \text{ when } g_i(x) \le 0,\; i \in I_g,\; x \in X_g \subset E;\;\; = -\infty \text{ otherwise} \]
To: when is $f$ near $g$?

Example (with penalty)

Hypographs of $f$ & $f^a$

hypo $f^a$ near hypo $f$ $\;\Longrightarrow\;$ argmax$(P^a) \approx$ argmax$(P)$

Hypo-convergence:
\[ f = \text{h-lim}_\nu\, f^\nu \;(f^\nu \to_h f):\quad x^\nu \in \arg\max f^\nu,\; x^\nu \to x \text{ (cluster)} \;\Longrightarrow\; x \in \arg\max f. \]
aw-topology for usc functions (Attouch-Wets, 1992): $\exists\, dl$ such that $dl(f^\nu, f) \to 0 \iff f^\nu \to_h f$, and $dl(f^\nu, f)$ small forces $\text{dist}(\arg\max f^\nu, \arg\max f)$ small (quantified on the next slides).
With $F^\nu(x) = \int f(\xi, x)\, P^\nu(d\xi)$ and $F(x) = \int f(\xi, x)\, P(d\xi)$:
\[ P^\nu \to P \;\;\&\;\; f\text{-tight} \;\Longrightarrow\; F^\nu \to_h F. \]

Quantitative hypo-convergence
$\theta : \mathbb{R}_+ \to [0, 1)$ continuous, strictly increasing; for example $\theta(a) = (2/\pi)\arctan a$.
\[ dl_\rho(C, D) = \sup_{\|x\| \le \rho} \big|\, d(x, C) - d(x, D) \,\big|, \qquad dl(C, D) = \int_0^\infty e^{-\rho}\, dl_\rho(C, D)\, d\rho, \]
a metric on the space of closed sets (aw-topology);
\[ dl(f, g) = dl(\text{hypo}\, f, \text{hypo}\, g), \]
a metric on the space of usc functions (aw-topology).
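The $\rho$-distance is easy to approximate on a grid; a sketch for two intervals on the line (the sets and grid size are my own choices):

```python
import numpy as np

def dist_to_interval(x, lo, hi):            # d(x, [lo, hi]), pointwise
    return np.maximum.reduce([lo - x, x - hi, np.zeros_like(x)])

def dl_rho(rho, C, D, n=2001):              # sup over the rho-ball, on a grid
    x = np.linspace(-rho, rho, n)
    return float(np.max(np.abs(dist_to_interval(x, *C)
                               - dist_to_interval(x, *D))))

C, D = (0.0, 1.0), (0.2, 1.0)
print(dl_rho(5.0, C, D))                     # 0.2: the gap at the left endpoints
```

$dl(C, D)$ itself can then be approximated by quadrature of $e^{-\rho}\, dl_\rho(C, D)$ over $\rho$.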

Quantitative hypo-convergence II
\[ dl(f^\nu, f) \to 0 \;\Longleftrightarrow\; f^\nu \to f \;\text{aw-hypo}. \]
Theorem. For $f, g : X \to \overline{\mathbb{R}}$ usc, concave, and $\rho_0$ such that $\arg\max f \cap \rho_0 B \neq \emptyset$, $\sup f < \rho_0$, and likewise for $g$: for all $\rho > \rho_0$, $\varepsilon > 0$,
\[ dl_\rho\big(\varepsilon\text{-}\arg\max f,\; \varepsilon\text{-}\arg\max g\big) \;\le\; \big(1 + 4\rho\,\varepsilon^{-1}\big)\, dl_\rho(f, g). \]

Opt-Formulation (recall)
\[ \max\; E^\nu\{\ln h(x)\} = \frac{1}{\nu} \sum_{l=1}^{\nu} \ln h(x^l) \quad\text{so that } \int h(x)\,dx = 1,\; h(x) \ge 0\;\forall x \in \mathbb{R},\; h \in A^\nu \subset H, \]
with $A^\nu$ the non-data information constraints and $H = C^2(\mathbb{R})$, $L^p(S)$, $H^1(S)$, $S \subset \mathbb{R}$.

Convergence rate
\[ L^\nu(h, x) = \ln h(x) \;\text{ if } h \in A^\nu \subset H,\; \int h(x)\,dx = 1,\; h \ge 0; \qquad = -\infty \;\text{ otherwise}, \]
and $L = L^\nu$ except with $A = \text{aw-lim}\, A^\nu$ (aw $\iff dl$). Then
\[ \int L^\nu(\cdot, x)\, dP^\nu = E^\nu L^\nu \;\xrightarrow{\;\text{aw-hypo}\;}\; EL = \int L(\cdot, x)\, dP. \]
Continuity and compactness hold when $H$ is a Hilbert space embedded in an RKHS $\mathcal{H}$: the constraint set is $H$-bounded and $\mathcal{H}$-closed, $\mathcal{H}$ a linear subspace on which each evaluation $h \mapsto h(x)$ is continuous. Relies on the Hilbert-Schmidt Embedding Theorem.

Numerical Procedures
1. $h^\nu = \sum_{k=1}^{q} u_k^\nu\, i_k(\cdot)$: Fourier coefficients, wavelets, kernel bases
2. $h = \exp(s(\cdot))$, with $s(\cdot)$ a constrained cubic (or quadratic) spline

Basis: Fourier coefficients
\[ \max\; \frac{1}{\nu} \sum_{l=1}^{\nu} \ln\Big( u_0 + 2 \sum_{k=1}^{q} u_k \cos(w_k x^l / \beta) \Big) \]
so that
\[ u_0 + 2 \sum_{k=1}^{q} \frac{\sin w_k}{w_k}\, u_k = \beta^{-1}, \qquad u_0 + 2 \sum_{k=1}^{q} u_k \cos(w_k x / \beta) \ge 0 \;\;\forall x \in [0, \beta], \]
\[ u_k \in \mathbb{R},\; k = 0, \dots, q, \qquad w_k \in \mathbb{R}_+,\; k = 1, \dots, q, \]
\[ \sum_{k=1}^{q} \Big[ \cos\Big(\frac{w_k x}{\beta}\Big) - \cos\Big(\frac{w_k x'}{\beta}\Big) \Big]\, u_k \ge 0, \quad 0 \le x \le x' \le \beta, \]
i.e., $h$ is decreasing.
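A sketch of this cosine-basis program, with fixed hypothetical frequencies $w_k = (k - \tfrac12)\pi$ and the nonnegativity and decreasing conditions enforced on a finite grid (a standard relaxation of the $\forall x$ constraints):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.exponential(size=100)
beta = float(data.max()) * 1.2               # support [0, beta]
q = 5
w = (np.arange(1, q + 1) - 0.5) * np.pi      # fixed frequencies (hypothetical)
grid = np.linspace(0.0, beta, 200)           # where h >= 0 / decreasing is checked

def h(u, x):                                 # h(x) = u0 + 2 sum_k u_k cos(w_k x / beta)
    return u[0] + 2.0 * np.cos(np.outer(x, w) / beta) @ u[1:]

def negll(u):
    return -np.mean(np.log(np.maximum(h(u, data), 1e-12)))

cons = [{"type": "eq",                       # normalization: int_0^beta h = 1
         "fun": lambda u: u[0] + 2.0 * np.dot(np.sin(w) / w, u[1:]) - 1.0 / beta},
        {"type": "ineq", "fun": lambda u: h(u, grid)},                        # h >= 0
        {"type": "ineq", "fun": lambda u: h(u, grid[:-1]) - h(u, grid[1:])}]  # decreasing
res = minimize(negll, x0=np.r_[1.0 / beta, np.zeros(q)],
               method="SLSQP", constraints=cons)
print(res.success, res.x.round(3))
```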

Test Case: exponential
\[ h^{\text{true}}(x) = \lambda e^{-\lambda x} \;\text{ if } x \ge 0; \quad = 0 \;\text{ if } x < 0 \qquad (\lambda = 1). \]
Estimators compared: empirical estimate; kernel estimate from R-stat; unconstrained with support $[0, \infty)$; unconstrained with adaptive wave length; constrained ($h$ decreasing); parametric, i.e., exp-$\lambda$ class $h^\lambda$.

200 observations: exponential. [Figure: empirical, param.-class, R-stat kernel, unconstrained, + wave length, + decreasing]

$h^{\text{est}}$: empirical (five observations)

$h^{\text{est}}$: kernel

$h^{\text{est}}$: unconstrained

$h^{\text{est}}$: ... with adaptive wave number

$h^{\text{est}}$: with decreasing constraint

5 observations: [Figure: empirical, R-stat kernel, unconstrained, decreasing]

1 observation: [Figure: R-stat kernel, unconstrained, parametric, decreasing]

Exponential (quadratic) spline

\[ h(x) = e^{s(x)}, \qquad s(x) = s_0 + v_0 x + \int_0^x dr \int_0^r dt\, z(t), \quad z(t) \equiv z_k \;\text{ on } (x_k, x_{k+1}], \]
so that $s(x) = s_0 + v_0 x + \sum_{j=1}^{k} a_{kj}\, z_j$ when $x \in (x_k, x_{k+1}]$. Then
\[ \max\; E^\nu\{\ln h(x)\} \;\Longleftrightarrow\; \min\; -\frac{1}{\nu}\sum_{l=1}^{\nu} s(x^l) \quad\text{so that } \int e^{s(x)}\,dx \le 1 \;\; (h \ge 0 \text{ holds automatically}), \]
with $z_k \in [-\eta, \eta]$: 'constrained' spline; unimodality is imposed through a further linear constraint on $s$.
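A grid-based sketch of this exponential-spline idea (my own simplification): $s$ lives on a grid, the objective is the empirical mean of $s$ at the data, the normalization is $\int e^s\,dx \le 1$, and the bound on $z = s''$ becomes a bound on second differences.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = rng.lognormal(size=25)                # matches the nu = 25 test case below
grid = np.linspace(0.01, 10.0, 60)
dx = grid[1] - grid[0]
eta = 5.0                                    # curvature bound |s''| <= eta (hypothetical)

def objective(s):                            # min -(1/nu) sum_l s(x^l)
    return -np.mean(np.interp(data, grid, s))

d2 = lambda s: (s[2:] - 2.0 * s[1:-1] + s[:-2]) / dx**2    # discrete s''
cons = [{"type": "ineq", "fun": lambda s: 1.0 - dx * np.exp(s).sum()},  # int e^s <= 1
        {"type": "ineq", "fun": lambda s: eta - d2(s)},    # s'' <= eta
        {"type": "ineq", "fun": lambda s: eta + d2(s)}]    # s'' >= -eta
res = minimize(objective, x0=np.full(grid.size, np.log(0.09)),
               method="SLSQP", constraints=cons, options={"maxiter": 500})
h = np.exp(res.x)
print(res.success, round(dx * h.sum(), 4))
```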

Test Case: log-normal, $\nu = 25$
\[ h^{\text{true}}(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2 / 2\sigma^2} \;\text{ for } x > 0; \quad = 0 \;\text{ for } x \le 0. \]
Estimators compared: empirical estimate; kernel estimate from R-stat; unconstrained with support $[0, \infty)$; constrained ($h$ unimodal).

25 observations (msqe = mean square error): [Figure] empirical; R-stat kernel, msqe $> 0.0795$; unconstrained, msqe $= 0.0419$; unimodal constraint, msqe $= 0.0190$.

Test Case: $N(10,1)$, $\nu = 5$
\[ h^{\text{true}}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-10)^2/2}, \quad x \in \mathbb{R}. \]
Estimators compared: kernel estimate from R-stat; unconstrained with support $(-\infty, \infty)$; light unimodality constraint on $h$; stricter unimodality constraint on $h$ ($h$ unimodal).

5 observations: $N(10,1)$. [Figure: R-stat kernel, without unimodal, unimodal]

Higher Dimensions
2-dim (& dim > 2): $h(x, y) = \exp(z(x, y))$ with
\[ z(x, y) = z_0 + v_0^x x + \int_0^x d\sigma \int_0^\sigma ds\, a^x(s) + \int_0^y d\tau \int_0^\tau dr\, a^y(r) + \int_0^x ds \int_0^y dr\, a^{x,y}(s, r). \]
Set $a^x(t) = a_k$ on $(s_{k-1}, s_k)$, $a^y(t) = a_l$ on $(r_{l-1}, r_l)$, and $a^{x,y}(t) = a_{k,l}$ on $(s_{k-1}, s_k) \times (r_{l-1}, r_l)$.
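A numerical sketch of this 2-D construction: with $a^x$, $a^y$, $a^{x,y}$ piecewise constant on a knot grid, $z(x, y)$ is accumulated by discretizing the stated integrals (knots, coefficient values, and the constants $z_0$, $v_0^x$ are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(5)
knots = np.linspace(0.0, 1.0, 6)            # knots s_0..s_5 = r_0..r_5
a_x = rng.normal(size=5)                    # a_k on (s_{k-1}, s_k]
a_y = rng.normal(size=5)                    # a_l on (r_{l-1}, r_l]
a_xy = rng.normal(size=(5, 5))              # a_{k,l} on the product cells

t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]
cell = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, 4)

Ax = np.cumsum(a_x[cell]) * dt              # int_0^x a_x(s) ds
AAx = np.cumsum(Ax) * dt                    # int_0^x d(sigma) int_0^sigma a_x
Ay = np.cumsum(a_y[cell]) * dt
AAy = np.cumsum(Ay) * dt
# mixed term: int_0^x ds int_0^y dr a_xy(s, r), via a 2-D cumulative sum
Axy = np.cumsum(np.cumsum(a_xy[np.ix_(cell, cell)], axis=0), axis=1) * dt**2

z0, v0x = 0.0, 0.1
Z = z0 + v0x * t[:, None] + AAx[:, None] + AAy[None, :] + Axy
H = np.exp(Z)                               # h(x, y) = exp(z(x, y)) on the grid
print(H.shape, round(float(H.max()), 3))
```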

20 samples & unimodal

15 samples