How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines


1 How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines Johannes O. Royset Operations Research, NPS in collaboration with L. Bonfiglio, S. Buttrey, S. Brizzolara, J. Carbaugh, I. Rios, D. Singham, K. Spurkel, G. Vernengo, J-P. Watson, R. Wets, D. Woodruff Smart Energy and Stochastic Optimization, ENPC ParisTech June 2015 This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grants W911NF and W911NF, as well as by DARPA under HR. 1 / 126

2 Goal: estimate, predict, forecast 2 / 126

3 Copper prices [figure: copper price (USD) over months] 3 / 126

4 Copper prices [figure: real prices and the forecast E(hist+market) with E(hist+market) ± σ bands; price (USD) over months] 4 / 126

5 Hourly electricity loads 5 / 126

6 Hard information: data Observations Usually scarce, excessive, corrupted, uncertain 6 / 126

7 Soft information Indirect observations, external information. Knowledge about structures: established laws, physical restrictions, shapes, smoothness of curves. 7 / 126

8 Evolving information data acquisition, information growth 8 / 126

9 Function Identification Problem Identify a function that best represents all available information 9 / 126

10 Applications of function identification approach energy, natural resources, financial markets, image reconstruction, uncertainty quantification, variograms, nonparametric regression, deconvolution, density estimation 10 / 126

11 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 11 / 126

12 Illustrations and formulations 12 / 126

13 Identifying financial curves Estimate the discount factor curve given instruments $i = 1, 2, \dots, I$ and payments $p^i_0, p^i_1, \dots, p^i_{N_i}$ at times $0 = t^i_0, t^i_1, \dots, t^i_{N_i}$. Identify a nonnegative, nonincreasing function $f$ with $f(0) = 1$ and $\sum_{j=0}^{N_i} f(t^i_j)\,p^i_j \approx 0$ for all $i$. Constrained infinite-dimensional optimization problem: minimize $\sum_{i=1}^{I}\big(\sum_{j=0}^{N_i} f(t^i_j)\,p^i_j\big)^2$ such that $f(0) = 1$, $f \ge 0$, $f' \le 0$. 13 / 126
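To make the formulation concrete, here is a minimal sketch in Python (not the talk's implementation; the instruments, the payment matrix, and the squared-mispricing objective are illustrative assumptions) that identifies a discount curve on a grid of payment dates under the shape constraints above:

```python
import numpy as np
from scipy.optimize import minimize

times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])             # grid = payment dates
# payments[i, j] = cash flow of instrument i at times[j] (hypothetical data)
payments = np.array([[-0.99, 0.00, 1.00, 0.00, 0.00],
                     [-0.97, 0.00, 0.00, 0.00, 1.00],
                     [-1.00, 0.02, 0.02, 0.02, 1.02]])

def pricing_error(f):
    # sum_i ( sum_j f(t^i_j) p^i_j )^2 : squared present-value mispricing
    return np.sum((payments @ f) ** 2)

n = len(times)
cons = [{"type": "eq", "fun": lambda f: f[0] - 1.0}]              # f(0) = 1
cons += [{"type": "ineq", "fun": lambda f, k=k: f[k] - f[k + 1]}  # nonincreasing
         for k in range(n - 1)]

res = minimize(pricing_error, x0=np.ones(n), bounds=[(0.0, None)] * n,
               constraints=cons, method="SLSQP")                  # f >= 0 via bounds
print(res.x)   # estimated discount factors at the grid dates
```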

16 Day-ahead electricity load forecast Data: predicted temperature, dew point; observed load 14 / 126

17 Day-ahead load forecast problem Using relevant data and the weather forecast for tomorrow, determine tomorrow's load curve and its uncertainty. $j = 1, 2, \dots, J$: days in data set; $h = 1, 2, \dots, 24$: hours of the day; $t^j_h$: predicted temperature in hour $h$ of day $j$; $d^j_h$: predicted dew point in hour $h$ of day $j$; $\ell^j_h$: actual observed load in hour $h$ of day $j$. 15 / 126

23 Day-ahead load forecast problem (cont.) Functional regression model: $\ell^j_h = f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h + e^j_h$, where the regression coefficients $f^{tmp}$ and $f^{dpt}$ are functions of time and $e^j_h$ is the error between the observed and predicted loads. Regression problem: find functions $f^{tmp}$ and $f^{dpt}$ on $[0,24]$ that minimize $\sum_{j=1}^{J}\sum_{h=1}^{24}\big(\ell^j_h - [f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h]\big)^2$ subject to smoothness and curvature conditions. Constrained infinite-dimensional optimization problem. 16 / 126
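A rough illustration of this regression problem: the sketch below (synthetic data shapes of my choosing; a finite-difference penalty standing in for the smoothness and curvature conditions on $[0,24]$) fits hourly values of $f^{tmp}$ and $f^{dpt}$ by penalized least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
J, H = 200, 24                       # days, hours
t = rng.normal(70, 10, (J, H))       # predicted temperature (hypothetical)
d = rng.normal(55, 10, (J, H))       # predicted dew point (hypothetical)
load = 0.8 * t + 0.3 * d + rng.normal(0, 2, (J, H))

lam = 10.0                           # smoothness weight
# Unknowns: x = (f_tmp(1..24), f_dpt(1..24)). Stack data rows
# l^j_h = f_tmp(h) t^j_h + f_dpt(h) d^j_h with penalty rows
# sqrt(lam) * (f(h+1) - f(h)) for each coefficient function.
rows, rhs = [], []
for h in range(H):
    A = np.zeros((J, 2 * H))
    A[:, h] = t[:, h]
    A[:, H + h] = d[:, h]
    rows.append(A)
    rhs.append(load[:, h])
D = np.zeros((2 * (H - 1), 2 * H))
for h in range(H - 1):
    D[h, h], D[h, h + 1] = -1.0, 1.0
    D[H - 1 + h, H + h], D[H - 1 + h, H + h + 1] = -1.0, 1.0
rows.append(np.sqrt(lam) * D)
rhs.append(np.zeros(2 * (H - 1)))

x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
f_tmp, f_dpt = x[:H], x[H:]          # fitted hourly coefficient curves
```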

26 Day-ahead load forecast problem (cont.) Given temperature $t$ and dew point $d$ forecasts (functions of time) for tomorrow, the load forecast becomes $f^{tmp}(h)\,t(h) + f^{dpt}(h)\,d(h)$, $h \in [0,24]$. But what about uncertainty? 17 / 126

28 Estimation of errors For hour $h$: minimized errors $e^j_h = \ell^j_h - [f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h]$, $j = 1, \dots, J$, are samples from the actual error density. But that alone wouldn't capture the uncertainty one would expect across hours $h$. [figure: error densities by hour] Probability density estimation problem with a very small data set. 18 / 126

30 Estimation of errors (cont.) Probability density estimation problem: Given an iid sample $x^1, x^2, \dots, x^\nu$, find a function $h \in H$ that maximizes $\prod_{i=1}^{\nu} h(x^i)$. $H$ = constraints on $h$: $h \ge 0$, $\int h(x)\,dx = 1$, and more. Constrained infinite-dimensional optimization problem. 19 / 126

33 Estimation of errors (cont.) 20 / 126

34 Generating load scenarios Also need to consider conditioning: if the load is above the regression function early in the day, then it is likely above later; the error data for density estimation is reduced further. 21 / 126

36 Forecast precision (Connecticut) Mean % error by season: Fall 3.99; Spring 2.73; Summer 4.19; Winter. 22 / 126

37 Uncertainty quantification (UQ) Engineering, biological, physical systems. Input: random vector $V$ (known distribution). System function $g$, implicitly defined, e.g., by simulation. Output: random variable $X = g(V)$. 23 / 126

38 Illustration of UQ challenges Output amplitude of a dynamical system: [figure: density of X] How to estimate densities like this with a small sample? 24 / 126

39 Probability density estimation Sample $X^1, \dots, X^\nu$: maximize $\prod_{i=1}^{\nu} h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$. Sample size might be growing; constraint set $H^\nu$ might be evolving. 25 / 126

40 Evolving probability density estimation Maximize $\prod_{i=1}^{\nu} (h(X^i))^{1/\nu}$ s.t. $h \in H^\nu \cap \mathcal{H}$; equivalently, maximize $\frac{1}{\nu}\sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$. Actual problem: maximize $E[\log h(X)]$ s.t. $h \in H \cap \mathcal{H}$. 26 / 126

43 Evolving constraints Constraints: nonnegative support, smooth, curvature bound. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 27 / 126

44 Evolving constraints (cont.) Also log-concave. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 28 / 126

45 Evolving constraints (cont.) Also nonincreasing. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 29 / 126

46 Evolving constraints (cont.) Also slope bounds. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 30 / 126

47 Take-aways so far Challenges in forecasting and estimation are function identification problems. Day-ahead forecasting of electricity demand involves functional regression to get the trend and probability density estimation to get the errors. Soft information supplements hard information (data). 31 / 126

48 Questions? 32 / 126

49 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 33 / 126

50 Framework 34 / 126

51 Actual problem Constrained infinite-dimensional optimization problem: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, with $\mathcal{F}$ some function space; criterion $\psi$ (norms, error measures, etc.); feasible set $F$ (shape restrictions, external info); subspace of interest $\mathcal{F}$ (continuity, smoothness). 35 / 126

55 Special case: linear regression [figure: food expenditure vs. household income with least-squares, 0.75-quantile, and superquantile regression fits] Find affine $f$ that minimizes $\sum_j (y^j - f(x^j))^2$ or other error measures. 36 / 126

56 Special case: interpolation [figure: data, smoothing spline, interpolating spline] min $\|f''\|^2$ such that $f(x^j) = y^j$ for all $j$ and $f$ in a Sobolev space. 37 / 126

57 Special case: smoothing [figure: data, smoothing spline, interpolating spline] min $\sum_j (y^j - f(x^j))^2 + \lambda \|f''\|^2$ such that $f$ in a Sobolev space. 38 / 126
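For a quick hands-on comparison of these two special cases, scipy's FITPACK-based smoothing spline solves a closely related penalized problem (synthetic data below; the smoothing factor s plays the role of the fit/roughness trade-off):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)   # noisy observations

interp = UnivariateSpline(x, y, s=0)     # s = 0: interpolating spline
smooth = UnivariateSpline(x, y, s=2.0)   # s > 0: smoothing, allows residual error
print(interp(5.0), smooth(5.0))
```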

58 Evolving problems Evolving infinite-dimensional optimization problems: min $\psi^\nu(f)$ such that $f \in F^\nu \cap \mathcal{F}$, with $\mathcal{F}$ some function space; evolving/approximating criterion $\psi^\nu$; evolving/approximating feasible set $F^\nu$. 39 / 126

60 Function space Extended real-valued lower semicontinuous functions on $\mathbb{R}^n$. [figure: a lower semicontinuous function f] $(\text{lsc-fcns}(\mathbb{R}^n), dl)$: complete separable metric space, with $dl$ = epi-distance. 41 / 126

63 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 42 / 126

64 Jumps Fits by polynomial and lsc function 43 / 126

65 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 44 / 126

66 Nonlinear transformations Recall probability density estimation with sample $X^1, \dots, X^\nu$: maximize $\frac{1}{\nu}\sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$, including $h \ge 0$. Exponential transformation: $h(x) = \exp(-s(x))$, giving: minimize $\sum_{i=1}^{\nu} s(X^i)$ s.t. $s \in S^\nu \cap \mathcal{S}$, excluding nonnegativity. $h$ log-concave $\Leftrightarrow$ $s$ convex. If $s = \langle c(\cdot), r \rangle$, then certain expressions are linear in $r$. But $s$ needs to take on $+\infty$ for $h = \exp(-s(\cdot))$ to vanish. 45 / 126
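The following sketch shows the transformed problem in code; it is my own minimal version under stated assumptions (piecewise-linear s with continuity only, a trapezoid rule for the normalization, synthetic exponential data). Convexity of s, which would make h log-concave, could be added as linear constraints on the slopes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.exponential(1.0, 200)                       # synthetic sample
mesh = np.linspace(0.0, float(data.max()) + 1.0, 21)   # knots m_0 < ... < m_N
grid = np.linspace(mesh[0], mesh[-1], 2001)            # quadrature points

def s_eval(knot_vals, pts):
    # piecewise-linear s between knot values (continuity built in)
    return np.interp(pts, mesh, knot_vals)

def neg_loglik(knot_vals):
    return np.sum(s_eval(knot_vals, data))             # -log-lik = sum_i s(x^i)

def normalization(knot_vals):
    h = np.exp(-s_eval(knot_vals, grid))               # h = exp(-s) >= 0 for free
    return np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(grid)) - 1.0   # ∫ h dx = 1

res = minimize(neg_loglik, x0=np.ones(len(mesh)),
               constraints=[{"type": "eq", "fun": normalization}],
               method="SLSQP")
h_hat = lambda pts: np.exp(-s_eval(res.x, pts))        # density estimate
```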

72 Identifying functions with unbounded domains Complications for standard spaces. Is $f^\nu$ near $f \equiv 0$? [figure: $f^\nu$ deviating from zero far out] Not in $L^p$-norm, but the epi-distance is $1/\nu$. 46 / 126

75 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 47 / 126

76 Epi-graph [figure: a function f] 48 / 126

77 Epi-graph [figure: function f with its epi-graph epi f shaded] 49 / 126

78 Geometric view of lsc functions $f \in \text{lsc-fcns}(\mathbb{R}^n)$ $\Leftrightarrow$ $\text{epi}\, f$ nonempty closed subset of $\mathbb{R}^{n+1}$. Notation: $\rho\mathbb{B} = \mathbb{B}(0,\rho)$ = origin-centered ball with radius $\rho$; $d(y,S) = \inf_{y' \in S} \|y - y'\|$ = distance between $y$ and $S$. 50 / 126

80 Distances between epi-graphs [figure: epi f and epi g] 51 / 126

81 Distances between epi-graphs (cont.) $\rho$-epi-distance: $dl_\rho(f,g) = \sup_{\bar x \in \rho\mathbb{B}} |d(\bar x, \text{epi}\, f) - d(\bar x, \text{epi}\, g)|$, $\rho \ge 0$. [figure: functions f and g] 52 / 126

83 Distances between epi-graphs (cont.) [figure: point $\bar x$ with its distances to epi f and epi g; here $dl_\rho(f,g) = \delta$] 53 / 126

84 Distances between epi-graphs (cont.) [figure: distances from test points to epi-graphs] 54 / 126

85 Metric on the lsc functions The $\rho$-epi-distance $dl_\rho$ is a pseudo-metric on $\text{lsc-fcns}(\mathbb{R}^n)$. Epi-distance: $dl(f,g) = \int_0^\infty dl_\rho(f,g)\, e^{-\rho}\, d\rho$; $dl$ is a metric on $\text{lsc-fcns}(\mathbb{R}^n)$. The epi-distance induces the epi-topology on $\text{lsc-fcns}(\mathbb{R}^n)$; $(\text{lsc-fcns}(\mathbb{R}^n), dl)$ is a complete separable metric space (Polish). 55 / 126
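A crude numerical illustration of $dl_\rho$ for intuition only (my own discretization, not from the talk): approximate the epi-graphs by point clouds and take the worst absolute difference of distances over random test points in $\rho\mathbb{B}$.

```python
import numpy as np

def epigraph_cloud(f, xs, top):
    # sample points of epi f = {(x, a): a >= f(x)} up to height `top`
    return np.array([(x, a) for x in xs
                     for a in np.linspace(f(x), top, 40) if f(x) <= top])

def dist_to_set(q, pts):
    return np.min(np.linalg.norm(pts - q, axis=1))

def dl_rho(f, g, rho, n_test=300, seed=0):
    xs = np.linspace(-rho - 1.0, rho + 1.0, 200)
    ef = epigraph_cloud(f, xs, rho + 2.0)
    eg = epigraph_cloud(g, xs, rho + 2.0)
    rng, worst = np.random.default_rng(seed), 0.0
    while n_test > 0:
        q = rng.uniform(-rho, rho, 2)
        if np.linalg.norm(q) <= rho:          # keep only test points in rho*B
            worst = max(worst, abs(dist_to_set(q, ef) - dist_to_set(q, eg)))
            n_test -= 1
    return worst

# vertical shift by 0.5: the approximation stays at or below 0.5, as it should
print(dl_rho(np.abs, lambda x: np.abs(x) + 0.5, rho=2.0))
```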

86 Epi-convergence $f^\nu \in \text{lsc-fcns}(\mathbb{R}^n)$ epi-converges to $f$ if $dl(f^\nu, f) \to 0$. 56 / 126

87 Actual problem Constrained infinite-dimensional optimization problem: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, with $\mathcal{F} \subset \text{lsc-fcns}(\mathbb{R}^n)$; criterion $\psi$ (norms, error measures, etc.); feasible set $F$ (shape restrictions, external info); subspace of interest $\mathcal{F}$ (continuity, smoothness). 57 / 126

88 Take-aways in this section Actual problem: find the best extended real-valued lower semicontinuous function; captures regression, curve fitting, interpolation, density estimation, etc.; allows functions on $\mathbb{R}^n$, jumps, and transformations requiring $\pm\infty$. Evolving problem due to changes in information and approximations. Theory about lsc functions: metric space under the epi-distance; epi-distance = distance between epi-graphs. 58 / 126

89 Questions? 59 / 126

90 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 60 / 126

91 Epi-splines and approximations 61 / 126

92 Epi-splines = piecewise polynomial + lower semicontinuity Piecewise polynomial not automatic; must be constructed. [figure: epi-spline s(x) on mesh points $m_1, m_2, \dots, m_{N-1}$] Optimization over a new class of piecewise polynomial functions. 62 / 126

95 n-dim epi-splines 63 / 126

96 n-dim epi-splines (cont.) An epi-spline $s$ of order $p$ defined on $\mathbb{R}^n$ with partition $R = \{R_k\}_{k=1}^N$ is a real-valued function that on each $R_k$, $k = 1, \dots, N$, is a polynomial of total degree $p$, and for every $x \in \mathbb{R}^n$ has $s(x) = \liminf_{x' \to x} s(x')$. $\text{e-spl}^p_n(R)$ = epi-splines of order $p$ on $\mathbb{R}^n$ with partition $R$. 64 / 126

98 Evolving and approximating problems min $\psi^\nu(s)$ such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$; subspace of interest $\mathcal{S}^\nu \subset \text{e-spl}^{p^\nu}_n(R^\nu) \cap \mathcal{F}$; evolving/approximate feasible set $F^\nu$; evolving/approximate criterion $\psi^\nu$. 65 / 126

102 Function identification framework Evolving problems (information growth; criterion and constraint approximations; function approximations; solutions = epi-splines) versus the actual problem (complete information; infinite dimensional; conceptual solutions); consistency and rates connect the two. 66 / 126

103 Convergence of evolving problems Epi-convergence of functionals on a metric space. Definition: $\{\psi^\nu: S^\nu \to \mathbb{R}\}_{\nu=1}^\infty$ epi-converge to $\psi: F \to \mathbb{R}$ if and only if (i) for every $s^\nu \to f \in \text{lsc-fcns}(\mathbb{R}^n)$, with $s^\nu \in S^\nu$, we have $\liminf \psi^\nu(s^\nu) \ge \psi(f)$ if $f \in F$ and $\psi^\nu(s^\nu) \to \infty$ otherwise; (ii) for every $f \in F$, there exists $\{s^\nu\}_{\nu=1}^\infty$, with $s^\nu \in S^\nu$, such that $s^\nu \to f$ and $\limsup \psi^\nu(s^\nu) \le \psi(f)$. Equivalent to the previous definition. We also say that the evolving optimization problems epi-converge to the actual problem. 67 / 126

107 Consequence of epi-convergence If the evolving problems epi-converge to the actual problem, then $\limsup_\nu \big(\inf_{s \in S^\nu} \psi^\nu(s)\big) \le \inf_{f \in F} \psi(f)$. Moreover, if $s^k$ are optimal for the evolving problems $\min_{s \in S^{\nu_k}} \psi^{\nu_k}(s)$ and $s^k \to f_0$, then $f_0$ is optimal for the actual problem and $\lim_k \inf_{s \in S^{\nu_k}} \psi^{\nu_k}(s) = \inf_{f \in F} \psi(f) = \psi(f_0)$. 68 / 126

109 When will evolving problems epi-converge to the actual problem? Actual: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, $\mathcal{F} \subset \text{lsc-fcns}(\mathbb{R}^n)$; evolving: min $\psi^\nu(s)$ such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$. Theorem: sufficient conditions for epi-convergence are that $\psi^\nu$ converges continuously to $\psi$ relative to $\mathcal{F}$, epi-splines are dense in the lsc functions, $F^\nu$ set-converges to $F$, and $F$ is solid. 69 / 126

110 Epi-splines dense in lsc functions $\{R^\nu\}_{\nu=1}^\infty$, with $R^\nu = \{R^\nu_1, \dots, R^\nu_{N^\nu}\}$, is an infinite refinement if for every $x \in \mathbb{R}^n$ and $\epsilon > 0$, there exists a positive integer $\bar\nu$ s.t. $R^\nu_k \subset \mathbb{B}(x,\epsilon)$ for every $\nu \ge \bar\nu$ and $k$ satisfying $x \in \text{cl}\, R^\nu_k$. Theorem: For any $p = 0, 1, 2, \dots$ and infinite refinement $\{R^\nu\}_{\nu=1}^\infty$, $\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu)$ is dense in $\text{lsc-fcns}(\mathbb{R}^n)$. 70 / 126

112 Dense in continuous functions under simplex partitioning Definition: A simplex $S$ in $\mathbb{R}^n$ is the convex hull of $n+1$ points $x^0, x^1, \dots, x^n \in \mathbb{R}^n$, with $x^1 - x^0, x^2 - x^0, \dots, x^n - x^0$ linearly independent. Definition: A partition $R_1, R_2, \dots, R_N$ of $\mathbb{R}^n$ is a simplex partition of $\mathbb{R}^n$ if $\text{cl}\, R_1, \dots, \text{cl}\, R_N$ are mostly simplexes. 71 / 126

114 Dense in cnts fcns under simplex partitioning (cont.) Theorem: For any $p = 1, 2, \dots$ and $\{R^\nu\}_{\nu=1}^\infty$ an infinite refinement of $\mathbb{R}^n$ consisting of simplex partitions of $\mathbb{R}^n$, $\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu) \cap C^0(\mathbb{R}^n)$ is dense in $C^0(\mathbb{R}^n)$. 72 / 126

115 Rates in univariate probability density estimation For a convex and finite-dimensional estimation problem: Theorem: For any correct soft information, $\nu^{1/2}\, d_{KL}(h^0 \| h^\nu_\epsilon) = O_p(1)$ for some near-optimal solution $h^\nu_\epsilon$. 73 / 126

116 Decomposition Theorem: For every $s \in \text{e-spl}^p_n(R)$, with $n > p \ge 1$ and $R = \{R_k\}_{k=1}^N$, there exist $q^{k,i} \in \text{poly}^p(\mathbb{R}^{n-1})$, $i = 1, 2, \dots, n$ and $k = 1, 2, \dots, N$, such that $s(x) = \sum_{i=1}^n q^{k,i}(x_{-i})$ for all $x \in R_k$, where $x_{-i}$ drops the $i$th coordinate; i.e., an n-dim epi-spline is the sum of $n$ (n-1)-dim epi-splines. 74 / 126

117 Take-aways in this section Epi-splines are piecewise polynomials defined on an arbitrary partition of $\mathbb{R}^n$. Only lower semicontinuity is required (not smoothness). Dense in the space of extended real-valued lower semicontinuous functions under the epi-distance (i.e., epi-splines can approximate every lsc function to arbitrary accuracy). Solutions of the evolving problem (in terms of epi-splines) are approximate solutions of the actual problem. 75 / 126

118 Questions? 76 / 126

119 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 77 / 126

120 Implementations 78 / 126

121 Implementation considerations Selection of epi-spline order and composition $h = \exp(-s)$; selection of partition: go fine! Implementation of criterion functional and soft info. We provide details for one-dimensional epi-splines of order 1, focusing on criteria and constraints for probability densities. 79 / 126

123 Implementation of criteria functionals Probability densities $h \in \text{e-spl}^1(m)$, with mesh $m = \{m_0, m_1, \dots, m_N\}$, $-\infty < m_0 < m_1 < \dots < m_N < \infty$. First-order epi-splines (piecewise linear): $h(x) = a^k_0 + a^k x$ for $x \in (m_{k-1}, m_k)$. 80 / 126

125 Implementation of criteria functionals (cont.) Log-likelihood for observations $x^1, \dots, x^\nu$: $\sum_{i=1}^\nu \log h(x^i) = \sum_{i=1}^\nu \log(a^{k_i}_0 + a^{k_i} x^i)$, where $k_i$ is such that $x^i \in (m_{k_i-1}, m_{k_i})$. Entropy: $-\int h(x)\log h(x)\,dx = -\sum_{k=1}^N \int_{m_{k-1}}^{m_k} (a^k_0 + a^k x)\log(a^k_0 + a^k x)\,dx$. Both are concave in the epi-parameters $a^k_0$ and $a^k$, $k = 1, \dots, N$. 81 / 126

128 Soft information Nonnegativity: $h \ge 0$ if $a^k_0 + a^k m_{k-1} \ge 0$ and $a^k_0 + a^k m_k \ge 0$, $k = 1, \dots, N$. Integrate to one: $\int h(x)\,dx = \sum_{k=1}^N a^k_0 (m_k - m_{k-1}) + \sum_{k=1}^N \frac{a^k}{2}(m_k^2 - m_{k-1}^2) = 1$. Continuity: $a^k_0 + a^k m_k = a^{k+1}_0 + a^{k+1} m_k$ for $k = 1, \dots, N-1$. Also log-concavity, convexity, monotonicity. 82 / 126
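Putting these pieces together, here is a minimal sketch (a hypothetical sample, and scipy in place of the Matlab/R toolboxes mentioned later) of maximum likelihood over first-order epi-splines with the nonnegativity, unit-integral, and continuity constraints above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.beta(2, 5, 150)                 # sample on [0, 1] (hypothetical)
m = np.linspace(0.0, 1.0, 11)           # mesh m_0 < ... < m_N
N = len(m) - 1                          # params: p = (a0_1..a0_N, a_1..a_N)

def h_val(p, pts):
    k = np.clip(np.searchsorted(m, pts) - 1, 0, N - 1)   # piece index per point
    return p[:N][k] + p[N:][k] * pts                     # a0_k + a_k * x

def neg_loglik(p):
    return -np.sum(np.log(np.maximum(h_val(p, x), 1e-12)))

cons = [{"type": "eq", "fun": lambda p: (                # ∫ h dx = 1
    np.sum(p[:N] * np.diff(m)) + np.sum(p[N:] * np.diff(m**2) / 2) - 1.0)}]
for k in range(N):                                       # h >= 0 at both knots
    cons.append({"type": "ineq", "fun": lambda p, k=k: p[k] + p[N+k] * m[k]})
    cons.append({"type": "ineq", "fun": lambda p, k=k: p[k] + p[N+k] * m[k+1]})
for k in range(N - 1):                                   # continuity at m_{k+1}
    cons.append({"type": "eq", "fun": lambda p, k=k:
                 (p[k] + p[N+k] * m[k+1]) - (p[k+1] + p[N+k+1] * m[k+1])})

p0 = np.concatenate([np.ones(N), np.zeros(N)])           # start from uniform
res = minimize(neg_loglik, p0, constraints=cons, method="SLSQP")
```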

132 Soft information (cont.) Deconvolution $Y = X + W$: Given densities $h_W$ and $h_Y$, $h_Y(y) = \int h(x)\,h_W(y-x)\,dx = \sum_{k=1}^N a^k_0 \int_{m_{k-1}}^{m_k} h_W(y-x)\,dx + \sum_{k=1}^N a^k \int_{m_{k-1}}^{m_k} x\,h_W(y-x)\,dx$ for all $y$. Inverse problem $Y = g(X)$: Given $v_q = \int y^q h_Y(y)\,dy$, $q = 1, 2, \dots$, $v_q = \int [g(x)]^q h(x)\,dx \approx \sum_{k=1}^N \sum_{j=1}^{M_k} w_{jk}\,[g(x_{jk})]^q (a^k_0 + a^k x_{jk})$. Convex optimization problems. 83 / 126
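The quadrature step can be checked numerically. The helper below (hypothetical map g and mesh; midpoint rule on each mesh cell) returns coefficients $(c_0, c_1)$ so that $v_q \approx \sum_k c_0[k]\,a^k_0 + c_1[k]\,a^k$, which makes the moment constraints linear in the epi-parameters:

```python
import numpy as np

m = np.linspace(0.0, 1.0, 11)           # mesh (assumed)
g = lambda x: x**2                      # hypothetical system map

def moment_coeffs(q, n_pts=20):
    # midpoint quadrature: v_q ≈ sum_k c0[k] * a0_k + c1[k] * a_k
    c0 = np.zeros(len(m) - 1)
    c1 = np.zeros(len(m) - 1)
    for k in range(len(m) - 1):
        w = (m[k+1] - m[k]) / n_pts                       # cell width
        xj = np.linspace(m[k], m[k+1], n_pts,
                         endpoint=False) + w / 2.0        # cell midpoints
        c0[k] = np.sum(w * g(xj)**q)
        c1[k] = np.sum(w * g(xj)**q * xj)
    return c0, c1
```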

135 Software, references Papers and tutorials. Software for univariate probability density estimation: Matlab toolbox and R toolbox (S. Buttrey). 84 / 126

136 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 85 / 126

137 Examples: forecasting and fitting 86 / 126

138 Copper price forecast Stochastic differential equation: estimate drift, volatility. Steps: identify the discount factor curve using epi-splines and futures market information, giving future spot prices; identify the drift using epi-splines, historical data, future spot prices, etc.; identify the volatility using epi-splines and observed/estimated errors. 87 / 126

139 Copper price forecast (cont.) [figure: real prices and the forecast E(hist+market) with E(hist+market) ± σ bands; price (USD) over months] 88 / 126

140 Surface reconstruction $f(x) = (\cos(\pi x_1) + \cos(\pi x_2))^3$ on $[-3,3]^2$; continuity; second-order epi-splines; $N = 400$; 900 uniform points. [figure: actual function and epi-spline approximation] 89 / 126

141 Surface reconstruction $f(x) = \sin(\pi \|x\|)/(\pi \|x\|)$ for $x \in [-5,5]^2 \setminus \{0\}$, $f(0) = 1$; continuously differentiable; second-order epi-splines; $N = 225$; 600 random points. [figure: actual surface and epi-spline estimate] 90 / 126

142 Examples: density estimation 91 / 126

143 Examples of probability density estimation Find a density h that maximizes log-likelihood of observations and satisfies soft information Examples: earthquake losses, queuing, robustness, deconvolution, UQ, mixture, bivariate densities 92 / 126

144 Earthquake losses 93 / 126

145 Earthquake losses, Vancouver Region* Comprehensive damage model (284 random variables); 100,000 simulations of 50-year loss. *Data from Mahsuli, 2012; see also Mahsuli & Haukaas. 94 / 126

146 Earthquake losses (cont.) [figure: probability density of 50-year loss (billion CAD)] 95 / 126

147 Use fewer simulations? Using 100 simulations only and a kernel estimator. [figure: kernel density estimate of the loss] 96 / 126

149 Using function identification approach Max likelihood of 100 simulations; second-order exponential epi-splines. Engineering knowledge: nonincreasing, smooth, nonnegative support. [figure: densities of the loss from a sample of 100: epi-spline vs. kernel] 97 / 126

151 Using function identification approach (cont.) Varying the number of simulations; same soft information. [figure: density estimates of the loss for varying sample sizes (10, 100, 1000, ...)] 98 / 126

152 Using function identification approach (cont.) 30 meta-replications of 100 simulations. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±1.3), average of the 10% highest (±4.4)] 99 / 126

153 Using function identification approach (cont.) Also pointwise Fisher information: $h'(y)/h(y) \in [-0.5, -0.1]$. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±1.2), average of the 10% highest (±4.0)] 100 / 126

154 Using function identification approach (cont.) Also pointwise Fisher information: $h'(x)/h(x) \in [-0.35, -0.25]$. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±0.5), average of the 10% highest (±1.7)] 101 / 126

155 Building robustness 102 / 126

156 Diversity of estimates using KL-divergence Returning to the exponential density. Continuously differentiable, nonincreasing, nonnegative support. [figure: original estimate, empirical density, true density, and alternative estimates at KL = 0.01 and KL = 0.1] 103 / 126

157 Queuing 104 / 126

158 M/M/1; 50% of customers delayed for a fixed time $X$ = customer time-in-service; 100 observations; exponential epi-spline. Soft info: lsc, $X \ge 0$, pointwise Fisher information, unimodal upper tail. [figure: true density, epi-spline estimate, kernel estimate, empirical density] 105 / 126

159 Deconvolution 106 / 126

160 True density Gamma(5,1) First-order epi-spline on $[0, 23]$, $N = 1000$; three sample points. Continuity, unimodality, convex tails, bounds on gradient jumps. [figure: true density of X, epi-spline estimate of X, sample data] 107 / 126

161 Also noisy observations 5000 observations of $Y = X + W$; $W$ independent normal noise with mean 0, stdev 3.2. Estimate $h_Y$ separately, then impose $h_Y(y) \approx \int h(x)\,h_W(y-x)\,dx$ at 101 $y$ points. [figure: true density of X, epi-spline estimate of X, sample data, error density] 108 / 126

165 Uncertainty quantification 109 / 126

166 Dynamical system Recall the true density of the amplitude. [figure: density of X] 110 / 126

167 Density of amplitude Sample size 100; continuously differentiable; unimodal tails; exponential epi-splines. [figure: true density, epi-spline estimate, kernel estimate, sample] 111 / 126

168 Gradient information Gradient information for bijective $g: \mathbb{R} \to \mathbb{R}$. Recall: If $X = g(V)$, then $h(x) = h_V(g^{-1}(x))/|g'(g^{-1}(x))|$. Present context without a bijection and data $x^i = g(v^i)$, $g'(v^i)$: $h(x^i) \ge h_V(v^i)/|g'(v^i)|$; the value of the pdf is bounded from below at $x^i$. 112 / 126

170 Gradient information (cont.) [figure: true density, epi-spline estimate, kernel estimate, sample] 113 / 126

171 Fluid dynamics Drag/lift estimates: high-fidelity RANSE solves (each 4 hours on 8 cores); low-fidelity potential flow solves (each 5 sec on 1 core). 114 / 126

172 Predicting high-fidelity performance 898 high- and low-fidelity solves. [figure: drag/lift using the high-fidelity solver vs. drag/lift using the low-fidelity solver] Learn from low-fidelity solves and avoid (many) high-fidelity solves. 115 / 126

173 Density using high or low solves Exponential epi-splines of second order; mesh $N = 50$. Soft info: log-concavity and bounds on second-order derivatives. [figure: densities of the response: 898 high-fidelity and 800 low-fidelity samples] 116 / 126

174 Estimating conditional error $X$ = high-fidelity; $Y$ = low-fidelity; $h_X(x) = \int h_{X|Y}(x|y)\,h_Y(y)\,dy$. A normal linear least-squares regression model on training data of size 50 gives $h_{X|Y}(x|y)$. [figure: densities of the response: high 898, conditional least squares, sample] 117 / 126
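A small synthetic sketch of this mixing step (stand-in solver outputs of my own making; a kernel estimate of $h_Y$ where the talk uses an epi-spline):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(3)
y_all = rng.normal(0.0, 1.0, 898)                   # low-fidelity outputs (synthetic)
y_tr = y_all[:50]                                   # 50 paired training runs
x_tr = 1.1 * y_tr + 0.2 + rng.normal(0.0, 0.3, 50)  # high-fidelity counterparts

slope, intercept = np.polyfit(y_tr, x_tr, 1)        # normal linear model X | Y = y
sigma = np.std(x_tr - (intercept + slope * y_tr), ddof=2)

h_Y = gaussian_kde(y_all)       # kernel stand-in for the epi-spline estimate
def h_X(x, n_quad=400):
    # h_X(x) = ∫ h_{X|Y}(x|y) h_Y(y) dy, by trapezoid quadrature in y
    ys = np.linspace(y_all.min() - 3.0, y_all.max() + 3.0, n_quad)
    integrand = norm.pdf(x, loc=intercept + slope * ys, scale=sigma) * h_Y(ys)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ys))

print(h_X(0.5))                 # mixed density estimate at x = 0.5
```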

177 Information fusion Max likelihood using 10 high-fidelity simulations, subject to $0.5\,h_X(x) \le \int h_{X|Y}(x|y)\,h_Y(y)\,dy \le 1.5\,h_X(x)$. [figure: densities of the response: high 898, conditional least squares, high 10 + conditional, low 10] 118 / 126

179 Only 10 high-fidelity [figure: densities of the response: high 898, conditional least squares, high 10 + conditional, high 10, low 10] 119 / 126

180 Uniform mixture density 120 / 126

181 Uniform mixture density Sample size 100; lsc, slope constraints; exponential epi-splines. [figure: true density, epi-spline estimate, kernel estimate, empirical density] 121 / 126

182 Bivariate normal density 122 / 126

183 Bivariate normal probability density 123 / 126

184 Bivariate normal probability density Curvature, log-concave, 25 sample points, exp. epi-spline 124 / 126

185 Conclusions Function identification problems: a rich class. lsc functions provide modeling flexibility. Epi-convergence allows evolution of information/approximations. Epi-splines are central approximation tools. 125 / 126

189 References Singham, Royset & Wets, Density estimation of simulation output using exponential epi-splines. Royset, Sukumar & Wets, Uncertainty quantification using exponential epi-splines. Royset & Wets, From data to assessments and decisions: epi-spline technology. Royset & Wets, Fusion of hard and soft information in nonparametric density estimation. Royset & Wets, Multivariate epi-splines and evolving function identification problems. Feng, Rios, Ryan, Spurkel, Watson, Wets & Woodruff, Toward scalable stochastic unit commitment, Part 1. Rios, Wets & Woodruff, Multi-period forecasting with limited information and scenario generation with limited data. Royset & Wets, On function identification problems. 126 / 126


More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control 19/4/2012 Lecture content Problem formulation and sample examples (ch 13.1) Theoretical background Graphical

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Multi-period forecasting and scenario generation with limited data

Multi-period forecasting and scenario generation with limited data Multi-period forecasting and scenario generation with limited data Ignacio Rios University of Chile Santiago, Chile Roger J-B Wets and David L. Woodruff University of California, Davis Davis, CA 95616,

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information

Computational Intelligence

Computational Intelligence Computational Intelligence Winter Term 2010/11 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS 11 Fakultät für Informatik TU Dortmund Three tasks: 1. Choice of an appropriate problem representation.

More information

Course Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979)

Course Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979) Course Outline FRTN Multivariable Control, Lecture Automatic Control LTH, 6 L-L Specifications, models and loop-shaping by hand L6-L8 Limitations on achievable performance L9-L Controller optimization:

More information

Optimal Transport in Risk Analysis

Optimal Transport in Risk Analysis Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and

More information

How much should we rely on Besov spaces as a framework for the mathematical study of images?

How much should we rely on Besov spaces as a framework for the mathematical study of images? How much should we rely on Besov spaces as a framework for the mathematical study of images? C. Sinan Güntürk Princeton University, Program in Applied and Computational Mathematics Abstract Relations between

More information

Fitting Linear Statistical Models to Data by Least Squares: Introduction

Fitting Linear Statistical Models to Data by Least Squares: Introduction Fitting Linear Statistical Models to Data by Least Squares: Introduction Radu Balan, Brian R. Hunt and C. David Levermore University of Maryland, College Park University of Maryland, College Park, MD Math

More information

Lesson 2: Analysis of time series

Lesson 2: Analysis of time series Lesson 2: Analysis of time series Time series Main aims of time series analysis choosing right model statistical testing forecast driving and optimalisation Problems in analysis of time series time problems

More information

Nonlinear Stationary Subdivision

Nonlinear Stationary Subdivision Nonlinear Stationary Subdivision Michael S. Floater SINTEF P. O. Box 4 Blindern, 034 Oslo, Norway E-mail: michael.floater@math.sintef.no Charles A. Micchelli IBM Corporation T.J. Watson Research Center

More information

Stability-based generation of scenario trees for multistage stochastic programs

Stability-based generation of scenario trees for multistage stochastic programs Stability-based generation of scenario trees for multistage stochastic programs H. Heitsch and W. Römisch Humboldt-University Berlin Institute of Mathematics 10099 Berlin, Germany http://www.math.hu-berlin.de/~romisch

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 28 (Algebra + Optimization) 02 May, 2013 Suvrit Sra Admin Poster presentation on 10th May mandatory HW, Midterm, Quiz to be reweighted Project final report

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b Notation General Notation Description See a b & a b The minimum and the maximum of a and b a + & a f S u The non-negative part, a 0, and non-positive part, (a 0) of a R The restriction of the function

More information

DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA 1002 Lecture notes 12 Fall Linear regression DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Analysis Qualifying Exam

Analysis Qualifying Exam Analysis Qualifying Exam Spring 2017 Problem 1: Let f be differentiable on R. Suppose that there exists M > 0 such that f(k) M for each integer k, and f (x) M for all x R. Show that f is bounded, i.e.,

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

Random Sets. Roger J-B Wets Mathematics, University of California, Davis. May 2012, ETH-Zürich Lecture #2

Random Sets. Roger J-B Wets Mathematics, University of California, Davis. May 2012, ETH-Zürich Lecture #2 Random Sets Roger J-B Wets Mathematics, University of California, Davis May 2012, ETH-Zürich Lecture #2 G : E d, G 1 (0) soln s of G(x) = 0, approximations? EG(x) = {G(ξ,x)} = 0 approximated by G ν (x)

More information

FRTN10 Multivariable Control, Lecture 13. Course outline. The Q-parametrization (Youla) Example: Spring-mass System

FRTN10 Multivariable Control, Lecture 13. Course outline. The Q-parametrization (Youla) Example: Spring-mass System FRTN Multivariable Control, Lecture 3 Anders Robertsson Automatic Control LTH, Lund University Course outline The Q-parametrization (Youla) L-L5 Purpose, models and loop-shaping by hand L6-L8 Limitations

More information

Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function

Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function By Y. Baraud, S. Huet, B. Laurent Ecole Normale Supérieure, DMA INRA,

More information