How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines


1 How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines Johannes O. Royset Operations Research, NPS in collaboration with L. Bonfiglio, S. Buttrey, S. Brizzolara, J. Carbaugh, I. Rios, D. Singham, K. Spurkel, G. Vernengo, J-P. Watson, R. Wets, D. Woodruff Smart Energy and Stochastic Optimization, ENPC ParisTech June 2015 This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grants W911NF and W911NF, as well as by DARPA under HR. 1 / 126

2 Goal: estimate, predict, forecast 2 / 126

3 Copper prices [figure: copper price (USD) over months] 3 / 126

4 Copper prices [figure: real prices and the forecast E(hist+market) with E(hist+market) ± σ bands; price (USD) over months] 4 / 126

5 Hourly electricity loads 5 / 126

6 Hard information: data Observations Usually scarce, excessive, corrupted, uncertain 6 / 126

7 Soft information Indirect observations, external information. Knowledge about structures: established laws, physical restrictions, shapes, smoothness of curves. 7 / 126

8 Evolving information data acquisition, information growth 8 / 126

9 Function Identification Problem Identify a function that best represents all available information 9 / 126

10 Applications of function identification approach energy, natural resources, financial markets, image reconstruction, uncertainty quantification, variograms, nonparametric regression, deconvolution, density estimation 10 / 126

11 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 11 / 126

12 Illustrations and formulations 12 / 126

13 Identifying financial curves Estimate the discount factor curve given instruments $i = 1, 2, \dots, I$ and payments $p^i_0, p^i_1, \dots, p^i_{N_i}$ at times $0 = t^i_0, t^i_1, \dots, t^i_{N_i}$. Identify a nonnegative, nonincreasing function $f$ with $f(0) = 1$ and $\sum_{j=0}^{N_i} f(t^i_j)\,p^i_j \approx 0$ for all $i$. Constrained infinite-dimensional optimization problem: minimize $\sum_{i=1}^{I}\big(\sum_{j=0}^{N_i} f(t^i_j)\,p^i_j\big)^2$ such that $f(0) = 1$, $f \ge 0$, $f' \le 0$. 13 / 126
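To make the formulation concrete, here is a minimal sketch in Python (not the talk's implementation; the instruments, the payment matrix, and the squared-mispricing objective are illustrative assumptions) that identifies a discount curve on a grid of payment dates under the shape constraints above:

```python
import numpy as np
from scipy.optimize import minimize

times = np.array([0.0, 0.5, 1.0, 1.5, 2.0])             # grid = payment dates
# payments[i, j] = cash flow of instrument i at times[j] (hypothetical data)
payments = np.array([[-0.99, 0.00, 1.00, 0.00, 0.00],
                     [-0.97, 0.00, 0.00, 0.00, 1.00],
                     [-1.00, 0.02, 0.02, 0.02, 1.02]])

def pricing_error(f):
    # sum_i ( sum_j f(t^i_j) p^i_j )^2 : squared present-value mispricing
    return np.sum((payments @ f) ** 2)

n = len(times)
cons = [{"type": "eq", "fun": lambda f: f[0] - 1.0}]              # f(0) = 1
cons += [{"type": "ineq", "fun": lambda f, k=k: f[k] - f[k + 1]}  # nonincreasing
         for k in range(n - 1)]

res = minimize(pricing_error, x0=np.ones(n), bounds=[(0.0, None)] * n,
               constraints=cons, method="SLSQP")                  # f >= 0 via bounds
print(res.x)   # estimated discount factors at the grid dates
```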

16 Day-ahead electricity load forecast Data: predicted temperature, dew point; observed load 14 / 126

17 Day-ahead load forecast problem Using relevant data and the weather forecast for tomorrow, determine tomorrow's load curve and its uncertainty. $j = 1, 2, \dots, J$: days in data set; $h = 1, 2, \dots, 24$: hours of the day; $t^j_h$: predicted temperature in hour $h$ of day $j$; $d^j_h$: predicted dew point in hour $h$ of day $j$; $\ell^j_h$: actual observed load in hour $h$ of day $j$. 15 / 126

23 Day-ahead load forecast problem (cont.) Functional regression model: $\ell^j_h = f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h + e^j_h$, where the regression coefficients $f^{tmp}$ and $f^{dpt}$ are functions of time and $e^j_h$ is the error between the observed and predicted loads. Regression problem: find functions $f^{tmp}$ and $f^{dpt}$ on $[0,24]$ that minimize $\sum_{j=1}^{J}\sum_{h=1}^{24}\big(\ell^j_h - [f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h]\big)^2$ subject to smoothness and curvature conditions. Constrained infinite-dimensional optimization problem. 16 / 126
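A rough illustration of this regression problem: the sketch below (synthetic data shapes of my choosing; a finite-difference penalty standing in for the smoothness and curvature conditions on $[0,24]$) fits hourly values of $f^{tmp}$ and $f^{dpt}$ by penalized least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
J, H = 200, 24                       # days, hours
t = rng.normal(70, 10, (J, H))       # predicted temperature (hypothetical)
d = rng.normal(55, 10, (J, H))       # predicted dew point (hypothetical)
load = 0.8 * t + 0.3 * d + rng.normal(0, 2, (J, H))

lam = 10.0                           # smoothness weight
# Unknowns: x = (f_tmp(1..24), f_dpt(1..24)). Stack data rows
# l^j_h = f_tmp(h) t^j_h + f_dpt(h) d^j_h with penalty rows
# sqrt(lam) * (f(h+1) - f(h)) for each coefficient function.
rows, rhs = [], []
for h in range(H):
    A = np.zeros((J, 2 * H))
    A[:, h] = t[:, h]
    A[:, H + h] = d[:, h]
    rows.append(A)
    rhs.append(load[:, h])
D = np.zeros((2 * (H - 1), 2 * H))
for h in range(H - 1):
    D[h, h], D[h, h + 1] = -1.0, 1.0
    D[H - 1 + h, H + h], D[H - 1 + h, H + h + 1] = -1.0, 1.0
rows.append(np.sqrt(lam) * D)
rhs.append(np.zeros(2 * (H - 1)))

x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
f_tmp, f_dpt = x[:H], x[H:]          # fitted hourly coefficient curves
```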

26 Day-ahead load forecast problem (cont.) Given temperature $t$ and dew point $d$ forecasts (functions of time) for tomorrow, the load forecast becomes $f^{tmp}(h)\,t(h) + f^{dpt}(h)\,d(h)$, $h \in [0,24]$. But what about uncertainty? 17 / 126

28 Estimation of errors For hour $h$: minimized errors $e^j_h = \ell^j_h - [f^{tmp}(h)\,t^j_h + f^{dpt}(h)\,d^j_h]$, $j = 1, \dots, J$, are samples from the actual error density. But that alone wouldn't capture the uncertainty one would expect across hours $h$. [figure: error densities by hour] Probability density estimation problem with a very small data set. 18 / 126

30 Estimation of errors (cont.) Probability density estimation problem: Given an iid sample $x^1, x^2, \dots, x^\nu$, find a function $h \in H$ that maximizes $\prod_{i=1}^{\nu} h(x^i)$. $H$ = constraints on $h$: $h \ge 0$, $\int h(x)\,dx = 1$, and more. Constrained infinite-dimensional optimization problem. 19 / 126

33 Estimation of errors (cont.) 20 / 126

34 Generating load scenarios Also need to consider conditioning: if the load is above the regression function early in the day, then it is likely above later; the error data for density estimation is reduced further. 21 / 126

36 Forecast precision (Connecticut) Mean % error by season: Fall 3.99; Spring 2.73; Summer 4.19; Winter. 22 / 126

37 Uncertainty quantification (UQ) Engineering, biological, physical systems. Input: random vector $V$ (known distribution). System function $g$, implicitly defined, e.g., by simulation. Output: random variable $X = g(V)$. 23 / 126

38 Illustration of UQ challenges Output amplitude of a dynamical system: [figure: density of X] How to estimate densities like this with a small sample? 24 / 126

39 Probability density estimation Sample $X^1, \dots, X^\nu$: maximize $\prod_{i=1}^{\nu} h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$. Sample size might be growing; constraint set $H^\nu$ might be evolving. 25 / 126

40 Evolving probability density estimation Maximize $\prod_{i=1}^{\nu} (h(X^i))^{1/\nu}$ s.t. $h \in H^\nu \cap \mathcal{H}$; equivalently, maximize $\frac{1}{\nu}\sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$. Actual problem: maximize $E[\log h(X)]$ s.t. $h \in H \cap \mathcal{H}$. 26 / 126

43 Evolving constraints Constraints: nonnegative support, smooth, curvature bound. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 27 / 126

44 Evolving constraints (cont.) Also log-concave. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 28 / 126

45 Evolving constraints (cont.) Also nonincreasing. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 29 / 126

46 Evolving constraints (cont.) Also slope bounds. [figure: true density, epi-spline estimate, kernel estimate, empirical density; MSE reported for the epi-spline and kernel estimates] 30 / 126

47 Take-aways so far Challenges in forecasting and estimation are function identification problems. Day-ahead forecasting of electricity demand involves functional regression to get the trend and probability density estimation to get the errors. Soft information supplements hard information (data). 31 / 126

48 Questions? 32 / 126

49 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 33 / 126

50 Framework 34 / 126

51 Actual problem Constrained infinite-dimensional optimization problem: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, with $\mathcal{F}$ some function space; criterion $\psi$ (norms, error measures, etc.); feasible set $F$ (shape restrictions, external info); subspace of interest $\mathcal{F}$ (continuity, smoothness). 35 / 126

55 Special case: linear regression [figure: food expenditure vs. household income with least-squares, 0.75-quantile, and superquantile regression fits] Find affine $f$ that minimizes $\sum_j (y^j - f(x^j))^2$ or other error measures. 36 / 126

56 Special case: interpolation [figure: data, smoothing spline, interpolating spline] min $\|f''\|^2$ such that $f(x^j) = y^j$ for all $j$ and $f$ in a Sobolev space. 37 / 126

57 Special case: smoothing [figure: data, smoothing spline, interpolating spline] min $\sum_j (y^j - f(x^j))^2 + \lambda \|f''\|^2$ such that $f$ in a Sobolev space. 38 / 126
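For a quick hands-on comparison of these two special cases, scipy's FITPACK-based smoothing spline solves a closely related penalized problem (synthetic data below; the smoothing factor s plays the role of the fit/roughness trade-off):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)   # noisy observations

interp = UnivariateSpline(x, y, s=0)     # s = 0: interpolating spline
smooth = UnivariateSpline(x, y, s=2.0)   # s > 0: smoothing, allows residual error
print(interp(5.0), smooth(5.0))
```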

58 Evolving problems Evolving infinite-dimensional optimization problems: min $\psi^\nu(f)$ such that $f \in F^\nu \cap \mathcal{F}$, with $\mathcal{F}$ some function space; evolving/approximating criterion $\psi^\nu$; evolving/approximating feasible set $F^\nu$. 39 / 126

60 Function space Extended real-valued lower semicontinuous functions on $\mathbb{R}^n$. [figure: a lower semicontinuous function f] $(\text{lsc-fcns}(\mathbb{R}^n), dl)$: complete separable metric space, with $dl$ = epi-distance. 41 / 126

63 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 42 / 126

64 Jumps Fits by polynomial and lsc function 43 / 126

65 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 44 / 126

66 Nonlinear transformations Recall probability density estimation with sample $X^1, \dots, X^\nu$: maximize $\frac{1}{\nu}\sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \cap \mathcal{H}$, including $h \ge 0$. Exponential transformation: $h(x) = \exp(-s(x))$, giving: minimize $\sum_{i=1}^{\nu} s(X^i)$ s.t. $s \in S^\nu \cap \mathcal{S}$, excluding nonnegativity. $h$ log-concave $\Leftrightarrow$ $s$ convex. If $s = \langle c(\cdot), r \rangle$, then certain expressions are linear in $r$. But $s$ needs to take on $+\infty$ for $h = \exp(-s(\cdot))$ to vanish. 45 / 126
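The following sketch shows the transformed problem in code; it is my own minimal version under stated assumptions (piecewise-linear s with continuity only, a trapezoid rule for the normalization, synthetic exponential data). Convexity of s, which would make h log-concave, could be added as linear constraints on the slopes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.exponential(1.0, 200)                       # synthetic sample
mesh = np.linspace(0.0, float(data.max()) + 1.0, 21)   # knots m_0 < ... < m_N
grid = np.linspace(mesh[0], mesh[-1], 2001)            # quadrature points

def s_eval(knot_vals, pts):
    # piecewise-linear s between knot values (continuity built in)
    return np.interp(pts, mesh, knot_vals)

def neg_loglik(knot_vals):
    return np.sum(s_eval(knot_vals, data))             # -log-lik = sum_i s(x^i)

def normalization(knot_vals):
    h = np.exp(-s_eval(knot_vals, grid))               # h = exp(-s) >= 0 for free
    return np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(grid)) - 1.0   # ∫ h dx = 1

res = minimize(neg_loglik, x0=np.ones(len(mesh)),
               constraints=[{"type": "eq", "fun": normalization}],
               method="SLSQP")
h_hat = lambda pts: np.exp(-s_eval(res.x, pts))        # density estimate
```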

72 Identifying functions with unbounded domains Complications for standard spaces. Is $f^\nu$ near $f \equiv 0$? [figure: $f^\nu$ deviating from zero far out] Not in $L^p$-norm, but the epi-distance is $1/\nu$. 46 / 126

75 Modeling possibilities with lsc functions: functions with jumps and high growth; response surface building with implicit constraints; system identification with subsequent minimization; nonlinear transformations requiring $\pm\infty$; functions with unbounded domain. 47 / 126

76 Epi-graph [figure: a function f] 48 / 126

77 Epi-graph [figure: function f with its epi-graph epi f shaded] 49 / 126

78 Geometric view of lsc functions $f \in \text{lsc-fcns}(\mathbb{R}^n)$ $\Leftrightarrow$ $\text{epi}\, f$ nonempty closed subset of $\mathbb{R}^{n+1}$. Notation: $\rho\mathbb{B} = \mathbb{B}(0,\rho)$ = origin-centered ball with radius $\rho$; $d(y,S) = \inf_{y' \in S} \|y - y'\|$ = distance between $y$ and $S$. 50 / 126

80 Distances between epi-graphs [figure: epi f and epi g] 51 / 126

81 Distances between epi-graphs (cont.) $\rho$-epi-distance: $dl_\rho(f,g) = \sup_{\bar x \in \rho\mathbb{B}} |d(\bar x, \text{epi}\, f) - d(\bar x, \text{epi}\, g)|$, $\rho \ge 0$. [figure: functions f and g] 52 / 126

83 Distances between epi-graphs (cont.) [figure: point $\bar x$ with its distances to epi f and epi g; here $dl_\rho(f,g) = \delta$] 53 / 126

84 Distances between epi-graphs (cont.) [figure: distances from test points to epi-graphs] 54 / 126

85 Metric on the lsc functions The $\rho$-epi-distance $dl_\rho$ is a pseudo-metric on $\text{lsc-fcns}(\mathbb{R}^n)$. Epi-distance: $dl(f,g) = \int_0^\infty dl_\rho(f,g)\, e^{-\rho}\, d\rho$; $dl$ is a metric on $\text{lsc-fcns}(\mathbb{R}^n)$. The epi-distance induces the epi-topology on $\text{lsc-fcns}(\mathbb{R}^n)$; $(\text{lsc-fcns}(\mathbb{R}^n), dl)$ is a complete separable metric space (Polish). 55 / 126
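A crude numerical illustration of $dl_\rho$ for intuition only (my own discretization, not from the talk): approximate the epi-graphs by point clouds and take the worst absolute difference of distances over random test points in $\rho\mathbb{B}$.

```python
import numpy as np

def epigraph_cloud(f, xs, top):
    # sample points of epi f = {(x, a): a >= f(x)} up to height `top`
    return np.array([(x, a) for x in xs
                     for a in np.linspace(f(x), top, 40) if f(x) <= top])

def dist_to_set(q, pts):
    return np.min(np.linalg.norm(pts - q, axis=1))

def dl_rho(f, g, rho, n_test=300, seed=0):
    xs = np.linspace(-rho - 1.0, rho + 1.0, 200)
    ef = epigraph_cloud(f, xs, rho + 2.0)
    eg = epigraph_cloud(g, xs, rho + 2.0)
    rng, worst = np.random.default_rng(seed), 0.0
    while n_test > 0:
        q = rng.uniform(-rho, rho, 2)
        if np.linalg.norm(q) <= rho:          # keep only test points in rho*B
            worst = max(worst, abs(dist_to_set(q, ef) - dist_to_set(q, eg)))
            n_test -= 1
    return worst

# vertical shift by 0.5: the approximation stays at or below 0.5, as it should
print(dl_rho(np.abs, lambda x: np.abs(x) + 0.5, rho=2.0))
```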

86 Epi-convergence $f^\nu \in \text{lsc-fcns}(\mathbb{R}^n)$ epi-converges to $f$ if $dl(f^\nu, f) \to 0$. 56 / 126

87 Actual problem Constrained infinite-dimensional optimization problem: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, with $\mathcal{F} \subset \text{lsc-fcns}(\mathbb{R}^n)$; criterion $\psi$ (norms, error measures, etc.); feasible set $F$ (shape restrictions, external info); subspace of interest $\mathcal{F}$ (continuity, smoothness). 57 / 126

88 Take-aways in this section Actual problem: find the best extended real-valued lower semicontinuous function; captures regression, curve fitting, interpolation, density estimation, etc.; allows functions on $\mathbb{R}^n$, jumps, and transformations requiring $\pm\infty$. Evolving problem due to changes in information and approximations. Theory about lsc functions: metric space under the epi-distance; epi-distance = distance between epi-graphs. 58 / 126

89 Questions? 59 / 126

90 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 60 / 126

91 Epi-splines and approximations 61 / 126

92 Epi-splines = piecewise polynomial + lower semicontinuity Piecewise polynomial not automatic; must be constructed. [figure: epi-spline s(x) on mesh points $m_1, m_2, \dots, m_{N-1}$] Optimization over a new class of piecewise polynomial functions. 62 / 126

95 n-dim epi-splines 63 / 126

96 n-dim epi-splines (cont.) An epi-spline $s$ of order $p$ defined on $\mathbb{R}^n$ with partition $R = \{R_k\}_{k=1}^N$ is a real-valued function that on each $R_k$, $k = 1, \dots, N$, is a polynomial of total degree $p$, and for every $x \in \mathbb{R}^n$ has $s(x) = \liminf_{x' \to x} s(x')$. $\text{e-spl}^p_n(R)$ = epi-splines of order $p$ on $\mathbb{R}^n$ with partition $R$. 64 / 126

98 Evolving and approximating problems min $\psi^\nu(s)$ such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$; subspace of interest $\mathcal{S}^\nu \subset \text{e-spl}^{p^\nu}_n(R^\nu) \cap \mathcal{F}$; evolving/approximate feasible set $F^\nu$; evolving/approximate criterion $\psi^\nu$. 65 / 126

102 Function identification framework Evolving problems (information growth; criterion and constraint approximations; function approximations; solutions = epi-splines) versus the actual problem (complete information; infinite dimensional; conceptual solutions); consistency and rates connect the two. 66 / 126

103 Convergence of evolving problems Epi-convergence of functionals on a metric space. Definition: $\{\psi^\nu: S^\nu \to \mathbb{R}\}_{\nu=1}^\infty$ epi-converge to $\psi: F \to \mathbb{R}$ if and only if (i) for every $s^\nu \to f \in \text{lsc-fcns}(\mathbb{R}^n)$, with $s^\nu \in S^\nu$, we have $\liminf \psi^\nu(s^\nu) \ge \psi(f)$ if $f \in F$ and $\psi^\nu(s^\nu) \to \infty$ otherwise; (ii) for every $f \in F$, there exists $\{s^\nu\}_{\nu=1}^\infty$, with $s^\nu \in S^\nu$, such that $s^\nu \to f$ and $\limsup \psi^\nu(s^\nu) \le \psi(f)$. Equivalent to the previous definition. We also say that the evolving optimization problems epi-converge to the actual problem. 67 / 126

107 Consequence of epi-convergence If the evolving problems epi-converge to the actual problem, then $\limsup_\nu \big(\inf_{s \in S^\nu} \psi^\nu(s)\big) \le \inf_{f \in F} \psi(f)$. Moreover, if $s^k$ are optimal for the evolving problems $\min_{s \in S^{\nu_k}} \psi^{\nu_k}(s)$ and $s^k \to f_0$, then $f_0$ is optimal for the actual problem and $\lim_k \inf_{s \in S^{\nu_k}} \psi^{\nu_k}(s) = \inf_{f \in F} \psi(f) = \psi(f_0)$. 68 / 126

109 When will evolving problems epi-converge to the actual problem? Actual: min $\psi(f)$ such that $f \in F \cap \mathcal{F}$, $\mathcal{F} \subset \text{lsc-fcns}(\mathbb{R}^n)$; evolving: min $\psi^\nu(s)$ such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$. Theorem: sufficient conditions for epi-convergence are that $\psi^\nu$ converges continuously to $\psi$ relative to $\mathcal{F}$, epi-splines are dense in the lsc functions, $F^\nu$ set-converges to $F$, and $F$ is solid. 69 / 126

110 Epi-splines dense in lsc functions $\{R^\nu\}_{\nu=1}^\infty$, with $R^\nu = \{R^\nu_1, \dots, R^\nu_{N^\nu}\}$, is an infinite refinement if for every $x \in \mathbb{R}^n$ and $\epsilon > 0$, there exists a positive integer $\bar\nu$ s.t. $R^\nu_k \subset \mathbb{B}(x,\epsilon)$ for every $\nu \ge \bar\nu$ and $k$ satisfying $x \in \text{cl}\, R^\nu_k$. Theorem: For any $p = 0, 1, 2, \dots$ and infinite refinement $\{R^\nu\}_{\nu=1}^\infty$, $\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu)$ is dense in $\text{lsc-fcns}(\mathbb{R}^n)$. 70 / 126

112 Dense in continuous functions under simplex partitioning Definition: A simplex $S$ in $\mathbb{R}^n$ is the convex hull of $n+1$ points $x^0, x^1, \dots, x^n \in \mathbb{R}^n$, with $x^1 - x^0, x^2 - x^0, \dots, x^n - x^0$ linearly independent. Definition: A partition $R_1, R_2, \dots, R_N$ of $\mathbb{R}^n$ is a simplex partition of $\mathbb{R}^n$ if $\text{cl}\, R_1, \dots, \text{cl}\, R_N$ are mostly simplexes. 71 / 126

114 Dense in cnts fcns under simplex partitioning (cont.) Theorem: For any $p = 1, 2, \dots$ and $\{R^\nu\}_{\nu=1}^\infty$ an infinite refinement of $\mathbb{R}^n$ consisting of simplex partitions of $\mathbb{R}^n$, $\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu) \cap C^0(\mathbb{R}^n)$ is dense in $C^0(\mathbb{R}^n)$. 72 / 126

115 Rates in univariate probability density estimation For a convex and finite-dimensional estimation problem: Theorem: For any correct soft information, $\nu^{1/2}\, d_{KL}(h^0 \| h^\nu_\epsilon) = O_p(1)$ for some near-optimal solution $h^\nu_\epsilon$. 73 / 126

116 Decomposition Theorem: For every $s \in \text{e-spl}^p_n(R)$, with $n > p \ge 1$ and $R = \{R_k\}_{k=1}^N$, there exist $q^{k,i} \in \text{poly}^p(\mathbb{R}^{n-1})$, $i = 1, 2, \dots, n$ and $k = 1, 2, \dots, N$, such that $s(x) = \sum_{i=1}^n q^{k,i}(x_{-i})$ for all $x \in R_k$, where $x_{-i}$ drops the $i$th coordinate; i.e., an n-dim epi-spline is the sum of $n$ (n-1)-dim epi-splines. 74 / 126

117 Take-aways in this section Epi-splines are piecewise polynomials defined on an arbitrary partition of $\mathbb{R}^n$. Only lower semicontinuity is required (not smoothness). Dense in the space of extended real-valued lower semicontinuous functions under the epi-distance (i.e., epi-splines can approximate every lsc function to arbitrary accuracy). Solutions of the evolving problem (in terms of epi-splines) are approximate solutions of the actual problem. 75 / 126

118 Questions? 76 / 126

119 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 77 / 126

120 Implementations 78 / 126

121 Implementation considerations Selection of epi-spline order and composition $h = \exp(-s)$; selection of partition: go fine! Implementation of criterion functional and soft info. We provide details for one-dimensional epi-splines of order 1, focusing on criteria and constraints for probability densities. 79 / 126

123 Implementation of criteria functionals Probability densities $h \in \text{e-spl}^1(m)$, with mesh $m = \{m_0, m_1, \dots, m_N\}$, $-\infty < m_0 < m_1 < \dots < m_N < \infty$. First-order epi-splines (piecewise linear): $h(x) = a^k_0 + a^k x$ for $x \in (m_{k-1}, m_k)$. 80 / 126

125 Implementation of criteria functionals (cont.) Log-likelihood for observations $x^1, \dots, x^\nu$: $\sum_{i=1}^\nu \log h(x^i) = \sum_{i=1}^\nu \log(a^{k_i}_0 + a^{k_i} x^i)$, where $k_i$ is such that $x^i \in (m_{k_i-1}, m_{k_i})$. Entropy: $-\int h(x)\log h(x)\,dx = -\sum_{k=1}^N \int_{m_{k-1}}^{m_k} (a^k_0 + a^k x)\log(a^k_0 + a^k x)\,dx$. Both are concave in the epi-parameters $a^k_0$ and $a^k$, $k = 1, \dots, N$. 81 / 126

128 Soft information Nonnegativity: $h \ge 0$ if $a^k_0 + a^k m_{k-1} \ge 0$ and $a^k_0 + a^k m_k \ge 0$, $k = 1, \dots, N$. Integrate to one: $\int h(x)\,dx = \sum_{k=1}^N a^k_0 (m_k - m_{k-1}) + \sum_{k=1}^N \frac{a^k}{2}(m_k^2 - m_{k-1}^2) = 1$. Continuity: $a^k_0 + a^k m_k = a^{k+1}_0 + a^{k+1} m_k$ for $k = 1, \dots, N-1$. Also log-concavity, convexity, monotonicity. 82 / 126
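Putting these pieces together, here is a minimal sketch (a hypothetical sample, and scipy in place of the Matlab/R toolboxes mentioned later) of maximum likelihood over first-order epi-splines with the nonnegativity, unit-integral, and continuity constraints above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.beta(2, 5, 150)                 # sample on [0, 1] (hypothetical)
m = np.linspace(0.0, 1.0, 11)           # mesh m_0 < ... < m_N
N = len(m) - 1                          # params: p = (a0_1..a0_N, a_1..a_N)

def h_val(p, pts):
    k = np.clip(np.searchsorted(m, pts) - 1, 0, N - 1)   # piece index per point
    return p[:N][k] + p[N:][k] * pts                     # a0_k + a_k * x

def neg_loglik(p):
    return -np.sum(np.log(np.maximum(h_val(p, x), 1e-12)))

cons = [{"type": "eq", "fun": lambda p: (                # ∫ h dx = 1
    np.sum(p[:N] * np.diff(m)) + np.sum(p[N:] * np.diff(m**2) / 2) - 1.0)}]
for k in range(N):                                       # h >= 0 at both knots
    cons.append({"type": "ineq", "fun": lambda p, k=k: p[k] + p[N+k] * m[k]})
    cons.append({"type": "ineq", "fun": lambda p, k=k: p[k] + p[N+k] * m[k+1]})
for k in range(N - 1):                                   # continuity at m_{k+1}
    cons.append({"type": "eq", "fun": lambda p, k=k:
                 (p[k] + p[N+k] * m[k+1]) - (p[k+1] + p[N+k+1] * m[k+1])})

p0 = np.concatenate([np.ones(N), np.zeros(N)])           # start from uniform
res = minimize(neg_loglik, p0, constraints=cons, method="SLSQP")
```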

132 Soft information (cont.) Deconvolution $Y = X + W$: Given densities $h_W$ and $h_Y$, $h_Y(y) = \int h(x)\,h_W(y-x)\,dx = \sum_{k=1}^N a^k_0 \int_{m_{k-1}}^{m_k} h_W(y-x)\,dx + \sum_{k=1}^N a^k \int_{m_{k-1}}^{m_k} x\,h_W(y-x)\,dx$ for all $y$. Inverse problem $Y = g(X)$: Given $v_q = \int y^q h_Y(y)\,dy$, $q = 1, 2, \dots$, $v_q = \int [g(x)]^q h(x)\,dx \approx \sum_{k=1}^N \sum_{j=1}^{M_k} w_{jk}\,[g(x_{jk})]^q (a^k_0 + a^k x_{jk})$. Convex optimization problems. 83 / 126
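The quadrature step can be checked numerically. The helper below (hypothetical map g and mesh; midpoint rule on each mesh cell) returns coefficients $(c_0, c_1)$ so that $v_q \approx \sum_k c_0[k]\,a^k_0 + c_1[k]\,a^k$, which makes the moment constraints linear in the epi-parameters:

```python
import numpy as np

m = np.linspace(0.0, 1.0, 11)           # mesh (assumed)
g = lambda x: x**2                      # hypothetical system map

def moment_coeffs(q, n_pts=20):
    # midpoint quadrature: v_q ≈ sum_k c0[k] * a0_k + c1[k] * a_k
    c0 = np.zeros(len(m) - 1)
    c1 = np.zeros(len(m) - 1)
    for k in range(len(m) - 1):
        w = (m[k+1] - m[k]) / n_pts                       # cell width
        xj = np.linspace(m[k], m[k+1], n_pts,
                         endpoint=False) + w / 2.0        # cell midpoints
        c0[k] = np.sum(w * g(xj)**q)
        c1[k] = np.sum(w * g(xj)**q * xj)
    return c0, c1
```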

135 Software, references Papers and tutorials. Software for univariate probability density estimation: Matlab toolbox and R toolbox (S. Buttrey). 84 / 126

136 Outline Illustrations and formulations (slides 12-33) Framework (slides 34-60) Epi-splines and approximations (slides 61-77) Implementations (slides 78-85) Examples (slides ) 85 / 126

137 Examples: forecasting and fitting 86 / 126

138 Copper price forecast Stochastic differential equation: estimate drift, volatility. Steps: identify the discount factor curve using epi-splines and futures market information, giving future spot prices; identify the drift using epi-splines, historical data, future spot prices, etc.; identify the volatility using epi-splines and observed/estimated errors. 87 / 126

139 Copper price forecast (cont.) [figure: real prices and the forecast E(hist+market) with E(hist+market) ± σ bands; price (USD) over months] 88 / 126

140 Surface reconstruction $f(x) = (\cos(\pi x_1) + \cos(\pi x_2))^3$ on $[-3,3]^2$; continuity; second-order epi-splines; $N = 400$; 900 uniform points. [figure: actual function and epi-spline approximation] 89 / 126

141 Surface reconstruction $f(x) = \sin(\pi \|x\|)/(\pi \|x\|)$ for $x \in [-5,5]^2 \setminus \{0\}$, $f(0) = 1$; continuously differentiable; second-order epi-splines; $N = 225$; 600 random points. [figure: actual surface and epi-spline estimate] 90 / 126

142 Examples: density estimation 91 / 126

143 Examples of probability density estimation Find a density h that maximizes log-likelihood of observations and satisfies soft information Examples: earthquake losses, queuing, robustness, deconvolution, UQ, mixture, bivariate densities 92 / 126

144 Earthquake losses 93 / 126

145 Earthquake losses, Vancouver Region* Comprehensive damage model (284 random variables); 100,000 simulations of 50-year loss. *Data from Mahsuli, 2012; see also Mahsuli & Haukaas. 94 / 126

146 Earthquake losses (cont.) [figure: probability density of 50-year loss (billion CAD)] 95 / 126

147 Use fewer simulations? Using 100 simulations only and a kernel estimator. [figure: kernel density estimate of the loss] 96 / 126

149 Using function identification approach Max likelihood of 100 simulations; second-order exponential epi-splines. Engineering knowledge: nonincreasing, smooth, nonnegative support. [figure: densities of the loss from a sample of 100: epi-spline vs. kernel] 97 / 126

151 Using function identification approach (cont.) Varying the number of simulations; same soft information. [figure: density estimates of the loss for varying sample sizes (10, 100, 1000, ...)] 98 / 126

152 Using function identification approach (cont.) 30 meta-replications of 100 simulations. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±1.3), average of the 10% highest (±4.4)] 99 / 126

153 Using function identification approach (cont.) Also pointwise Fisher information: $h'(y)/h(y) \in [-0.5, -0.1]$. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±1.2), average of the 10% highest (±4.0)] 100 / 126

154 Using function identification approach (cont.) Also pointwise Fisher information: $h'(x)/h(x) \in [-0.35, -0.25]$. [figure: densities of the loss: 100,000-sim. reference, 100-sim. mean (±0.5), average of the 10% highest (±1.7)] 101 / 126

155 Building robustness 102 / 126

156 Diversity of estimates using KL-divergence Returning to the exponential density. Continuously differentiable, nonincreasing, nonnegative support. [figure: original estimate, empirical density, true density, and alternative estimates at KL = 0.01 and KL = 0.1] 103 / 126

157 Queuing 104 / 126

158 M/M/1; 50% of customers delayed for a fixed time $X$ = customer time-in-service; 100 observations; exponential epi-spline. Soft info: lsc, $X \ge 0$, pointwise Fisher information, unimodal upper tail. [figure: true density, epi-spline estimate, kernel estimate, empirical density] 105 / 126

159 Deconvolution 106 / 126

160 True density Gamma(5,1) First-order epi-spline on $[0, 23]$, $N = 1000$; three sample points. Continuity, unimodality, convex tails, bounds on gradient jumps. [figure: true density of X, epi-spline estimate of X, sample data] 107 / 126

161 Also noisy observations 5000 observations of $Y = X + W$; $W$ independent normal noise with mean 0, stdev 3.2. Estimate $h_Y$ separately, then impose $h_Y(y) \approx \int h(x)\,h_W(y-x)\,dx$ at 101 $y$ points. [figure: true density of X, epi-spline estimate of X, sample data, error density] 108 / 126

165 Uncertainty quantification 109 / 126

166 Dynamical system Recall the true density of the amplitude. [figure: density of X] 110 / 126

167 Density of amplitude Sample size 100; continuously differentiable; unimodal tails; exponential epi-splines. [figure: true density, epi-spline estimate, kernel estimate, sample] 111 / 126

168 Gradient information Gradient information for bijective $g: \mathbb{R} \to \mathbb{R}$. Recall: If $X = g(V)$, then $h(x) = h_V(g^{-1}(x))/|g'(g^{-1}(x))|$. Present context without a bijection and data $x^i = g(v^i)$, $g'(v^i)$: $h(x^i) \ge h_V(v^i)/|g'(v^i)|$; the value of the pdf is bounded from below at $x^i$. 112 / 126

170 Gradient information (cont.) [figure: true density, epi-spline estimate, kernel estimate, sample] 113 / 126

171 Fluid dynamics Drag/lift estimates: high-fidelity RANSE solves (each 4 hours on 8 cores); low-fidelity potential flow solves (each 5 sec on 1 core). 114 / 126

172 Predicting high-fidelity performance 898 high- and low-fidelity solves. [figure: drag/lift using the high-fidelity solver vs. drag/lift using the low-fidelity solver] Learn from low-fidelity solves and avoid (many) high-fidelity solves. 115 / 126

173 Density using high or low solves Exponential epi-splines of second order; mesh $N = 50$. Soft info: log-concavity and bounds on second-order derivatives. [figure: densities of the response: 898 high-fidelity and 800 low-fidelity samples] 116 / 126

174 Estimating conditional error $X$ = high-fidelity; $Y$ = low-fidelity; $h_X(x) = \int h_{X|Y}(x|y)\,h_Y(y)\,dy$. A normal linear least-squares regression model on training data of size 50 gives $h_{X|Y}(x|y)$. [figure: densities of the response: high 898, conditional least squares, sample] 117 / 126
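A small synthetic sketch of this mixing step (stand-in solver outputs of my own making; a kernel estimate of $h_Y$ where the talk uses an epi-spline):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(3)
y_all = rng.normal(0.0, 1.0, 898)                   # low-fidelity outputs (synthetic)
y_tr = y_all[:50]                                   # 50 paired training runs
x_tr = 1.1 * y_tr + 0.2 + rng.normal(0.0, 0.3, 50)  # high-fidelity counterparts

slope, intercept = np.polyfit(y_tr, x_tr, 1)        # normal linear model X | Y = y
sigma = np.std(x_tr - (intercept + slope * y_tr), ddof=2)

h_Y = gaussian_kde(y_all)       # kernel stand-in for the epi-spline estimate
def h_X(x, n_quad=400):
    # h_X(x) = ∫ h_{X|Y}(x|y) h_Y(y) dy, by trapezoid quadrature in y
    ys = np.linspace(y_all.min() - 3.0, y_all.max() + 3.0, n_quad)
    integrand = norm.pdf(x, loc=intercept + slope * ys, scale=sigma) * h_Y(ys)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ys))

print(h_X(0.5))                 # mixed density estimate at x = 0.5
```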

177 Information fusion Max likelihood using 10 high-fidelity simulations, subject to $0.5\,h_X(x) \le \int h_{X|Y}(x|y)\,h_Y(y)\,dy \le 1.5\,h_X(x)$. [figure: densities of the response: high 898, conditional least squares, high 10 + conditional, low 10] 118 / 126

179 Only 10 high-fidelity [figure: densities of the response: high 898, conditional least squares, high 10 + conditional, high 10, low 10] 119 / 126

180 Uniform mixture density 120 / 126

181 Uniform mixture density Sample size 100; lsc, slope constraints; exponential epi-splines. [figure: true density, epi-spline estimate, kernel estimate, empirical density] 121 / 126

182 Bivariate normal density 122 / 126

183 Bivariate normal probability density 123 / 126

184 Bivariate normal probability density Curvature, log-concave, 25 sample points, exp. epi-spline 124 / 126

185 Conclusions Function identification problems: a rich class. lsc functions provide modeling flexibility. Epi-convergence allows evolution of information/approximations. Epi-splines are central approximation tools. 125 / 126

189 References Singham, Royset & Wets, Density estimation of simulation output using exponential epi-splines. Royset, Sukumar & Wets, Uncertainty quantification using exponential epi-splines. Royset & Wets, From data to assessments and decisions: epi-spline technology. Royset & Wets, Fusion of hard and soft information in nonparametric density estimation. Royset & Wets, Multivariate epi-splines and evolving function identification problems. Feng, Rios, Ryan, Spurkel, Watson, Wets & Woodruff, Toward scalable stochastic unit commitment, Part 1. Rios, Wets & Woodruff, Multi-period forecasting with limited information and scenario generation with limited data. Royset & Wets, On function identification problems. 126 / 126


More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control

Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control Nonlinear Programming (Hillier, Lieberman Chapter 13) CHEM-E7155 Production Planning and Control 19/4/2012 Lecture content Problem formulation and sample examples (ch 13.1) Theoretical background Graphical

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Multi-period forecasting and scenario generation with limited data

Multi-period forecasting and scenario generation with limited data Multi-period forecasting and scenario generation with limited data Ignacio Rios University of Chile Santiago, Chile Roger J-B Wets and David L. Woodruff University of California, Davis Davis, CA 95616,

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information

Computational Intelligence

Computational Intelligence Computational Intelligence Winter Term 2010/11 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS 11 Fakultät für Informatik TU Dortmund Three tasks: 1. Choice of an appropriate problem representation.

More information

Course Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979)

Course Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979) Course Outline FRTN Multivariable Control, Lecture Automatic Control LTH, 6 L-L Specifications, models and loop-shaping by hand L6-L8 Limitations on achievable performance L9-L Controller optimization:

More information

Optimal Transport in Risk Analysis

Optimal Transport in Risk Analysis Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and

More information

How much should we rely on Besov spaces as a framework for the mathematical study of images?

How much should we rely on Besov spaces as a framework for the mathematical study of images? How much should we rely on Besov spaces as a framework for the mathematical study of images? C. Sinan Güntürk Princeton University, Program in Applied and Computational Mathematics Abstract Relations between

More information

Fitting Linear Statistical Models to Data by Least Squares: Introduction

Fitting Linear Statistical Models to Data by Least Squares: Introduction Fitting Linear Statistical Models to Data by Least Squares: Introduction Radu Balan, Brian R. Hunt and C. David Levermore University of Maryland, College Park University of Maryland, College Park, MD Math

More information

Lesson 2: Analysis of time series

Lesson 2: Analysis of time series Lesson 2: Analysis of time series Time series Main aims of time series analysis choosing right model statistical testing forecast driving and optimalisation Problems in analysis of time series time problems

More information

Nonlinear Stationary Subdivision

Nonlinear Stationary Subdivision Nonlinear Stationary Subdivision Michael S. Floater SINTEF P. O. Box 4 Blindern, 034 Oslo, Norway E-mail: michael.floater@math.sintef.no Charles A. Micchelli IBM Corporation T.J. Watson Research Center

More information

Stability-based generation of scenario trees for multistage stochastic programs

Stability-based generation of scenario trees for multistage stochastic programs Stability-based generation of scenario trees for multistage stochastic programs H. Heitsch and W. Römisch Humboldt-University Berlin Institute of Mathematics 10099 Berlin, Germany http://www.math.hu-berlin.de/~romisch

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 28. Suvrit Sra. (Algebra + Optimization) 02 May, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 28 (Algebra + Optimization) 02 May, 2013 Suvrit Sra Admin Poster presentation on 10th May mandatory HW, Midterm, Quiz to be reweighted Project final report

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b Notation General Notation Description See a b & a b The minimum and the maximum of a and b a + & a f S u The non-negative part, a 0, and non-positive part, (a 0) of a R The restriction of the function

More information

DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA 1002 Lecture notes 12 Fall Linear regression DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Analysis Qualifying Exam

Analysis Qualifying Exam Analysis Qualifying Exam Spring 2017 Problem 1: Let f be differentiable on R. Suppose that there exists M > 0 such that f(k) M for each integer k, and f (x) M for all x R. Show that f is bounded, i.e.,

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

Random Sets. Roger J-B Wets Mathematics, University of California, Davis. May 2012, ETH-Zürich Lecture #2

Random Sets. Roger J-B Wets Mathematics, University of California, Davis. May 2012, ETH-Zürich Lecture #2 Random Sets Roger J-B Wets Mathematics, University of California, Davis May 2012, ETH-Zürich Lecture #2 G : E d, G 1 (0) soln s of G(x) = 0, approximations? EG(x) = {G(ξ,x)} = 0 approximated by G ν (x)

More information

FRTN10 Multivariable Control, Lecture 13. Course outline. The Q-parametrization (Youla) Example: Spring-mass System

FRTN10 Multivariable Control, Lecture 13. Course outline. The Q-parametrization (Youla) Example: Spring-mass System FRTN Multivariable Control, Lecture 3 Anders Robertsson Automatic Control LTH, Lund University Course outline The Q-parametrization (Youla) L-L5 Purpose, models and loop-shaping by hand L6-L8 Limitations

More information

Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function

Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function By Y. Baraud, S. Huet, B. Laurent Ecole Normale Supérieure, DMA INRA,

More information