Gradient-based Adaptive Stochastic Search
|
|
- Patience Fletcher
- 5 years ago
- Views:
Transcription
1 1 / 41 Gradient-based Adaptive Stochastic Search Enlu Zhou H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology November 5, 2014
2 Outline 2 / 41 1 Introduction 2 GASS for non-differentiable optimization 3 GASS for simulation optimization 4 Conclusions
3 Outline 3 / 41 1 Introduction 2 GASS for non-differentiable optimization 3 GASS for simulation optimization 4 Conclusions
4 Introduction 4 / 41 We consider x arg max x X H(x) Given any x X, H(x) can be evaluated exactly.
5 Introduction 4 / 41 We consider x arg max x X H(x) Given any x X, H(x) can be evaluated exactly. We are interested in objective functions: lack structural properties (such as convexity and differentiability) have multiple local optima only be assessed by black-box evaluation
6 Examples of Objective Functions 5 / 41
7 Background 6 / 41 Stochastic Search: use randomized mechanism to generate a sequence of iterates
8 6 / 41 Background Stochastic Search: use randomized mechanism to generate a sequence of iterates e.g., simulated annealing (Krikpatrick et al. 1983), genetic algorithms (Goldberg 1989), tabu search (Glover 1990), nested partitions method (Shi and Ólafsson 2000), pure adaptive search (Zabinsky 2003), sequential Monte Carlo simulated annealing (Zhou and Chen 2011), model-based algorithms (survey by Zlochin et al. 2004).
9 6 / 41 Background Stochastic Search: use randomized mechanism to generate a sequence of iterates e.g., simulated annealing (Krikpatrick et al. 1983), genetic algorithms (Goldberg 1989), tabu search (Glover 1990), nested partitions method (Shi and Ólafsson 2000), pure adaptive search (Zabinsky 2003), sequential Monte Carlo simulated annealing (Zhou and Chen 2011), model-based algorithms (survey by Zlochin et al. 2004). Model-based Algorithms: generate candidate solutions from a sampling distribution (i.e., probabilistic model)
10 6 / 41 Background Stochastic Search: use randomized mechanism to generate a sequence of iterates e.g., simulated annealing (Krikpatrick et al. 1983), genetic algorithms (Goldberg 1989), tabu search (Glover 1990), nested partitions method (Shi and Ólafsson 2000), pure adaptive search (Zabinsky 2003), sequential Monte Carlo simulated annealing (Zhou and Chen 2011), model-based algorithms (survey by Zlochin et al. 2004). Model-based Algorithms: generate candidate solutions from a sampling distribution (i.e., probabilistic model) e.g., ant colony optimization (Dorigo and Gambardella 1997), annealing adaptive search (Romeijn and Smith 1994), estimation of distribution algorithms (Muhlenbein and Paaß 1996), the cross-entropy method (Rubinstein 1997), model reference adaptive search (Hu et al. 2007).
11 Model-based optimization 7 / 41
12 Model-based optimization Figure 1: Optimization via model-based methods. 7 / distribution X H(x) X
13 Model-based optimization 7 / distribution X H(x) X
14 Model-based optimization Figure 1: Optimization via model-based methods. 7 / distribution X H(x) X
15 Model-based optimization Figure 1: Optimization via model-based methods. 7 / distribution X H(x) X
16 Outline 8 / 41 1 Introduction 2 GASS for non-differentiable optimization 3 GASS for simulation optimization 4 Conclusions
17 Reformulation 9 / 41 Original problem: x arg max x X H(x), X Rn.
18 Reformulation 9 / 41 Original problem: x arg max x X H(x), X Rn. Let {f (x; θ)} be a parameterized family of probability density functions on X. H(x)f (x; θ)dx H(x ) H, θ R d.
19 Reformulation 9 / 41 Original problem: x arg max x X H(x), X Rn. Let {f (x; θ)} be a parameterized family of probability density functions on X. H(x)f (x; θ)dx H(x ) H, θ R d. = is achieved if and only if θ s.t. the probability mass of f (x; θ ) is concentrated on a subset of the optimal solutions.
20 Reformulation 9 / 41 Original problem: x arg max x X H(x), X Rn. Let {f (x; θ)} be a parameterized family of probability density functions on X. H(x)f (x; θ)dx H(x ) H, θ R d. = is achieved if and only if θ s.t. the probability mass of f (x; θ ) is concentrated on a subset of the optimal solutions. New problem: θ arg max θ R d H(x)f (x; θ)dx.
21 Why reformulation? 10 / 41 Possible Scenarios: Original Problem arg max x X H(x) Discrete in x Non-differentiable in x New Problem arg max θ H(x)f (x; θ)dx Continuous in θ Differentiable in θ
22 Why reformulation? 10 / 41 Possible Scenarios: Original Problem arg max x X H(x) Discrete in x Non-differentiable in x New Problem arg max θ H(x)f (x; θ)dx Continuous in θ Differentiable in θ Incorporate model-based optimization into gradient-based optimization: 1). Generate candidate solutions from f ( ; θ) on the solution space X. 2). Use a gradient-based method to update the parameter θ.
23 Why reformulation? 10 / 41 Possible Scenarios: Original Problem arg max x X H(x) Discrete in x Non-differentiable in x New Problem arg max θ H(x)f (x; θ)dx Continuous in θ Differentiable in θ Incorporate model-based optimization into gradient-based optimization: 1). Generate candidate solutions from f ( ; θ) on the solution space X. 2). Use a gradient-based method to update the parameter θ. Combine the robustness of model-based optimization with the relative fast convergence of gradient-based optimization.
24 More reformulation 11 / 41 For an arbitrary but fixed θ R d, define the function ( ) l(θ; θ ) ln S θ (H(x))f (x; θ)dx.
25 More reformulation 11 / 41 For an arbitrary but fixed θ R d, define the function ( ) l(θ; θ ) ln S θ (H(x))f (x; θ)dx. The shape function S θ ( ) : R R + is chosen to ensure 0 < l(θ; θ ) ln (S θ (H )) θ, and = is achieved if a θ s.t. the probability mass of f (x; θ ) is concentrated on a subset of global optima.
26 More reformulation 11 / 41 For an arbitrary but fixed θ R d, define the function ( ) l(θ; θ ) ln S θ (H(x))f (x; θ)dx. The shape function S θ ( ) : R R + is chosen to ensure 0 < l(θ; θ ) ln (S θ (H )) θ, and = is achieved if a θ s.t. the probability mass of f (x; θ ) is concentrated on a subset of global optima. So consider max l(θ; θ ). θ
27 Parameter updating 12 / 41 Suppose {f ( ; θ)} is an exponential family of densities, i.e., f (x; θ) = exp{θ T T (x) φ(θ)}, φ(θ) = ln{ exp(θ T T (x))dx}.
28 Parameter updating 12 / 41 Suppose {f ( ; θ)} is an exponential family of densities, i.e., f (x; θ) = exp{θ T T (x) φ(θ)}, φ(θ) = ln{ exp(θ T T (x))dx}. Then θ l(θ; θ ) θ=θ = E p( ;θ )[T (X)] E θ [T (X)], 2 θ l(θ; θ ) θ=θ = Var p( ;θ )[T (X)] Var θ [T (X)], where p(x; θ ) S θ (H(x))f (x;θ ) Sθ (H(x))f (x;θ )dx.
29 Parameter updating 12 / 41 Suppose {f ( ; θ)} is an exponential family of densities, i.e., f (x; θ) = exp{θ T T (x) φ(θ)}, φ(θ) = ln{ exp(θ T T (x))dx}. Then θ l(θ; θ ) θ=θ = E p( ;θ )[T (X)] E θ [T (X)], 2 θ l(θ; θ ) θ=θ = Var p( ;θ )[T (X)] Var θ [T (X)], where p(x; θ ) S θ (H(x))f (x;θ ) Sθ (H(x))f (x;θ )dx. A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0.
30 Parameter updating 13 / 41 A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0.
31 Parameter updating 13 / 41 A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0. Var θk [T (X)] = E[( θ ln f (X; θ k )) 2 ] is the Fisher information matrix, leading to the following facts:
32 Parameter updating 13 / 41 A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0. Var θk [T (X)] = E[( θ ln f (X; θ k )) 2 ] is the Fisher information matrix, leading to the following facts: Var θk [T (X)] 1 is the minimum-variance step size in stochastic approximation.
33 Parameter updating 13 / 41 A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0. Var θk [T (X)] = E[( θ ln f (X; θ k )) 2 ] is the Fisher information matrix, leading to the following facts: Var θk [T (X)] 1 is the minimum-variance step size in stochastic approximation. Var θk [T (X)] 1 adapts the gradient step to our belief about promising regions. (Think about T (X) = X...)
34 Parameter updating 13 / 41 A Newton-like scheme for updating θ ( ) θ k+1 = θ k + α k (Var θk [T (X)] + ɛi) 1 E [T (X)] E p( ;θk ) θ k [T (X)], α k > 0, ɛ > 0. Var θk [T (X)] = E[( θ ln f (X; θ k )) 2 ] is the Fisher information matrix, leading to the following facts: Var θk [T (X)] 1 is the minimum-variance step size in stochastic approximation. Var θk [T (X)] 1 adapts the gradient step to our belief about promising regions. (Think about T (X) = X...) Var θk [T (X)] 1 θ l(θ; θ k ) θ=θk is the gradient of l(θ; θ k ) on the statistical manifold equipped with Fisher metric.
35 Main algorithm: GASS 14 / 41 Gradient-based Adaptive Stochastic Search (GASS) Initialization: set k = 0. Sampling: draw samples xk i iid f (x; θ k ), i = 1, 2,..., N k. Updating: update the parameter θ according to θ k+1 = θ k + α k ( Var θk [T (X)] + ɛi) 1 (Êp k [T (X)] E θk [T (X)]), where Var θk [T (X)] and Êp k [T (X)] are estimates using the samples {x i k, i = 1,..., N k}. Stopping: If some stopping criterion is satisfied, stop and return the current best sampled solution; else, set k := k + 1 and go back to step 2).
36 Accelerated algorithm: GASS_avg 15 / 41 GASS can be viewed as a stochastic approximation algorithm in finding θ.
37 Accelerated algorithm: GASS_avg 15 / 41 GASS can be viewed as a stochastic approximation algorithm in finding θ. Accelerated GASS: use Polyak averaging with online feedback θ k+1 = θ k + α k ( Var θk [T (X)] + ɛi) 1 ( Ê pk [T (X)] E θk [T (X)]) + α k c( θ k θ k ), θ k = 1 k θ i. k i=1
38 Convergence analysis 16 / 41 The updating of θ can be rewritten in the form of a generalized Robbins-Monro iterates: θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], where D(θ k ) is the gradient field, b k is the bias term, and ξ k is the noise term. D(θ k ) = (Var θk [T (X)] + ɛi) 1 θ l(θ k ; θ k ).
39 Convergence analysis 16 / 41 The updating of θ can be rewritten in the form of a generalized Robbins-Monro iterates: θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], where D(θ k ) is the gradient field, b k is the bias term, and ξ k is the noise term. D(θ k ) = (Var θk [T (X)] + ɛi) 1 θ l(θ k ; θ k ). It can be viewed as a noisy discretization of the ordinary differential equation (ODE) θ t = D(θ t ), t 0.
40 Convergence analysis 17 / 41 θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], θ t = D(θ t ), t 0.
41 17 / 41 Convergence analysis θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], θ t = D(θ t ), t 0. Assumption α k 0 as k, k=0 α k =.
42 Convergence analysis θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], θ t = D(θ t ), t 0. Assumption α k 0 as k, k=0 α k =. Lemma 1 Under certain assumptions, b k 0 w.p.1 as k. 17 / 41
43 Convergence analysis 17 / 41 Assumption θ k+1 = θ k + α k [D(θ k ) + b k + ξ k ], θ t = D(θ t ), t 0. α k 0 as k, k=0 α k =. Lemma 1 Under certain assumptions, b k 0 w.p.1 as k. Lemma 2 Under certain assumptions, for any T > 0, { } n lim sup α k {n:0 n 1 i ξ i = 0, w.p.1. i=k α i T } i=k
44 Convergence results 18 / 41 Theorem (Asymptotic Convergence) Assume that D(θ t ) is continuous with a unique integral curve and some regularity conditions hold. Then the sequence {θ k } converges to a limit set of the ODE w.p.1. Furthermore, if the limit sets of the ODE are isolated equilibrium points, then w.p.1 {θ k } converges to a unique equilibrium point. Implication: GASS converges to a stationary point of l(θ; θ ).
45 Convergence results 19 / 41 Theorem (Asymptotic Convergence Rate) Let α k = α 0 /k α for 0 < α < 1. For a given constant τ > 2α, let N k = Θ(k τ α ). Assume the convergence of the sequence {θ k } occurs to a unique equilibrium point θ w.p.1. If Assumptions 1, 2, and 3 hold, then k τ 2 (θk θ ) dist N(0, QMQ T ), where Q is an orthogonal matrix such that Q T ( J L (θ ))Q = Λ with Λ being a diagonal matrix, and the (i, j) th entry of the matrix M is given by M (i,j) = (Q T ΦΣΦ T Q) (i,j) (Λ (i,i) + Λ (j,j) ) 1. Implication: The asymptotic convergence rate of GASS is O(1/ k τ ).
46 Numerical results 20 / 41 2 D Dejong5 function 4 D Shekel function Function value true optimum GASS 3.5 GASS_avg modified CE MRAS Total sample size x 10 4 Function value true optimum GASS 2 GASS_avg 1 modified CE MRAS Total sample size x D Powel function D Rosenbrock function Function value Function value 10 2 true optimum 10 8 GASS GASS_avg modified CE MRAS Total sample size x true optimum GASS GASS_avg modified CE MRAS Total sample size x 10 6 Figure : Comparison of average performance of GASS, GASS_avg, MRAS, and the modified CE.
47 Numerical results 21 / D Griewank function 50 D Trigonometric function Function value Function value true optimum GASS 8 GASS_avg modified CE 9 MRAS Total sample size x 10 5 true optimum 10 4 GASS GASS_avg modified CE MRAS Total sample size x D Pinter function 50 D Sphere function Function value true optimum 10 4 GASS GASS_avg modified CE MRAS Total sample size x 10 5 Function value true optimum GASS 10 5 GASS_avg modified CE MRAS Total sample size x 10 5 Figure : Comparison of average performance of GASS, GASS_avg, MRAS, and the modified CE.
48 Numerical results 22 / D Griewank function D Trigonometric function Function value 10 1 Function value true optimum 10 2 GASS GASS_avg Total sample size x true optimum GASS GASS_avg Total sample size x D Levy function D Sphere function Function value 10 3 Function value true optimum GASS GASS_avg Total sample size x 10 5 true optimum GASS GASS_avg Total sample size x 10 5 Figure : Average performance of GASS and GASS_avg on 200-dimensional benchmark problems.
49 Numerical results 23 / 41 GASS_avg and GASS find the ɛ-optimal solutions in all the runs for 7 out of the 8 benchmark problems (except the Shekel function).
50 Numerical results 23 / 41 GASS_avg and GASS find the ɛ-optimal solutions in all the runs for 7 out of the 8 benchmark problems (except the Shekel function). Accuracy: GASS_avg and GASS find better solutions than the modified CE method on badly-scaled functions and are comparable to the modified Cross Entropy method (Rubinstein 1998) on multi-modal functions; outperform Model Reference Adaptive Search (Hu et al. 2007) on all the problems.
51 Numerical results 23 / 41 GASS_avg and GASS find the ɛ-optimal solutions in all the runs for 7 out of the 8 benchmark problems (except the Shekel function). Accuracy: GASS_avg and GASS find better solutions than the modified CE method on badly-scaled functions and are comparable to the modified Cross Entropy method (Rubinstein 1998) on multi-modal functions; outperform Model Reference Adaptive Search (Hu et al. 2007) on all the problems. Convergence speed: GASS_avg always converges faster than GASS; both are faster than MRAS on all the problems and faster than the modified CE on most problems.
52 Resource allocation in communication networks 24 / 41 Q users may transmit or receive signals using N carriers, under a power budget B q for the qth user. The objective is to maximize the total transmission rate (sum-rate) by optimally allocating each user s power resource to the carriers.
53 Resource allocation in communication networks 25 / 41 max p q(k), q, k Q subject to: q=1 k=1 ( N log 1 + ) H qq 2 p q (k) N 0 + Q r=1,r q H rq(k) 2 p r (k) N p q (k) B q, q = 1,, Q, k=1 p q (k) 0, q = 1,, Q, k = 1,, N.
54 Resource allocation in communication networks 25 / 41 max p q(k), q, k Q subject to: q=1 k=1 ( N log 1 + ) H qq 2 p q (k) N 0 + Q r=1,r q H rq(k) 2 p r (k) N p q (k) B q, q = 1,, Q, k=1 p q (k) 0, q = 1,, Q, k = 1,, N. The sampling distribution f ( ; θ) is chosen to be the Dirichlet distribution, whose support is a multi-dimensional simplex.
55 Resource allocation in communication networks 26 / 41 Figure : Numerical results on resource allocation in communication networks. IWFA, DDPA, MADP, GPA are distributed algorithms. Other algorithms are running multi-start versions of NEOS Solvers:
56 Discrete optimization 27 / 41 X is a discrete set.
57 Discrete optimization 27 / 41 X is a discrete set. Discrete-GASS: use discrete distribution Sampling is easy, but the parameter is of high dimension.
58 Discrete optimization 27 / 41 X is a discrete set. Discrete-GASS: use discrete distribution Sampling is easy, but the parameter is of high dimension. Annealing-GASS: use Boltzmann distribution Parameter is always of dimension 1, but sampling (by MCMC) is more expensive and inexact.
59 Discrete optimization 27 / 41 X is a discrete set. Discrete-GASS: use discrete distribution Sampling is easy, but the parameter is of high dimension. Annealing-GASS: use Boltzmann distribution Parameter is always of dimension 1, but sampling (by MCMC) is more expensive and inexact. Annealing-GASS converges to the set of optimal solutions in probability.
60 Numerical results X 10 6 (Shekel), (Rosenbrock), (others). 4 D Shekel function 50 D Powel function Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Total sample size x Total sample size x D Rosenbrock function 0 50 D Griewank function 10 1 Function value True Optimum Dist GASS MRAS 10 4 Annealing GASS SA (Geo Temp) SA (Log Temp) Total sample size x 10 5 Function value 0.5 True Optimum Dist GASS MRAS Annealing GASS 1 SA (Geo Temp) SA (Log Temp) Total sample size x 10 5 Figure : Average performance of discrete-gass, Annealing-GASS, MRAS, SA (geometric temperature), and SA (logarithmic temperature) 28 / 41
61 Numerical results 29 / D Rastrigin function 50 D Levy function Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Total sample size x Total sample size x D Pinter function 50 D Sphere function Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Function value True Optimum Dist GASS MRAS Annealing GASS SA (Geo Temp) SA (Log Temp) Total sample size x Total sample size x 10 5 Figure : Average performance of discrete-gass, Annealing-GASS, MRAS, SA (geometric temperature), and SA (logarithmic temperature)
62 Numerical results 30 / 41 Discrete-GASS outperforms MRAS in both accuracy and convergence rate.
63 Numerical results 30 / 41 Discrete-GASS outperforms MRAS in both accuracy and convergence rate. Annealing-GASS algorithm is an improvement of multi-start simulated annealing algorithms with geometric and logarithmic temperature schedules.
64 Numerical results 30 / 41 Discrete-GASS outperforms MRAS in both accuracy and convergence rate. Annealing-GASS algorithm is an improvement of multi-start simulated annealing algorithms with geometric and logarithmic temperature schedules. Discrete-GASS provides accurate solutions in most of the problems; Annealing-GASS yields accurate solutions only in the low-dimensional problem and badly-scaled problems.
65 Numerical results 30 / 41 Discrete-GASS outperforms MRAS in both accuracy and convergence rate. Annealing-GASS algorithm is an improvement of multi-start simulated annealing algorithms with geometric and logarithmic temperature schedules. Discrete-GASS provides accurate solutions in most of the problems; Annealing-GASS yields accurate solutions only in the low-dimensional problem and badly-scaled problems. Discrete-GASS usually needs more computation time for each iteration than Annealing-GASS, but needs less iterations to converge.
66 Implementation & Software 31 / 41 Most tuning parameters can be set to default; need carefully choose stepsize {α k }. Choice of sampling distribution X is a continuous set: (truncated) Gaussian X is a simplex (with or without interior): Dirichlet X is a discrete set: discrete, Boltzmann
67 Implementation & Software 31 / 41 Most tuning parameters can be set to default; need carefully choose stepsize {α k }. Choice of sampling distribution X is a continuous set: (truncated) Gaussian X is a simplex (with or without interior): Dirichlet X is a discrete set: discrete, Boltzmann Software available at
68 Outline 32 / 41 1 Introduction 2 GASS for non-differentiable optimization 3 GASS for simulation optimization 4 Conclusions
69 Simulation optimization: introduction 33 / 41 Simulation optimization: max H(x) E ξ x [h(x, ξ x )]. x X
70 Simulation optimization: introduction 33 / 41 Simulation optimization: max H(x) E ξ x [h(x, ξ x )]. x X x Computer simulation y ~ h(x,ξx) of a complex system
71 Simulation optimization: introduction 33 / 41 Simulation optimization: max H(x) E ξ x [h(x, ξ x )]. x X x Computer simulation y ~ h(x,ξx) of a complex system Example: a queueing system (x: service rate; H: waiting time + staffing cost; ξ x : arrival/service times)
72 Simulation optimization: introduction 33 / 41 Simulation optimization: max H(x) E ξ x [h(x, ξ x )]. x X x Computer simulation y ~ h(x,ξx) of a complex system Example: a queueing system (x: service rate; H: waiting time + staffing cost; ξ x : arrival/service times) X is a continuous set.
73 Simulation optimization: introduction 34 / 41 Main solution methods Ranking & Selection (for problems with finite solution space) Stochastic approximation Response surface methods Sample average approximation Stochastic search methods
74 GASS for simulation optimization 35 / 41 Gradient-based Adaptive Stochastic Search (GASS) Initialization Sampling: draw samples xk i iid f (x; θ k ), i = 1, 2,..., N k. Estimation: simulate each xk i for M k times; estimate Ĥ(xk i ) = 1 Mk M k j=1 h(x k i, ξi,j k ). Updating: update the parameter θ according to θ k+1 = θ k + α k ( Var θk [T (X)] + ɛi) 1 (Êp k [T (X)] E θk [T (X)]), where Var θk [T (X)] and Êp k [T (X)] are estimates using {x i k } and {Ĥ(x i k )}. Stopping
75 Two-timescale GASS 36 / 41 Motivated by two-timescale stochastic approximation (Borkar 1997): Two-timescale GASS (GASS_2T) Assume α k 0, β k 0, β k /α k 0. Draw samples xk i iid f (x; θ k ), i = 1,..., N, and carry out computer simulation for each x i once. Update the gradient and Hessian estimates in GASS on the fast timescale with step size α k. Update θ k on the slow timescale with step size β k.
76 Two-timescale GASS 36 / 41 Motivated by two-timescale stochastic approximation (Borkar 1997): Two-timescale GASS (GASS_2T) Assume α k 0, β k 0, β k /α k 0. Draw samples xk i iid f (x; θ k ), i = 1,..., N, and carry out computer simulation for each x i once. Update the gradient and Hessian estimates in GASS on the fast timescale with step size α k. Update θ k on the slow timescale with step size β k. Intuition: sampling distribution can be viewed as fixed while the gradient and Hessian estimates are updated over many iterations. So only a small sample size N is needed.
77 Numerical results 37 / D Powell function 10 0 true optimum GASSO GASSO_2T 10 2 CEOCBA D Trignometric function true optimum GASSO GASSO_2T CEOCBA Function value 10 4 Function value Number of Function Evaluations Number of Function Evaluations 10 D Pinter function 10 D Rastrigin function true optimum GASSO GASSO_2T CEOCBA true optimum GASSO GASSO_2T CEOCBA Function value Function value Number of Function Evaluations Number of Function Evaluations Figure : Average performance of GASS, GASS_2T, CEOCBA (He et al. 2010) on problems with independent noise
78 Outline 38 / 41 1 Introduction 2 GASS for non-differentiable optimization 3 GASS for simulation optimization 4 Conclusions
79 Conclusions 39 / 41 By reformulating a hard optimization problem into a differentiable one, we can incorporate direct gradient search with stochastic search.
80 Conclusions 39 / 41 By reformulating a hard optimization problem into a differentiable one, we can incorporate direct gradient search with stochastic search. A class of gradient-based adaptive stochastic search (GASS) algorithms for non-differentiable optimization, black-box optimization, and simulation optimization problems.
81 Conclusions 39 / 41 By reformulating a hard optimization problem into a differentiable one, we can incorporate direct gradient search with stochastic search. A class of gradient-based adaptive stochastic search (GASS) algorithms for non-differentiable optimization, black-box optimization, and simulation optimization problems. Convergence results and numerical results show that GASS is a promising and competitive method.
82 References 40 / 41 E. Zhou and Jiaqiao Hu, Gradient-based Adaptive Stochastic Search for Non-differentiable Optimization, IEEE Transactions on Automatic Control, 59(7), pp , E. Zhou and Shalabh Bhatnagar, Gradient-based Adaptive Stochastic Search for Simulation Optimization over Continuous Space, submitted.
83 Thank you! 41 / 41
Proceedings of the 2012 Winter Simulation Conference C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds.
Proceedings of the 2012 Winter Simulation Conference C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds. COMBINING GRADIENT-BASED OPTIMIZATION WITH STOCHASTIC SEARCH Enlu Zhou
More informationGradient-Based Adaptive Stochastic Search for Non-Differentiable Optimization
Gradient-Based Adaptive Stochastic Search for Non-Differentiable Optimization Enlu Zhou Department of Industrial & Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, IL 61801,
More informationA Model Reference Adaptive Search Method for Global Optimization
OPERATIONS RESEARCH Vol. 55, No. 3, May June 27, pp. 549 568 issn 3-364X eissn 1526-5463 7 553 549 informs doi 1.1287/opre.16.367 27 INFORMS A Model Reference Adaptive Search Method for Global Optimization
More informationA Model Reference Adaptive Search Method for Global Optimization
A Model Reference Adaptive Search Method for Global Optimization Jiaqiao Hu Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, MD
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More informationCSC2541 Lecture 5 Natural Gradient
CSC2541 Lecture 5 Natural Gradient Roger Grosse Roger Grosse CSC2541 Lecture 5 Natural Gradient 1 / 12 Motivation Two classes of optimization procedures used throughout ML (stochastic) gradient descent,
More informationfor Global Optimization with a Square-Root Cooling Schedule Faming Liang Simulated Stochastic Approximation Annealing for Global Optim
Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule Abstract Simulated annealing has been widely used in the solution of optimization problems. As known
More informationA PARTICLE FILTERING FRAMEWORK FOR RANDOMIZED OPTIMIZATION ALGORITHMS. Enlu Zhou Michael C. Fu Steven I. Marcus
Proceedings of the 2008 Winter Simulation Conference S. J. Mason, R. R. Hill, L. Moench, O. Rose, eds. A PARTICLE FILTERIG FRAMEWORK FOR RADOMIZED OPTIMIZATIO ALGORITHMS Enlu Zhou Michael C. Fu Steven
More informationLecture 1: Supervised Learning
Lecture 1: Supervised Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech ISYE6740/CSE6740/CS7641: Computational Data Analysis/Machine from Portland, Learning Oregon: pervised learning (Supervised)
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More informationA New Trust Region Algorithm Using Radial Basis Function Models
A New Trust Region Algorithm Using Radial Basis Function Models Seppo Pulkkinen University of Turku Department of Mathematics July 14, 2010 Outline 1 Introduction 2 Background Taylor series approximations
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationMathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )
Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationSequential Monte Carlo Simulated Annealing. Enlu Zhou Xi Chen
Sequential Monte Carlo Simulated Annealing Enlu Zhou Xi Chen Department of Industrial & Enterprise Systems Engineering University of Illinois at Urbana-Champaign Urbana, IL 680, U.S.A. ABSTRACT In this
More informationSTOCHASTIC OPTIMIZATION USING MODEL REFERENCE ADAPTIVE SEARCH. Jiaqiao Hu Michael C. Fu Steven I. Marcus
Proceedings of the 2005 Winter Simulation Conference M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, eds. STOCHASTIC OPTIMIZATION USING MODEL REFERENCE ADAPTIVE SEARCH Jiaqiao Hu Michael
More informationAdvanced Optimization
Advanced Optimization Lecture 3: 1: Randomized Algorithms for for Continuous Discrete Problems Problems November 22, 2016 Master AIC Université Paris-Saclay, Orsay, France Anne Auger INRIA Saclay Ile-de-France
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationAnnealing Between Distributions by Averaging Moments
Annealing Between Distributions by Averaging Moments Chris J. Maddison Dept. of Comp. Sci. University of Toronto Roger Grosse CSAIL MIT Ruslan Salakhutdinov University of Toronto Partition Functions We
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationDynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement
Submitted to imanufacturing & Service Operations Management manuscript MSOM-11-370.R3 Dynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement (Authors names blinded
More informationNatural Evolution Strategies for Direct Search
Tobias Glasmachers Natural Evolution Strategies for Direct Search 1 Natural Evolution Strategies for Direct Search PGMO-COPI 2014 Recent Advances on Continuous Randomized black-box optimization Thursday
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationToward Effective Initialization for Large-Scale Search Spaces
Toward Effective Initialization for Large-Scale Search Spaces Shahryar Rahnamayan University of Ontario Institute of Technology (UOIT) Faculty of Engineering and Applied Science 000 Simcoe Street North
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationGenerative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationAlmost Sure Convergence of Two Time-Scale Stochastic Approximation Algorithms
Almost Sure Convergence of Two Time-Scale Stochastic Approximation Algorithms Vladislav B. Tadić Abstract The almost sure convergence of two time-scale stochastic approximation algorithms is analyzed under
More informationMonte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan
Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis
More informationOnline Companion: Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
Online Companion: Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures Daniel R. Jiang and Warren B. Powell Abstract In this online companion, we provide some additional preliminary
More informationNatural Computing. Lecture 11. Michael Herrmann phone: Informatics Forum /10/2011 ACO II
Natural Computing Lecture 11 Michael Herrmann mherrman@inf.ed.ac.uk phone: 0131 6 517177 Informatics Forum 1.42 25/10/2011 ACO II ACO (in brief) ACO Represent solution space Set parameters, initialize
More informationChapter 11. Stochastic Methods Rooted in Statistical Mechanics
Chapter 11. Stochastic Methods Rooted in Statistical Mechanics Neural Networks and Learning Machines (Haykin) Lecture Notes on Self-learning Neural Algorithms Byoung-Tak Zhang School of Computer Science
More informationTransmitter optimization for distributed Gaussian MIMO channels
Transmitter optimization for distributed Gaussian MIMO channels Hon-Fah Chong Electrical & Computer Eng Dept National University of Singapore Email: chonghonfah@ieeeorg Mehul Motani Electrical & Computer
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationREINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationOptimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization
5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The
More informationQuasi Stochastic Approximation American Control Conference San Francisco, June 2011
Quasi Stochastic Approximation American Control Conference San Francisco, June 2011 Sean P. Meyn Joint work with Darshan Shirodkar and Prashant Mehta Coordinated Science Laboratory and the Department of
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationPhysics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester
Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation
More informationCHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD
CHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD Vijay Singamasetty, Navneeth Nair, Srikrishna Bhashyam and Arun Pachai Kannu Department of Electrical Engineering Indian
More informationDistributed Power Control for Time Varying Wireless Networks: Optimality and Convergence
Distributed Power Control for Time Varying Wireless Networks: Optimality and Convergence Tim Holliday, Nick Bambos, Peter Glynn, Andrea Goldsmith Stanford University Abstract This paper presents a new
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationWhy should you care about the solution strategies?
Optimization Why should you care about the solution strategies? Understanding the optimization approaches behind the algorithms makes you more effectively choose which algorithm to run Understanding the
More informationA Model Reference Adaptive Search Method for Stochastic Global Optimization
A Model Reference Adaptive Search Method for Stochastic Global Optimization Jiaqiao Hu Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College
More informationCenter-based initialization for large-scale blackbox
See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/903587 Center-based initialization for large-scale blackbox problems ARTICLE FEBRUARY 009 READS
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationConnections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN
Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University
More informationEfficient Monte Carlo Optimization with ATMathCoreLib
Efficient Monte Carlo Optimization with ATMathCoreLib Reiji Suda 1, 2 and Vivek S. Nittoor 1, 2 In this paper, we discuss a deterministic optimization method for stochastic simulation with unknown distribution.
More informationEstimation theory and information geometry based on denoising
Estimation theory and information geometry based on denoising Aapo Hyvärinen Dept of Computer Science & HIIT Dept of Mathematics and Statistics University of Helsinki Finland 1 Abstract What is the best
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More informationPartially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing
Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationIEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 5, MAY
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 5, MAY 1998 631 Centralized and Decentralized Asynchronous Optimization of Stochastic Discrete-Event Systems Felisa J. Vázquez-Abad, Christos G. Cassandras,
More informationMATH 4211/6211 Optimization Basics of Optimization Problems
MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization
More informationSub-Sampled Newton Methods
Sub-Sampled Newton Methods F. Roosta-Khorasani and M. W. Mahoney ICSI and Dept of Statistics, UC Berkeley February 2016 F. Roosta-Khorasani and M. W. Mahoney (UCB) Sub-Sampled Newton Methods Feb 2016 1
More informationApplication of Optimization Methods and Edge AI
and Hai-Liang Zhao hliangzhao97@gmail.com November 17, 2018 This slide can be downloaded at Link. Outline 1 How to Design Novel Models with Methods Embedded Naturally? Outline 1 How to Design Novel Models
More information13. Parameter Estimation. ECE 830, Spring 2014
13. Parameter Estimation ECE 830, Spring 2014 1 / 18 Primary Goal General problem statement: We observe X p(x θ), θ Θ and the goal is to determine the θ that produced X. Given a collection of observations
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationInformation Geometric view of Belief Propagation
Information Geometric view of Belief Propagation Yunshu Liu 2013-10-17 References: [1]. Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic reasoning, Free energy and Information Geometry, Neural
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More informationLecture 3 - Linear and Logistic Regression
3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationExpectation propagation for symbol detection in large-scale MIMO communications
Expectation propagation for symbol detection in large-scale MIMO communications Pablo M. Olmos olmos@tsc.uc3m.es Joint work with Javier Céspedes (UC3M) Matilde Sánchez-Fernández (UC3M) and Fernando Pérez-Cruz
More informationThe Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models
The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health
More informationStochastic Composition Optimization
Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators
More informationIntroduction to Maximum Likelihood Estimation
Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2018 Due: Friday, February 23rd, 2018, 11:55 PM Submit code and report via EEE Dropbox You should submit a
More informationThe particle swarm optimization algorithm: convergence analysis and parameter selection
Information Processing Letters 85 (2003) 317 325 www.elsevier.com/locate/ipl The particle swarm optimization algorithm: convergence analysis and parameter selection Ioan Cristian Trelea INA P-G, UMR Génie
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationDynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement
Dynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement Wyean Chan DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal (Québec), H3C 3J7, CANADA,
More informationElaine T. Hale, Wotao Yin, Yin Zhang
, Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2
More informationPartially Collapsed Gibbs Samplers: Theory and Methods
David A. VAN DYK and Taeyoung PARK Partially Collapsed Gibbs Samplers: Theory and Methods Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationSolving Continuous-State POMDPs. via Density Projection
Solving Continuous-State POMDPs 1 via Density Projection Enlu Zhou, Student Member, IEEE, Michael C. Fu, Fellow, IEEE, and Steven I. Marcus, Fellow, IEEE, Abstract Research on numerical solution methods
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationIntroduction, basic but important concepts
Introduction, basic but important concepts Felix Kubler 1 1 DBF, University of Zurich and Swiss Finance Institute October 7, 2017 Felix Kubler Comp.Econ. Gerzensee, Ch1 October 7, 2017 1 / 31 Economics
More informationMollifying Networks. ICLR,2017 Presenter: Arshdeep Sekhon & Be
Mollifying Networks Caglar Gulcehre 1 Marcin Moczulski 2 Francesco Visin 3 Yoshua Bengio 1 1 University of Montreal, 2 University of Oxford, 3 Politecnico di Milano ICLR,2017 Presenter: Arshdeep Sekhon
More informationAUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET
The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationMetaheuristics. 2.3 Local Search 2.4 Simulated annealing. Adrian Horga
Metaheuristics 2.3 Local Search 2.4 Simulated annealing Adrian Horga 1 2.3 Local Search 2 Local Search Other names: Hill climbing Descent Iterative improvement General S-Metaheuristics Old and simple method
More informationSum-Power Iterative Watefilling Algorithm
Sum-Power Iterative Watefilling Algorithm Daniel P. Palomar Hong Kong University of Science and Technolgy (HKUST) ELEC547 - Convex Optimization Fall 2009-10, HKUST, Hong Kong November 11, 2009 Outline
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationSGD and Randomized projection algorithms for overdetermined linear systems
SGD and Randomized projection algorithms for overdetermined linear systems Deanna Needell Claremont McKenna College IPAM, Feb. 25, 2014 Includes joint work with Eldar, Ward, Tropp, Srebro-Ward Setup Setup
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More information