Natural Evolution Strategies for Direct Search
PGMO-COPI 2014: Recent Advances on Continuous Randomized Black-Box Optimization
Thursday, October 30, 2014

Tobias Glasmachers
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany
Introduction

Function value free black-box optimization:
1. Minimize f: R^d → R.
2. 0-th order (direct search) setting.
3. Black-box model: x → [f] → f(x).
4. Don't ever rely on function values, only on comparisons f(x) < f(x'), e.g., for ranking.
Introduction

Algorithm operations:
1. Generate a solution x ∈ R^d (most of the logic).
2. Evaluate f(x) (black box, most of the computation time).
3. Compare (rank) against other solutions: f(x) < f(x')?
Introduction: Challenges

Facing a black-box objective, anything can happen. We'd better prepare for the following challenges:
- non-smooth or even discontinuous objective
- multi-modality
- observations of objective values perturbed by noise
- high dimensionality, e.g., d ≥ 1000
- black-box constraints (possibly non-smooth, ...)
Introduction: Application Examples

1. Tuning of parameters of expensive simulations, e.g., in engineering.
2. Tuning of parameters of machine learning algorithms.
Evolution Strategies

Generic ES template:

    input m ∈ R^d, σ > 0, C (= I) ∈ R^{d×d}
    loop
        sample offspring x_1, ..., x_λ ~ N(m, σ² C)
        evaluate f(x_1), ..., f(x_λ)
        select new population of size µ (e.g., the best offspring)
        update mean vector m
        update global step size σ
        update covariance matrix C
    until stopping criterion met

Characteristics: randomized, population-based, rank-based (function-value free), with step size control and metric learning (covariance matrix adaptation, CMA).
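A minimal Python/NumPy sketch of this loop, as an illustration rather than the exact method from the slides: the mean update simply recombines the best µ offspring, while the step-size and covariance updates are left as method-specific stubs, and `f` is any user-supplied objective.

```python
import numpy as np

def es_loop(f, m, sigma=1.0, lam=10, mu=5, max_iters=100):
    """Generic (mu, lambda)-ES skeleton: sample, evaluate, rank, update."""
    d = len(m)
    C = np.eye(d)                                          # covariance matrix, initially the identity
    for _ in range(max_iters):
        A = np.linalg.cholesky(C)                          # factor so that A A^T = C
        X = m + sigma * (np.random.randn(lam, d) @ A.T)    # offspring ~ N(m, sigma^2 C)
        fX = np.array([f(x) for x in X])                   # black-box evaluations
        best = np.argsort(fX)[:mu]                         # rank-based selection of the mu best
        m = X[best].mean(axis=0)                           # update the mean vector
        # step size sigma and covariance C would be adapted here (CMA, NES, ...)
    return m
```

For example, `es_loop(lambda x: float(np.sum(x**2)), m=np.full(5, 3.0))` moves the mean toward the origin on the sphere function.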
Evolution Strategies: Design Principles

Why rely on a population? Not necessary; a hillclimber such as the (1+1)-ES also works, but populations are more robust on noisy and multi-modal problems.

Why online adaptation of the step size σ? A scale-invariant algorithm achieves linear convergence on scale-invariant problems, and hence locally linear convergence on C² functions.

Why discard function values and keep only ranks? Invariance to strictly monotonic (rank-preserving) transformations of objective values.

Why CMA? Efficient optimization of ill-conditioned problems, similar to (quasi-)Newton methods.
This Talk

Given samples and their ranks, how to update the distribution parameters?
Information Geometric Perspective

Change of perspective: the algorithm state is no longer the population x_1, ..., x_µ but the search distribution N(m, C).

Parameters (m, C) = θ ∈ Θ, distribution P_θ, pdf p_θ. For multivariate Gaussians:
$$\Theta = \mathbb{R}^d \times \mathcal{P}_d, \qquad \mathcal{P}_d = \left\{ M \in \mathbb{R}^{d \times d} \,\middle|\, M = M^T,\ M \text{ pos. def.} \right\}$$

Statistical manifold of distributions { P_θ | θ ∈ Θ } ≅ Θ, equipped with an intrinsic (Riemannian) geometry; the metric tensor is given by the Fisher information matrix I(θ).
Information Geometric Perspective

Goal: optimization of θ ∈ Θ instead of x ∈ R^d. Lift the objective function f: R^d → R to an objective W_f: Θ → R.

Simplest choice:
$$W_f(\theta) = \mathbb{E}_{x \sim P_\theta}\big[f(x)\big]$$

More flexible choice:
$$W_f(\theta) = \mathbb{E}_{x \sim P_\theta}\big[w(f(x))\big]$$
with a monotonic weight function w: R → R.
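As a concrete illustration (not from the slides): for the sphere function f(x) = ||x||² and a Gaussian search distribution, the relaxed objective has a closed form,
$$W_f(\theta) = \mathbb{E}_{x \sim \mathcal{N}(m, C)}\left[\|x\|^2\right] = \|m\|^2 + \operatorname{tr}(C),$$
so decreasing W_f simultaneously drives the mean m toward the optimum and shrinks the covariance C.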
Information Geometric Perspective

The expectation operator adds one degree of smoothness. Hence, under weak assumptions, W_f(θ) can be optimized with gradient descent (GD):
$$\theta \leftarrow \theta - \eta\, \nabla_\theta W_f(\theta)$$

Log-likelihood trick:
$$\nabla_\theta W_f(\theta) = \nabla_\theta\, \mathbb{E}_{x \sim P_\theta}\big[w(f(x))\big] = \mathbb{E}_{x \sim P_\theta}\big[\nabla_\theta \log(p_\theta(x))\, w(f(x))\big]$$
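For instance, restricting θ to the mean m of a Gaussian with fixed covariance C (a standard computation, stated here only as an illustration), the score function is $\nabla_m \log p_\theta(x) = C^{-1}(x - m)$, so the trick gives
$$\nabla_m W_f(\theta) = \mathbb{E}_{x \sim P_\theta}\left[ C^{-1}(x - m)\, w(f(x)) \right],$$
an expectation over quantities that only require sampling from P_θ and evaluating f.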
Information Geometric Perspective

Problem: this does not work well. Why?

We have to replace the plain gradient on the (Euclidean) parameter space Θ with the natural gradient, which respects the intrinsic geometry of the statistical manifold:
$$\tilde{\nabla}_\theta W_f(\theta) = I(\theta)^{-1}\, \nabla_\theta W_f(\theta)$$

The natural gradient is invariant under changes of the parameterization θ ↦ P_θ.

The (natural) gradient vector field Θ → TΘ defines a (natural) gradient flow φ_t(θ) tangential to $\tilde{\nabla}_\theta W_f(\theta)$.

Optimization: follow the inverse flow curves t ↦ φ_t(θ) from θ into a (local) minimum of W_f.
Information Geometric Perspective

In the black-box setting, the expectation
$$\mathbb{E}_{x \sim P_\theta}\big[\nabla_\theta \log(p_\theta(x))\, w(f(x))\big]$$
is intractable.

Monte Carlo (MC) approximation:
$$\nabla_\theta W_f(\theta) \approx G(\theta) = \frac{1}{\lambda} \sum_{i=1}^{\lambda} \nabla_\theta \log(p_\theta(x_i))\, w(f(x_i))$$
for x_1, ..., x_λ ~ P_θ.

The estimate is consistent, E[G(θ)] = ∇_θ W_f(θ), hence following G(θ) amounts to stochastic gradient descent.

This yields the natural gradient MC approximation
$$\tilde{G}(\theta) = I(\theta)^{-1} G(\theta).$$
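A small NumPy sketch of this estimator, restricted to the mean m of a Gaussian with fixed covariance C: for the mean block the Fisher matrix is C^{-1}, so the natural gradient is the plain MC estimate multiplied by C. Here `f` and `w` are placeholders for the objective and the weight function.

```python
import numpy as np

def mc_natural_gradient_mean(f, w, m, C, lam=100):
    """Monte Carlo estimate of the natural gradient of W_f with respect to the mean m."""
    X = np.random.multivariate_normal(m, C, size=lam)   # x_1, ..., x_lambda ~ N(m, C)
    Cinv = np.linalg.inv(C)
    # plain gradient estimate: (1/lambda) sum_i grad_m log p(x_i) * w(f(x_i)),
    # with grad_m log p(x) = C^{-1} (x - m)
    G = np.mean([w(f(x)) * (Cinv @ (x - m)) for x in X], axis=0)
    return C @ G                                         # natural gradient: I(m)^{-1} G = C G
```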
Information Geometric Perspective

Stochastic Natural Gradient Descent (SNGD) update rule:
$$\theta \leftarrow \theta - \eta\, \tilde{G}(\theta)$$

SNGD is a two-fold approximation of the gradient flow:
- discretized time: Euler steps,
- randomized gradient: MC sampling.

The natural gradient flow is invariant under the choice of distribution parameters θ ↦ P_θ; the SNGD algorithm is invariant only in first-order approximation, due to the Euler steps.

Ollivier et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. arXiv:1106.3708, 2011.
Natural (Gradient) Evolution Strategies

Natural (gradient) Evolution Strategies (NES) closely follow this scheme. They were the first ES derived this way.

NES approach: optimization of the expected objective value with Gaussian distributions. The offspring x_1, ..., x_λ act as the MC sample for the estimation of G(θ).

Closed-form Fisher tensor for N(m, C):
$$I_{i,j}(\theta) = \frac{\partial m^T}{\partial \theta_i} C^{-1} \frac{\partial m}{\partial \theta_j} + \frac{1}{2} \operatorname{tr}\!\left( C^{-1} \frac{\partial C}{\partial \theta_i}\, C^{-1} \frac{\partial C}{\partial \theta_j} \right)$$
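Two simple instances of this formula, stated here as a sanity check rather than taken from the slides: parameterizing only the mean gives the block $I_m(\theta) = C^{-1}$, and for an isotropic Gaussian $\mathcal{N}(m, \sigma^2 I)$ with $\theta = (m, \sigma)$,
$$I(\theta) = \begin{pmatrix} \sigma^{-2} I_d & 0 \\ 0 & 2d/\sigma^2 \end{pmatrix}.$$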
Natural (Gradient) Evolution Strategies

NES algorithm:

    input θ ∈ Θ, λ ∈ N, η > 0
    loop
        sample x_1, ..., x_λ ~ P_θ
        evaluate f(x_1), ..., f(x_λ)
        G(θ) ← (1/λ) Σ_{i=1}^{λ} ∇_θ log(p_θ(x_i)) · f(x_i)
        G̃(θ) ← I(θ)^{-1} G(θ)
        θ ← θ − η G̃(θ)
    until stopping criterion met

Wierstra et al. Natural Evolution Strategies. CEC, 2008 and JMLR, 2014.
Natural (Gradient) Evolution Strategies

NES works better when replacing f(x_i) in
$$G(\theta) = \frac{1}{\lambda} \sum_{i=1}^{\lambda} \nabla_\theta \log(p_\theta(x_i))\, f(x_i)$$
with rank-based utility values w_1, ..., w_λ (see the sketch below).

Sample ranks are f-quantile estimators, hence the utility values can be represented as w(f(x_i)) with a special weight function w = w_θ based on f-quantiles under the current distribution P_θ.

This turns NES into a function value free algorithm. Benefits: invariance under monotonic transformations of objective values, and linear convergence on scale-invariant problems.
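A sketch of one common choice of rank-based utilities, the truncated logarithmic weights widely used in NES and CMA-ES; the exact weights in the cited papers may differ.

```python
import numpy as np

def rank_utilities(fitnesses):
    """Map objective values to rank-based utilities; best (smallest f) gets the largest weight."""
    lam = len(fitnesses)
    ranks = np.argsort(np.argsort(fitnesses))        # rank 0 = best sample
    raw = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks + 1))
    return raw / raw.sum() - 1.0 / lam               # normalized, weights sum to zero
```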
Exponential Natural Evolution Strategies

xNES (exponential NES) algorithm:

    input (m, A) ∈ Θ, λ ∈ N, η_m, η_A > 0
    loop
        sample z_1, ..., z_λ ~ N(0, I)
        transform x_i ← A z_i + m for i = 1, ..., λ
        evaluate f(x_1), ..., f(x_λ)
        G_m(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · z_i
        G_C(θ) ← (1/λ) Σ_{i=1}^{λ} w(f(x_i)) · (1/2)(z_i z_i^T − I)
        m ← m − η_m · A G_m(θ)
        A ← A · exp( (η_A / 2) · G_C(θ) )
    until stopping criterion met

Glasmachers et al. Exponential Natural Evolution Strategies. GECCO, 2010.
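A compact NumPy sketch of the xNES loop, assuming the `rank_utilities` helper above. Population size and learning rates are common heuristics, and constant factors may differ from the slide and the paper by absorbed scalings. Because the utilities here are largest for the best samples, the updates add the gradient estimates, which is equivalent to the descent form on the slide with a monotonically increasing w.

```python
import numpy as np
from scipy.linalg import expm

def xnes(f, m, max_iters=200, lam=None, eta_m=1.0, eta_A=None):
    """Exponential NES sketch: natural-gradient updates of the mean m and the factor A (C = A A^T)."""
    d = len(m)
    lam = lam or 4 + int(3 * np.log(d))                      # common population-size heuristic
    eta_A = eta_A or 0.6 * (3 + np.log(d)) / (d * np.sqrt(d))
    A = np.eye(d)
    for _ in range(max_iters):
        Z = np.random.randn(lam, d)                          # z_i ~ N(0, I)
        X = m + Z @ A.T                                      # x_i = A z_i + m
        u = rank_utilities(np.array([f(x) for x in X]))      # rank-based utilities (sum to zero)
        G_m = Z.T @ u                                        # gradient estimate for the mean
        G_C = sum(ui * (np.outer(z, z) - np.eye(d)) for ui, z in zip(u, Z))
        m = m + eta_m * (A @ G_m)                            # mean update in the original space
        A = A @ expm(0.5 * eta_A * G_C)                      # multiplicative (exponential) update
    return m
```

For example, `xnes(lambda x: float(np.sum(x**2)), m=np.full(10, 3.0))` should drive the mean close to the origin.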
Natural (Gradient) Evolution Strategies

The resulting algorithm is indeed an ES (the R^d perspective) and at the same time an SNGD algorithm (the Θ perspective).

ES traditionally have three distinct and often very different mechanisms for optimization:
- adaptation of m,
- step size control: adaptation of σ,
- metric learning: adaptation of C.

SNGD has only a single mechanism. Astonishing insight: all (most) parameters can be updated with a single mechanism.
Exponential Natural Evolution Strategies

Although derived completely differently, this algorithm turns out to be closely related to CMA-ES.

Akimoto et al. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN, 2010.

In particular, the update of m is identical, and the rank-µ update of CMA-ES coincides with a NES update. This is astonishing; it is surely not a coincidence. And it is insightful: it means that CMA-ES is (essentially) an SNGD algorithm.
Summary

Evolution Strategies (ES) are randomized direct search methods suitable for continuous black-box optimization.

Here we have focused on a core question: how to update the distribution parameters in a principled way?

The parameter space is equipped with the non-Euclidean information geometry of the corresponding statistical manifold of search distributions.

SNGD on a stochastically relaxed problem results in a direct search algorithm: the Natural (gradient) Evolution Strategy (NES) algorithm.

The SNGD parameter update is by no means restricted to Gaussian distributions. It is a general construction template for update equations of continuous distribution parameters.
Thank you! Questions?