Fitness Expectation Maximization


Daan Wierstra¹, Tom Schaul¹, Jan Peters³ and Jürgen Schmidhuber¹,²

¹ IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
² TU Munich, Boltzmannstr. 3, Garching, München, Germany
³ Max Planck Institute for Biological Cybernetics, Tübingen, Germany
{daan, tom, juergen}@idsia.ch, mail@jan-peters.net

Abstract. We present Fitness Expectation Maximization (FEM), a novel method for performing black box function optimization. FEM searches the fitness landscape of an objective function using an instantiation of the well-known Expectation Maximization algorithm, producing search points to match the sample distribution weighted according to higher expected fitness. FEM updates both the candidate solution parameters and the search policy, which is represented as a multinormal distribution. Inheriting EM's stability and strong guarantees, the method is both elegant and competitive with some of the best heuristic search methods in the field, and performs well on a number of unimodal and multimodal benchmark tasks. To illustrate the potential practical applications of the approach, we also present experiments on finding the parameters of a controller for the challenging non-Markovian double pole balancing task.

1 Introduction

Real-valued black box function optimization is one of the major topics in modern applied machine learning research (e.g., see [1]). It concerns optimizing the continuous parameters of an unknown (black box) objective fitness function, whose exact analytical structure is assumed to be unknown or unspecified. Specific function measurements can be performed, however. The goal is to find a reasonably high-fitness candidate solution while keeping the number of function measurements limited. The black box optimization framework is crucial for many real-world domains, since often the precise structure of a problem is either not available to the engineer or too expensive to model or simulate.

Since exhaustively searching the entire space of solution parameters is infeasible, and since we do not assume access to a precise model of the fitness function, we must settle for finding a reasonably good solution that satisfies certain pre-specified constraints. This inevitably involves a sufficiently intelligent heuristic approach, since in practice it is important to find the right domain-specific trade-off between convergence speed, the expected quality of the solutions found, and the algorithm's sensitivity to local suboptima on the fitness landscape.

A variety of algorithms has been developed within this framework, including methods such as Simulated Annealing [5], Simultaneous Perturbation Stochastic Approximation [6], the Cross-Entropy method [7, 8], and evolutionary methods such as Covariance Matrix Adaptation (CMA) [9] and the class of Estimation of Distribution Algorithms (EDAs) [?].

In this paper, we postulate the similarity, and indeed the equivalence, of black box function optimization and one-step reinforcement learning. In our attempt to create a viable optimization technique based on reinforcement learning, we fall back on a classical goal of reinforcement learning (RL): we search for a way to reduce the reinforcement learning problem to a supervised learning problem. To do so, we re-evaluate the recent result in machine learning that reinforcement learning can be reduced to reward-weighted regression [10], a novel algorithm derived from Dayan & Hinton's [11] expectation maximization (EM) perspective on RL. We show that this approach generalizes from reinforcement learning to fitness maximization, yielding Fitness Expectation Maximization (FEM), a relatively well-founded instantiation of EDAs bearing a particular resemblance to the EMNA_α algorithm [?]. The algorithm is tested on a set of unimodal and multimodal benchmark functions, and is shown to exhibit excellent performance on both.

A defining feature of FEM is its adaptive search policy, which takes the form of a multinormal distribution producing correlated search points in search space. Its covariance matrix makes the algorithm invariant under rotations of the search space, and enables it to fine-tune its search appropriately, resulting in arbitrarily high-precision solutions. Furthermore, owing to the stability properties of the EM algorithm, FEM avoids catastrophically greedy updates on the search policy, thus preventing premature convergence in some cases.

The paper is organized as follows. The next section provides a quick overview of the general problem framework of real-valued black box function optimization. The ensuing sections describe the derivation of the EM-based algorithm, the concept of fitness shaping, and the online instantiation of our algorithm. The experiments section shows initial results on a number of unimodal and multimodal benchmark problems, as well as results on the non-Markovian double pole balancing problem. The paper concludes with a discussion of the advantages and problems of the method, and points to possible directions for future extensions.

2 Algorithm Framework

First, let us introduce the algorithm framework and the corresponding notation. The objective is to optimize the n-dimensional continuous vector of objective parameters x for an unknown fitness function f : R^n → R. The function is unknown or "black box", in that the only information accessible to the algorithm consists of function measurements selected by the algorithm. The goal is to optimize f(x) while keeping the number of function evaluations, which are considered costly, as low as possible. This is done by successively evaluating batches of N separate search points z_1, ..., z_N on the fitness function, while using the information extracted from the fitness evaluations f(z_1), ..., f(z_N) to adjust both the current candidate solution x and the search policy, defined as a Gaussian with mean x and covariance matrix Σ.

3 Expectation Maximization for Black Box Function Optimization

At every point in time while running the algorithm, we want to optimize the expected fitness J = E_z[f(z)] of the next batch, given the current batch of search samples. We assume that every batch g is generated by a search policy π^(g) parameterized by θ = ⟨x, Σ⟩, representing the current candidate solution x and the covariance matrix Σ. In order to adjust the parameters θ towards solutions with higher associated fitness, we match the search distribution to the actual sample points, weighted by their utilities.

Now let f(z) be the fitness at a particular search point z and, utilizing the familiar multivariate normal distribution, let

    π(z | θ) = N(z | x, Σ) = (2π)^{−n/2} |Σ|^{−1/2} exp[ −(1/2) (z − x)^T Σ^{−1} (z − x) ]

denote the probability density of search point z given the current search policy π. The expectation

    J = E_z[f(z)] = ∫ π(z | θ) f(z) dz

then expresses the expected fitness over all possible sample points, weighted by their respective probabilities under policy π.
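Both quantities are straightforward to compute numerically. Below is a minimal sketch (our own illustration, not the authors' code) of the search-policy log-density and a Monte Carlo estimate of J obtained by averaging f over policy samples; f_sphere is a stand-in fitness function.

    import numpy as np

    def log_density(z, x, Sigma):
        """Log of the multivariate normal density N(z | x, Sigma)."""
        n = x.shape[0]
        diff = z - x
        _, logdet = np.linalg.slogdet(Sigma)
        quad = diff @ np.linalg.solve(Sigma, diff)
        return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

    def f_sphere(z):
        """Negated sphere function, so that higher fitness is better."""
        return -np.sum(z ** 2)

    def expected_fitness(f, x, Sigma, num_samples=10_000, rng=None):
        """Monte Carlo estimate of J = integral of pi(z | theta) f(z) dz."""
        rng = np.random.default_rng(rng)
        zs = rng.multivariate_normal(x, Sigma, size=num_samples)
        return np.mean([f(z) for z in zs])

    # Usage: expected fitness of a unit Gaussian policy on the sphere.
    J = expected_fitness(f_sphere, x=np.zeros(3), Sigma=np.eye(3), rng=0)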

3.1 Optimizing Utility-transformed Fitness

While an objective function such as the above is sufficient in theory, algorithms which simply optimize it have major disadvantages. They might be too aggressive when little experience (few sample points) is available, and converge prematurely to the best solution they have seen so far. At the opposite extreme, they might prove too passive and be biased by less fortunate experiences. Trading off such problems has been a long-standing challenge in reinforcement learning. In decision theory, however, such problems are surprisingly well understood [12]. In that framework it is common to introduce a so-called utility transformation u(f(z)), which has to fulfill the requirement that it scales monotonically with f, is semi-positive, and integrates to a constant. Once a utility transformation is inserted, we obtain an expected utility function given by

    J_u(θ) = ∫ p(z | θ) u(f(z)) dz.    (1)

The utility function u(f) adjusts the aggressiveness of the decision-making algorithm: if it is concave, the algorithm's attitude is risk-averse, while if it is convex, the algorithm will be more likely to consider a high fitness value to be more than a coincidence. Obviously, it is of essential importance that this risk function is set properly, in accordance with the expected fitness landscape, and it should be regarded as a metaparameter of the algorithm. Notice the similarity to the selection operator in evolutionary methods.

We have empirically found that rank-based shaping functions (rank-based selection) work best for various problems, also because they circumvent the problem of extreme fitness values disproportionately distorting the estimation of the search distribution, making careful adaptation of the forget factor during search unnecessary even for problems with wildly fluctuating fitness. In this paper, we consider a simple rank-based utility transformation function, the piecewise-linear u_k = u(f(z_k) | f(z_{k−1}), ..., f(z_{k−N})), which first ranks all samples k − N, ..., k based on fitness value, then assigns zero to the N − m worst ones and linearly increasing values to the m best samples.

3.2 Fitness Expectation Maximization

Analogously to [10, 11], we can establish the lower bound

    log J_u(θ) = log ∫ q(z) [p(z | θ) u(f(z)) / q(z)] dz    (2)
               ≥ ∫ q(z) log [p(z | θ) u(f(z)) / q(z)] dz    (3)
               = ∫ q(z) [log p(z | θ) + log u(f(z)) − log q(z)] dz    (4)
               =: F(q, θ),    (5)

where the inequality is due to Jensen's inequality, under the additional constraint ∫ q(z) dz − 1 = 0. This points us to the following EM algorithm:

Proposition 1. An Expectation Maximization algorithm for optimizing both the expected utility and the raw expected fitness is given by

    E-step:   q_{g+1}(z) = p(z | θ) u(f(z)) / ∫ p(z̃ | θ) u(f(z̃)) dz̃,    (6)

    M-step:   θ_{g+1} = argmax_θ ∫ q_{g+1}(z) log p(z | θ) dz.    (7)

Proof. The E-step is given by q' = argmax_q F(q, θ) subject to the constraint ∫ q(z) dz − 1 = 0. Thus, we have the Lagrangian L(λ, q) = F(q, θ) + λ(∫ q(z) dz − 1). Differentiating L(λ, q) with respect to q and setting the derivative to zero, we obtain q*(z) = p(z | θ) u(f(z)) exp(λ − 1). Inserting this back into the Lagrangian yields the dual function L(λ, q*) = ∫ q*(z) dz − λ. Thus, by setting dL(λ, q*)/dλ = 0, we obtain λ = 1 − log ∫ p(z | θ) u(f(z)) dz, and solving for q* implies Eq. (6). The M-step computes θ_{g+1} = argmax_θ F(q_{g+1}, θ).

In practice, when using a Gaussian search distribution parameterized by θ = ⟨x, Σ⟩, the EM process comes down to simply fitting the samples in every batch to the Gaussian, weighted by the utilities.
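Concretely, one batch iteration reduces to computing rank-based utilities and refitting the Gaussian's mean and covariance with those utilities as weights. A minimal sketch follows; the exact slope and normalization of the piecewise-linear shaping are our own choices, since the paper leaves them open.

    import numpy as np

    def rank_utilities(fitnesses, m):
        """Piecewise-linear rank-based shaping: zero utility for the
        N - m worst samples, linearly increasing utilities for the m
        best, normalized to sum to one."""
        N = len(fitnesses)
        order = np.argsort(fitnesses)            # indices from worst to best
        u = np.zeros(N)
        u[order[N - m:]] = np.arange(1, m + 1)   # 1, 2, ..., m for the best m
        return u / u.sum()

    def em_step(samples, fitnesses, m):
        """One batch EM update: refit the Gaussian search policy to the
        samples (shape (N, n)), weighted by their utilities."""
        u = rank_utilities(fitnesses, m)          # E-step, self-normalized
        x = u @ samples                           # M-step: weighted mean
        diffs = samples - x
        Sigma = (u[:, None] * diffs).T @ diffs    # M-step: weighted covariance
        return x, Sigma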

4 Online Fitness Expectation Maximization

In order to speed up convergence, the algorithm can be executed online, that is, sample by sample, instead of batch by batch. The online version of the algorithm can yield superior performance, since updates to the policy can be made at every sample instead of just once per batch, and because doing so tends to preserve sample diversity better than the batch version of the algorithm. Crucially, a forget factor α is now introduced to modulate the speed at which the search policy adapts to the current sample. The batch size N is now only used by the utility ranking function u, which ranks the current sample among the N most recently seen samples. Pseudocode for the resulting FEM algorithm can be found in Algorithm 1.

Algorithm 1 Fitness Expectation Maximization

    given: shaping function u, batch size N, forget factor α
    k ← 1
    initialize search parameters θ^(k) = ⟨x, Σ⟩
    repeat
        draw sample z_k ∼ π(x, Σ)
        evaluate fitness f(z_k)
        compute rank-based fitness shaping u_k = u(f(z_k) | f(z_{k−1}), ..., f(z_{k−N}))
        x ← (1 − α u_k) x + α u_k z_k
        Σ ← (1 − α u_k) Σ + α u_k (x − z_k)(x − z_k)^T
        k ← k + 1
    until stopping criterion is met
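The pseudocode translates almost line for line into code. The sketch below is our own translation, not the authors' implementation; in particular, the normalization of the rank-based utility (zero outside the top m of the last N samples, linearly increasing up to 1 for the best) is an assumption.

    import numpy as np

    def fem(f, n, alpha=0.1, N=50, m=5, max_evals=20_000, target=None, rng=None):
        """Online Fitness Expectation Maximization (Algorithm 1):
        maximize f over R^n with a Gaussian search policy <x, Sigma>."""
        rng = np.random.default_rng(rng)
        x = rng.standard_normal(n)   # x ~ N(0, I), as in the experiments
        Sigma = np.eye(n)
        window = []                  # fitnesses of the N most recent samples
        for _ in range(max_evals):
            z = rng.multivariate_normal(x, Sigma)
            fz = f(z)
            window = (window + [fz])[-N:]
            # Rank of the current sample among the last N seen (1 = worst);
            # nonzero utility only if it is among the m best.
            rank = sum(fz >= w for w in window)
            u_k = max(0, rank - (len(window) - m)) / m
            x = (1 - alpha * u_k) * x + alpha * u_k * z
            d = (x - z)[:, None]     # note: uses the already-updated mean x
            Sigma = (1 - alpha * u_k) * Sigma + alpha * u_k * (d @ d.T)
            if target is not None and fz >= target:
                break
        return x, fz

    # Usage: the 5-dimensional sphere benchmark (negated, so it is maximized).
    x_best, f_best = fem(lambda z: -np.sum(z ** 2), n=5, target=-1e-8)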

5 Experiments

5.1 Standard benchmark functions

To test the performance of the algorithm, we chose 6 unimodal functions (Sphere, Schwefel, Tablet, Cigar, Different-Powers, Ellipsoid) and 4 multimodal functions (Ackley, Rastrigin, Weierstrass and Griewank) from the set of benchmark functions from [13] and [9] that are typically used in the literature, for comparison purposes and for competitions. As those functions are designed to be minimized, we take the fitness to be the negative function value. Good test functions should be easy to interpret, but scale up with n. They must be highly nonlinear, non-separable, largely resistant to hill-climbing, and preferably contain deceptive local suboptima. The multimodal functions were tested with both FEM and the Covariance Matrix Adaptation (CMA) algorithm, widely regarded as one of the premier algorithms in this field, for comparison purposes.

In order to prevent potentially biased results, and to avoid trivial optima (e.g. at the origin), we follow [13] and consistently transform the functions' inputs by a combined rotation and translation in order to make the variables non-separable. This immediately renders many direct search methods virtually useless, since they cannot cope with correlated search directions, unlike FEM and CMA.
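The following sketch shows one way to implement this wrapping (our own construction, not the reference implementation of [13]): draw a random orthogonal matrix and offset once, then evaluate every benchmark on the transformed input, which couples all variables.

    import numpy as np

    def make_nonseparable(f, n, rng=None):
        """Wrap benchmark f with a fixed random rotation and translation
        so that its variables become non-separable and the optimum moves
        away from the origin."""
        rng = np.random.default_rng(rng)
        # The Q factor of a Gaussian random matrix is a random rotation.
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        t = rng.standard_normal(n)
        return lambda z: f(Q @ (z - t))

    # Usage: a rotated and translated (negated) Rastrigin function.
    def neg_rastrigin(z):
        return -(10 * len(z) + np.sum(z ** 2 - 10 * np.cos(2 * np.pi * z)))

    g = make_nonseparable(neg_rastrigin, n=2, rng=0)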

The tunable parameters of the FEM algorithm comprise the batch size N, the fitness shaping function u applied to the fitness function f, and the forget factor α. The parameters should be chosen by the expert to fit the expected ruggedness of the fitness landscape. The forget factor must be low enough that earlier successful search points are not forgotten too quickly. The shaping function must be chosen such that enough randomness is preserved in the search policy after every update, which entails including the lesser samples in the utility attribution. For all experiments, comprising both the benchmark unimodal/multimodal functions and the non-Markovian double pole balancing task, the initial covariance was set to the identity matrix Σ = I and x was always randomly initialized as x ∼ N(0, I).

[Fig. 1. Results for experiments on the unimodal benchmark functions (fitness versus number of evaluations; curves for Cigar, DiffPow, Elli, Schwefel, Sphere, Tablet). Left: dimensionality 5, right: dimensionality 15.]

We ran FEM on the set of unimodal benchmark functions with dimensions 5 and 15, using a fixed high-precision target. Figure 1 shows the average number of evaluations until success over 20 runs on the unimodal functions. The parameter settings for dimensionality 5 were identical in all runs: α = 0.1 and N = 50; the number m of top samples selected by the shaping function was set to m = 5. The parameter settings for all runs in dimensionality 15 were: α = 0.02, N = 25 and m = 10. All runs converged. The number of evaluations was roughly equal to that of CMA for the small dimensionality, and for most problems not more than a factor of 3 slower, even at dimensionality 15 [9].

On the multimodal benchmark functions we performed experiments while varying the distance of the initial guess to the optimum between 1 and 100. As with the unimodal functions, the problems were appropriately translated and rotated, while the initial x was randomly initialized on the surface of a hypersphere of radius 1, 10 or 100 with the optimum at its center. These runs were performed in dimension 2 with a target precision of 0.01, since here the focus was on avoiding local suboptima. Table 1 shows, for all multimodal functions, the percentage of runs in which FEM found the global optimum (as opposed to getting stuck in a local suboptimum), conditioned on the distance from the initial guess to the optimum. The percentages are computed over 100 runs. For comparison purposes we include the results for the CMA implementation of [9], although it must be said that in all likelihood better results can be achieved for CMA using larger population sizes.

Table 1. Results for the multimodal benchmark functions. Shown are percentages of runs that found the global optimum, for both FEM and CMA, for starting distances 1, 10 and 100.

    Function       FEM 1   FEM 10   FEM 100   CMA 1   CMA 10   CMA 100
    Rastrigin       91%     87%      64%       13%     11%      14%
    Ackley         100%    100%       0%       89%     70%       3%
    Weierstrass     19%      9%      19%       90%     92%      92%
    Griewank       100%      2%       0%      100%      2%       0%

One additional, linear benchmark function f(z) = Σ_j z_j was tested to verify the expected premature convergence of the algorithm. Indeed, FEM converges prematurely, as EDAs typically do [?], while CMA performed well. This suggests the approach might not be applicable to all domains, and might benefit from a mutative approach modeling mutations instead of weighted sample distributions.

Lastly, we performed experiments using a batch-based version of the algorithm instead of the online version. We found that the standard benchmark problems could only be solved using large batch sizes (1000 and up), slowing down the algorithm considerably. This might be due to the reduced sample diversity of small batches, which the online update rule ameliorates by adjusting Σ only gradually.

To summarize, our experiments on these standard black box optimization benchmarks indicate that FEM is competitive with other high-performance algorithms in black box optimization on the selected high-precision unimodal test functions, and even performs well on some multimodal benchmarks.

5.2 Non-Markovian Double Pole Balancing

Non-Markovian double pole balancing [14] can be considered a difficult benchmark task for control optimization. We use the implementation found in [15]. The FEM algorithm optimizes the parameters of the controller, which is implemented as a simple neural network with three inputs, three hidden sigmoid neurons, and one output neuron, resulting in a total of 21 weights to be optimized.
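The paper does not spell out the controller's wiring; one plausible recurrent layout consistent with the stated totals is 3×3 input weights, 3×3 recurrent hidden weights, and 3 output weights (21 in all, no biases). The sketch below unpacks a FEM parameter vector into such a controller; the wiring itself is our assumption.

    import numpy as np

    def make_controller(theta):
        """Unpack a 21-dimensional parameter vector into a recurrent
        network with 3 inputs, 3 sigmoid hidden units and 1 output."""
        W_in = theta[:9].reshape(3, 3)     # input -> hidden
        W_rec = theta[9:18].reshape(3, 3)  # hidden -> hidden (recurrent)
        w_out = theta[18:21]               # hidden -> output
        h = np.zeros(3)                    # recurrent hidden state

        def step(obs):
            nonlocal h
            h = 1.0 / (1.0 + np.exp(-(W_in @ obs + W_rec @ h)))  # sigmoid
            return w_out @ h               # control signal for the cart

        return step

FEM would then maximize a fitness such as the number of time steps the poles remain balanced, as measured by the pole balancing simulator.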

Table 2. Results for non-Markovian double pole balancing: the average number of evaluations for SANE [16], ESP [15], NEAT [17], CMA [9], CoSyNE [18] and FEM.

    Method        SANE      ESP     NEAT    CMA     CoSyNE   FEM
    Evaluations   262,700   7,374   6,929   3,521   1,249    2,099

The algorithm's parameters were set as follows: piecewise-linear shaping function with top-m selection (m = 5), forget factor α = 0.05 and batch size N = 50. A run was considered a success when the poles did not fall over for 100,000 time steps. Over a total of 200 runs, FEM needed on average 2,099 evaluations until success (standard deviation: 1,505). Not included in these statistics are the 49 of 200 runs that did not reach success within the evaluation limit, which compares badly with both CoSyNE and CMA, which (almost) always converge. Table 2 shows the results of other premier algorithms applied to this task, including CMA. All methods optimized the same type of recurrent neural network, albeit with differing numbers of hidden neurons. When it converges, FEM outperforms all other methods except CoSyNE. Since our algorithm performs well on this relatively hard control benchmark, we expect it to do well on future real-world experiments.

6 Discussion

Fitness Expectation Maximization constitutes a simple, principled approach to real-valued black box function optimization with a rather clean derivation from first principles. Its theoretical relationship to the field of reinforcement learning, and in particular to reward-weighted regression, should be clear to any reader familiar with both fields. We anticipate that rephrasing the black box optimization problem as a reinforcement learning problem solvable by RL methods will spawn a whole series of additional new algorithms exploiting this connection.

The experiments show that, on the unimodal and multimodal benchmarks, FEM is competitive with its main competitor algorithm, CMA [9], at least on lower-dimensional problems. Taking into account the good results on the pole balancing tasks, we envision that FEM might become a serious competitor in the field of black box function optimization, especially for neuroevolution.

Future work on FEM will include a systematic study to determine whether it can be made to outperform other search methods consistently on other typical benchmarks and real-world tasks. It remains to be seen how the method scales up with increasing dimensionality, especially compared to CMA. We suggest extending the algorithm from a single multinormal distribution as search policy representation to a mixture of Gaussians (which is a common procedure for vanilla EM), thus further reducing its sensitivity to local suboptima.
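To make the suggested extension concrete, a mixture policy would sample each search point from one of several Gaussian components and weight each component's update by its responsibility for the sample. The sketch below is purely illustrative of that idea, not a design from the paper; it reuses log_density from the earlier sketch.

    import numpy as np

    def sample_mixture(weights, means, covs, rng):
        """Draw one search point from a mixture-of-Gaussians policy."""
        i = rng.choice(len(weights), p=weights)
        return rng.multivariate_normal(means[i], covs[i])

    def responsibilities(z, weights, means, covs):
        """Posterior probability that each component generated z; in a
        mixture-policy FEM these would multiply the utility u_k when
        updating each component's mean and covariance."""
        dens = np.array([w * np.exp(log_density(z, m, C))
                         for w, m, C in zip(weights, means, covs)])
        return dens / dens.sum()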

Other pressing work includes a theoretical analysis of the shaping/selection function, which should ideally be made to adapt automatically based on the data instead of being tuned manually. Worrisome is the premature convergence on the linear test function. Future work will determine whether this affects the practical applicability to real-world problems such as neurocontrol. Alternatively, we must investigate whether the introduction of a mutative approach like CMA's might be beneficial.

7 Conclusion

We introduced Fitness Expectation Maximization to tackle the important class of real-valued black box function optimization problems. Reframing black box optimization as a one-step reinforcement learning problem, we developed a method similar in spirit to expectation maximization. Using a search policy which matches samples weighted by their utilities, the algorithm performs competitively on a standard benchmark set of unimodal and multimodal functions, and on non-Markovian double pole balancing control.

Acknowledgments

This research was funded by SNF grants /1 and /1.

References

1. Spall, J., Hill, S., Stark, D.: Theoretical framework for comparing several stochastic optimization approaches. Probabilistic and Randomized Methods for Design under Uncertainty (2006)
2. Klockgether, J., Schwefel, H.P.: Two-phase nozzle and hollow core jet experiments. In: Proc. 11th Symp. Engineering Aspects of Magnetohydrodynamics (1970)
3. Kohl, N., Stone, P.: Policy gradient reinforcement learning for fast quadrupedal locomotion. In: Proceedings of the IEEE International Conference on Robotics and Automation (May 2004)
4. Gomez, F.J., Miikkulainen, R.: Solving non-Markovian control tasks with neuroevolution. In: Proc. IJCAI 99, Denver, CO, Morgan Kaufmann (1999)
5. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598) (1983)
6. Spall, J.C.: Stochastic optimization and the simultaneous perturbation method. In: WSC '99: Proceedings of the 31st Conference on Winter Simulation, New York, NY, USA, ACM (1999)
7. Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: A Unified Approach to Monte Carlo Simulation, Randomized Optimization and Machine Learning. Springer-Verlag (2004)

8. De Boer, P., Kroese, D., Mannor, S., Rubinstein, R.: A tutorial on the cross-entropy method. Annals of Operations Research 134 (2004)
9. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9(2) (2001)
10. Peters, J., Schaal, S.: Reinforcement learning by reward-weighted regression for operational space control. In: Proceedings of the International Conference on Machine Learning (ICML) (2007)
11. Dayan, P., Hinton, G.E.: Using expectation-maximization for reinforcement learning. Neural Computation 9(2) (1997)
12. Chernoff, H., Moses, L.E.: Elementary Decision Theory. Dover Publications (1987)
13. Suganthan, P.N., Hansen, N., Liang, J.J., Deb, K., Chen, Y.P., Auger, A., Tiwari, S.: Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. Technical report, Nanyang Technological University, Singapore (2005)
14. Wieland, A.: Evolving neural network controllers for unstable systems. In: Proceedings of the International Joint Conference on Neural Networks (Seattle, WA), IEEE, Piscataway, NJ (1991)
15. Gomez, F.J., Miikkulainen, R.: Incremental evolution of complex general behavior. Adaptive Behavior 5 (1997)
16. Moriarty, D.E., Miikkulainen, R.: Efficient reinforcement learning through symbiotic evolution. Machine Learning 22 (1996)
17. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10 (2002)
18. Gomez, F., Schmidhuber, J., Miikkulainen, R.: Efficient non-linear control through neuroevolution. In: Proceedings of the 16th European Conference on Machine Learning (ECML 2006) (2006)
