LOCAL SEARCH (1/31/14). Today: reading AIMA Chapter 4.1-4.2, 5.1-5.2. Goals: local search algorithms; introduce adversarial search.

LOCAL SEARCH. Today's reading: AIMA Chapters 4.1-4.2, 5.1-5.2. Goals: local search algorithms (hill-climbing search, simulated annealing, local beam search, genetic algorithms, gradient descent and Newton-Raphson); introduce adversarial search.

Recall the N-Queens problem and its incremental formulation. An alternative approach to N-Queens: the complete-state formulation.

Local search. The basic idea:
1. Randomly initialize a (complete) state.
2. If it is not a goal state, either (a) make a local modification to the state to generate a neighbor state, OR (b) enumerate all neighbor states and choose the best.
3. Repeat step 2 until a goal state is found (or we run out of time).
Requires the ability to quickly: generate a random (probably-not-optimal) state; evaluate the quality of a state; move to other states (a well-defined neighborhood function).
Graph coloring example:
1. Start with a random coloring of the nodes.
2. If not a goal state, change the color of one node.
3. Repeat step 2.
(Figure: map of Australia and its constraint graph; nodes WA, NT, SA, Q, NSW, V, T.)
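The graph-coloring recipe above can be sketched as plain local search. The node and edge lists below encode the Australia map from the figure; everything else (function names, the step limit, the plateau kick) is an illustrative assumption:

```python
import random

# Nodes and adjacency constraints from the Australia map figure.
NODES = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
EDGES = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
         ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V")]
COLORS = ["red", "green", "blue"]

def conflicts(coloring):
    """Number of edges whose endpoints share a color; 0 means goal state."""
    return sum(coloring[a] == coloring[b] for a, b in EDGES)

def local_search(max_steps=10000, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly initialize a complete state.
    coloring = {n: rng.choice(COLORS) for n in NODES}
    for _ in range(max_steps):
        if conflicts(coloring) == 0:              # goal test
            return coloring
        # Step 2b: enumerate all single-node recolorings, choose the best.
        best = coloring
        for n in NODES:
            for c in COLORS:
                neighbor = dict(coloring)
                neighbor[n] = c
                if conflicts(neighbor) < conflicts(best):
                    best = neighbor
        if best is coloring:                      # stuck on a plateau:
            coloring = dict(coloring)             # make one random local move
            coloring[rng.choice(NODES)] = rng.choice(COLORS)
        else:
            coloring = best
    return coloring
```

Note that a pure greedy version can stall on a plateau; the single random recoloring when no neighbor improves is one simple escape.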

Local search algorithms. Useful when the path to the goal state is irrelevant. Keep track of the current state only, and explore nearby neighbor (successor) states. Algorithms include: hill-climbing, simulated annealing, local beam search, genetic algorithms, and gradient descent (Newton-Raphson).

Hill-climbing search: "Like climbing Everest in thick fog with amnesia." Hill-climbing search on the 8-queens problem: h = number of pairs of queens that are attacking each other; the pictured state has h = 17.

Hill-climbing search on the 8-queens problem: a local minimum with h = 1. Problems with hill-climbing?

Hill-climbing performance:
- Complete? No
- Optimal? No
- Time: depends
- Space: O(1)
Hill-climbing variants:
- Stochastic hill climbing: randomly chooses among uphill successors, with probability of selection proportional to steepness.
- First-choice hill climbing: chooses the first uphill successor generated.
- Random-restart hill climbing: runs multiple hill-climbing searches from random initial states.
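Random-restart hill climbing on 8-queens can be sketched as follows, using the complete-state formulation (one queen per column, a state is a list of row indices) and h = number of attacking pairs. Function names and the restart strategy are illustrative, not from the slides:

```python
import random
from itertools import combinations

N = 8

def attacking_pairs(state):
    """h: pairs of queens attacking each other (same row or same diagonal)."""
    return sum(r1 == r2 or abs(r1 - r2) == abs(c1 - c2)
               for (c1, r1), (c2, r2) in combinations(enumerate(state), 2))

def hill_climb(rng):
    """Steepest-ascent hill climbing; may stop at a local minimum with h > 0."""
    state = [rng.randrange(N) for _ in range(N)]
    while True:
        # Enumerate all single-queen moves (change one column's row).
        neighbors = [state[:c] + [r] + state[c + 1:]
                     for c in range(N) for r in range(N) if r != state[c]]
        best = min(neighbors, key=attacking_pairs)
        if attacking_pairs(best) >= attacking_pairs(state):
            return state                       # no uphill move: stuck
        state = best

def random_restart(seed=0):
    """Restart from fresh random states until a goal (h = 0) is found."""
    rng = random.Random(seed)
    while True:
        state = hill_climb(rng)
        if attacking_pairs(state) == 0:
            return state
```

Plain hill climbing succeeds from only a minority of random starts on 8-queens, which is exactly why the random-restart variant exists.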

Simulated annealing search. Idea: escape local maxima by allowing some "bad" moves, but gradually decrease their frequency.
Local beam search. Idea: keep as many states in memory as possible. Start with k randomly generated states. Generate all successors of all k states. If the goal is found, stop; else select the k best successors from the complete list of successors and repeat. What's one possible shortcoming of this approach?
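A minimal sketch of simulated annealing as described above: worse moves are accepted with probability exp(-delta/T), and T is gradually decreased so "bad" moves become rarer over time. The geometric cooling schedule, parameter values, and toy objective are illustrative assumptions:

```python
import math
import random

def simulated_annealing(f, x0, neighbor, n_steps=5000, t0=10.0, alpha=0.995, seed=0):
    """Minimize f, starting from x0, proposing moves via neighbor(x, rng)."""
    rng = random.Random(seed)
    x, temp = x0, t0
    for _ in range(n_steps):
        x_new = neighbor(x, rng)
        delta = f(x_new) - f(x)
        # Always accept improvements; accept a "bad" move with prob exp(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = x_new
        temp *= alpha                  # geometric cooling schedule
    return x

# Toy usage: minimize f(x) = x**2 over the integers with +/-1 moves.
best = simulated_annealing(lambda x: x * x, 40,
                           lambda x, rng: x + rng.choice([-1, 1]))
```

Early on (high T) the walk moves almost freely; as T shrinks, the acceptance test approaches pure greedy descent.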

Local beam search, continued. Possible problem: all k states can become concentrated in the same part of the search space. Stochastic beam search: choose k successors at random, where the probability of selecting a successor is proportional to its objective function value.
Genetic algorithms. Generate a successor state by combining two parent states. Begin with k randomly generated states (the population), represented as strings over some finite alphabet. Evaluate the fitness of each state via a fitness function (higher values = better states). Repeat k times: randomly select 2 states with probability proportional to their fitness; randomly pick a crossover point and produce a new state; randomly mutate each location of the new state.
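The genetic-algorithm loop on this slide (fitness-proportional selection of two parents, single-point crossover, per-position mutation) can be sketched as follows. The OneMax toy fitness (count of 1s in a bit string) and all parameter values are illustrative choices, not from the slides:

```python
import random

def genetic_algorithm(fitness, alphabet, length, pop_size=20, generations=100,
                      p_mutate=0.05, seed=0):
    rng = random.Random(seed)
    # Begin with a population of randomly generated strings.
    pop = [[rng.choice(alphabet) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in pop]
        new_pop = []
        for _ in range(pop_size):
            # Select two parents with probability proportional to fitness.
            mom, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, length)          # crossover point
            child = mom[:cut] + dad[cut:]
            # Mutate each location independently with small probability.
            child = [rng.choice(alphabet) if rng.random() < p_mutate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop                               # generational GA
    return max(pop, key=fitness)

# Toy usage: maximize the number of 1s in a 20-bit string (OneMax).
best = genetic_algorithm(sum, [0, 1], 20)
```

Replacing the whole population each generation makes this the generational variant; a steady-state GA would instead replace only a few low-fitness members per step.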

Genetic algorithms. Fitness function for 8-queens: number of non-attacking pairs of queens. Selection probabilities: 24/(24+23+20+11) = 31%, 23/(24+23+20+11) = 29%, etc. Crossover example: parents 32752411 and 24748552 crossed after the third digit produce 32748552.

Genetic algorithms. Crossover can produce an offspring that is in an entirely different area of the search space than either parent; sometimes the offspring is outside the feasible or evaluable region. Either replace the entire population at each step (generational GA) or replace just a few (low-fitness) members of the population (steady-state GA). The benefit comes from having a representation in which contiguous blocks are actually meaningful.

Gradient-based methods. Gradient-based methods are similar to hill-climbing: find the best direction and take it. Nevertheless, they are widely used. "Their operation is similar to a blind man walking up a hill, whose only knowledge of the hill comes from what passes under his feet. If the hill is predictable in some fashion, he will reach the top, but it is easy to imagine confounding terrain" - Goffe et al., 1994.
Newton-Raphson method. Newton-Raphson is a method for finding roots of a function, i.e., finding x such that g(x) = 0.
Step 1: Make an initial guess x_0.
Step 2: Compute a new point using the update rule:
    x_{n+1} = x_n - g(x_n) / g'(x_n)
Intuition behind the update rule: the tangent line at x_0 is a linear approximation of g(x) at the point (x_0, y_0). Since the tangent line approximates g(x), the root of the tangent line is probably a better estimate of the root of g(x) than our initial guess x_0. To find the root of the tangent line, we use the formula for slope (remember me?!):
    g'(x_0) = (y_1 - y_0) / (x_1 - x_0)
Setting y_1 = 0 (the root of the tangent line) and solving for x_1 with a bit of algebra gives the update rule above. We then iterate this process until we converge to the root of g(x).
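The Newton-Raphson update above translates into a few lines; the function names, tolerance, and toy example (the positive root of g(x) = x^2 - 2 is sqrt(2)) are illustrative:

```python
def newton_raphson(g, g_prime, x0, tol=1e-10, max_iter=50):
    """Find a root of g by iterating x_{n+1} = x_n - g(x_n) / g'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / g_prime(x)
        if abs(x_new - x) < tol:       # converged: successive iterates agree
            return x_new
        x = x_new
    return x

# Toy usage: g(x) = x**2 - 2 has its positive root at sqrt(2).
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```

From x_0 = 1 the iterates are 1.5, 1.41667, 1.41422, ..., converging quadratically once near the root.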

Newton-Raphson applied to optimization. When we're minimizing a function, we want to find the point x* such that f(x*) < f(x) for all x. Recall from calculus that the slope at such a point x* is zero, i.e., f'(x*) = 0. So we can restate the problem: we want to find the point x* such that f'(x*) = 0. Now we can use the Newton-Raphson method to find the root of the first derivative f'(x). The update rule in this case is:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)
The function f''(x) is the second derivative. Ask yourself: why does the second derivative appear in this formula?
In the multivariate case, the update rule looks like this:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)            (univariate)
    x_{n+1} = x_n - H_f(x_n)^{-1} ∇f(x_n)         (multivariate)
The second derivative is given by the Hessian matrix (denoted H), a square matrix containing the second-order partial derivatives. The first derivative is represented by the gradient (denoted ∇).
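Applying the same iteration to f' gives a minimal univariate optimization sketch; the quadratic toy function and all names are illustrative assumptions:

```python
def newton_minimize(f_prime, f_double_prime, x0, tol=1e-10, max_iter=50):
    """Minimize f by running Newton-Raphson on its derivative:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:            # f'(x) is (numerically) zero
            return x
    return x

# Toy usage: f(x) = (x - 3)**2 + 1 has its minimum at x* = 3, where f'(x*) = 0.
# Here f'(x) = 2*(x - 3) and f''(x) = 2.
x_star = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, 0.0)
```

On a quadratic the method lands on the minimizer in a single step, since the tangent-line model of f' is exact.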

Gradient ascent (descent). Sometimes the Hessian is too computationally expensive to compute, or it cannot be inverted. In this case, we can replace the second derivative with a step-size constant γ:
    x ← x + γ ∇f(x)
The gradient ∇f(x) gives the direction of steepest ascent; the step size γ determines how far we step in that direction. For minimization, we change the addition to subtraction (i.e., we want to move in the direction opposite to the direction of steepest ascent).
Some information about the step size γ:
- The user can set the step size to any (typically positive) value.
- A step size that is too small results in slow progress; a step size that is too large can overshoot the minimum/maximum.
- The step size can be determined using a line search (think binary search); however, make sure the line search itself isn't computationally expensive.
- The step size can change at each iteration; it doesn't have to stay the same.
Stopping criteria: the gradient is sufficiently close to zero, or the difference between the new and old points falls below a threshold.
    function GRADIENT-ASCENT(f, γ) returns a solution
        ∇f ← COMPUTE-GRADIENT(f)
        x ← a randomly selected value
        while stopping criteria not met do
            x ← x + γ ∇f(x)
        return x
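The GRADIENT-ASCENT pseudocode translates directly to code. This sketch uses a fixed step size and a gradient-near-zero stopping criterion; the 1-D toy objective, parameter values, and names are illustrative:

```python
def gradient_ascent(grad_f, x0, gamma=0.1, tol=1e-8, max_iter=10000):
    """Step uphill along the gradient: x <- x + gamma * grad_f(x),
    stopping once the gradient is sufficiently close to zero."""
    x = x0
    for _ in range(max_iter):
        g = grad_f(x)
        if abs(g) < tol:               # stopping criterion
            break
        x = x + gamma * g              # for descent, subtract instead
    return x

# Toy usage: maximize f(x) = -(x - 2)**2, whose gradient is -2*(x - 2).
x_max = gradient_ascent(lambda x: -2 * (x - 2), 0.0)
```

With γ = 0.1 on this objective, the distance to the maximizer shrinks by a constant factor each step; a γ that is too large would instead overshoot and oscillate.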

Local search summary: hill-climbing search (stochastic hill-climbing, first-choice, random-restart); simulated annealing; local beam search (stochastic local beam search); genetic algorithms; gradient-based methods (Newton-Raphson, gradient ascent/descent).