Local search algorithms

Local search algorithms. CS171, Fall 2016, Introduction to Artificial Intelligence. Prof. Alexander Ihler. Reading: R&N 4.1-4.2

Local search algorithms. In many optimization problems, the path to the goal is irrelevant; the goal state itself is the solution. Local search is widely used for very big problems; it returns good but not optimal solutions. State space = set of "complete" configurations; the task is to find a configuration satisfying constraints. Examples: n-queens, VLSI layout, airline flight schedules. Local search algorithms keep a single "current" state, or a small set of states, and iteratively try to improve it / them. Very memory efficient: keeps only one or a few states, and you control how much memory you use.

Example: n-queens. Goal: put n queens on an n x n board with no two queens on the same row, column, or diagonal. Neighbor: move one queen to another row. Search: go from one neighbor to the next.

Algorithm design considerations. How do you represent your problem? What is a complete state? What is your objective function? How do you measure cost or value of a state? What is a neighbor of a state? Or, what is a step from one state to another? How can you compute a neighbor or a step? Are there any constraints you can exploit?

Random restart wrapper. We'll use stochastic local search methods, which return a different solution for each trial & initial state. Almost every trial hits difficulties (see sequel); most trials will not yield a good result (sad!). Using many random restarts improves your chances: many shots at the goal may finally get a good one. Restart at a random initial state, many times; report the best result found across all trials.

Random restart wrapper:

    best_found <- RandomState()              // initialize to something
    while not (tired of doing it):           // now do repeated local search
        result <- LocalSearch( RandomState() )
        if Cost(result) < Cost(best_found):
            best_found <- result             // keep best result found so far
    return best_found

Typically, "you are tired of doing it" means that some resource limit has been exceeded, e.g., number of iterations, wall clock time, CPU time, etc. It may also mean that result improvements are small and infrequent, e.g., less than 0.1% result improvement in the last week of run time.
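The wrapper above can be sketched in runnable form. The quadratic cost `f` and the `greedy` step search below are hypothetical stand-ins for your own cost function and favorite local search:

```python
import random

def random_restart(local_search, random_state, cost, n_restarts=20):
    """Random-restart wrapper: run local search from many random initial
    states and report the best result found across all trials."""
    best_found = local_search(random_state())
    for _ in range(n_restarts - 1):
        result = local_search(random_state())
        if cost(result) < cost(best_found):
            best_found = result      # keep best result found so far
    return best_found

# Toy demo: minimize f(x) = (x - 3)^2 over the integers with a greedy
# one-step search (both are illustrative placeholders).
def f(x): return (x - 3) ** 2

def greedy(x):
    while True:
        step = min((x - 1, x + 1), key=f)   # better of the two neighbors
        if f(step) >= f(x):
            return x                        # local (here global) minimum
        x = step
```

On this convex toy cost every restart reaches the optimum; on a jagged cost surface the restarts are what save you.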

Tabu search wrapper. Add recently visited states to a tabu list, temporarily excluding them from being visited again. This forces the solver away from explored regions and (in principle) avoids getting stuck in local minima. Implemented as a hash table + FIFO queue: unit time cost per step, constant memory cost. You control how much memory is used.

Tabu search wrapper. A new state is pushed onto a FIFO queue and into a hash table; the oldest state is popped off and removed; the hash table answers "is this state present?"

    UNTIL ( you are tired of doing it ) DO {
        set Neighbor to MakeNeighbor( CurrentState );
        IF ( Neighbor is in HASH ) THEN ( discard Neighbor );
        ELSE {
            push Neighbor onto FIFO, pop OldestState;
            remove OldestState from HASH, insert Neighbor;
            set CurrentState to Neighbor;
            run YourFavoriteLocalSearch on CurrentState;
        }
    }
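A minimal Python sketch of this wrapper, assuming hashable states; `make_neighbor` is a hypothetical user-supplied move generator, and the inner local-search call is omitted to keep the skeleton short:

```python
from collections import deque

def tabu_search(initial, make_neighbor, max_steps=1000, tabu_size=50):
    """Tabu wrapper: recently visited states live in a FIFO queue plus a
    hash set.  A proposed neighbor already in the set is discarded, forcing
    the search away from explored regions.  Constant memory; unit-time
    membership test per step.  (Your favorite local search would run on
    CurrentState inside the loop.)"""
    current = initial
    fifo = deque()
    tabu = set()
    for _ in range(max_steps):            # "until tired of doing it"
        neighbor = make_neighbor(current)
        if neighbor in tabu:
            continue                      # discard tabu neighbor
        fifo.append(neighbor)
        tabu.add(neighbor)
        if len(fifo) > tabu_size:
            tabu.discard(fifo.popleft())  # evict oldest state
        current = neighbor
    return current
```

For example, with `make_neighbor = lambda x: x + 1` the walk marches forward; with a generator that keeps proposing an already-visited state, the tabu set discards it and the search stays put.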

Local search algorithms. Hill-climbing search; gradient descent in continuous state spaces (can use e.g. Newton's method to find roots); simulated annealing search; local beam search; genetic algorithms.

Hill-climbing search: "like trying to find the top of Mount Everest in a thick fog while suffering from amnesia."

Ex: hill-climbing, 8-queens. h = number of pairs of queens that are attacking each other, either directly or indirectly; h = 17 for this state. Each number indicates h if we move the queen in its column to that square. 12 (boxed) = best h among all neighbors; select one such move randomly.
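The h function and steepest-descent move from this slide can be sketched as follows. The column-indexed encoding (one queen per column, `state[c]` = row of the queen in column c) is an assumption of this sketch, not of the slide:

```python
import random

def attacking_pairs(state):
    """h = number of queen pairs attacking each other (same row or same
    diagonal; one queen per column, so columns never clash)."""
    n = len(state)
    return sum(1 for c1 in range(n) for c2 in range(c1 + 1, n)
               if state[c1] == state[c2]
               or abs(state[c1] - state[c2]) == c2 - c1)

def hill_climb(state):
    """Steepest descent on h: move one queen within its column to the
    square that lowers h most, breaking ties randomly as on the slide.
    Stops at a local minimum (possibly with h > 0)."""
    while True:
        h = attacking_pairs(state)
        best_h, best_moves = h, []
        for col in range(len(state)):
            for row in range(len(state)):
                if row == state[col]:
                    continue
                neighbor = list(state)
                neighbor[col] = row
                nh = attacking_pairs(neighbor)
                if nh < best_h:
                    best_h, best_moves = nh, [neighbor]
                elif nh == best_h and nh < h:
                    best_moves.append(neighbor)
        if not best_moves:            # no neighbor improves h: local minimum
            return state, h
        state = random.choice(best_moves)
```

`attacking_pairs([0, 4, 7, 5, 2, 6, 1, 3])` is 0 (a known 8-queens solution), while putting all queens on one diagonal gives the maximum h of 28.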

Ex: hill-climbing, 8-queens. A local minimum with h = 1: all one-step neighbors have higher h values. What can you do to get out of this local minimum?

Hill-climbing difficulties. Note: these difficulties apply to all local search algorithms, and usually become much worse as the search space becomes higher dimensional. Problem: depending on the initial state, the search can get stuck in local maxima.

Hill-climbing difficulties. Note: these difficulties apply to all local search algorithms, and usually become much worse as the search space becomes higher dimensional. Ridge problem: every neighbor appears to be downhill, but the search space has an uphill direction (just not among the neighbors). Ridge: fold a piece of paper and hold it tilted up at an unfavorable angle to every possible search space step. Every step leads downhill, but the ridge leads uphill. (Figure: states / steps, discrete.)

Gradient descent: hill-climbing in continuous state spaces. Denote the state as µ and the cost as J(µ). How do we change µ to improve J(µ)? Choose a direction in which J(µ) is decreasing. Derivative: positive => J increasing; negative => J decreasing.

Gradient descent: hill-climbing in continuous spaces. The gradient vector indicates the direction of steepest ascent (its negative, steepest descent). (c) Alexander Ihler

Gradient descent: hill-climbing in continuous spaces. The gradient is the most direct direction uphill in the objective (cost) function, so its negative minimizes the cost function. Assume we have some cost function f(x_1, x_2, ..., x_n) that we want to minimize over the continuous variables x_1, x_2, ..., x_n. 1. Compute the gradient: ∇f = (∂f/∂x_1, ..., ∂f/∂x_n). 2. Take a small step downhill, against the gradient: x' = x − λ∇f(x). 3. Check if f(x') < f(x). 4. If true then accept the move, if not reject it (or use the Armijo rule, decrease the step size, etc.). 5. Repeat.
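The five steps above, sketched in plain Python. The halving-on-reject rule is a crude illustrative stand-in for a real line search such as the Armijo rule, and the quadratic demo cost is hypothetical:

```python
def gradient_descent(f, grad, x, step=0.1, iters=100):
    """Repeat: step downhill against the gradient; accept the move if the
    cost decreased, otherwise reject it and halve the step size."""
    for _ in range(iters):
        x_new = [xi - step * gi for xi, gi in zip(x, grad(x))]
        if f(x_new) < f(x):
            x = x_new            # accept: cost decreased
        else:
            step *= 0.5          # reject: shrink the step and try again
    return x

# Demo: minimize f(x, y) = (x - 1)^2 + (y + 2)^2,
# whose gradient is (2(x - 1), 2(y + 2)); minimum at (1, -2).
f = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2
grad = lambda v: [2 * (v[0] - 1), 2 * (v[1] + 2)]
```

Starting from the origin with step 0.1, the iterates contract geometrically toward (1, -2).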

Gradient descent: hill-climbing in continuous spaces. How do I determine the gradient? Derive a formula using multivariate calculus; ask a mathematician or a domain expert; do a literature search. Variations of gradient descent can improve performance for this or that special case; see Numerical Recipes in C (and in other languages) by Press, Teukolsky, Vetterling, and Flannery, which also covers simulated annealing, linear programming, and more. Gradient descent works well in smooth spaces and poorly in rough ones.

Newton's method. Want to find the roots of f(x); a root is a value of x for which f(x) = 0. Initialize to some point x, compute the tangent at x, and compute where it crosses the x-axis: x' = x − f(x)/f'(x). Optimization: find roots of ∇f(x). Does not always converge; sometimes unstable. If it converges, it is usually very fast (step size = 1/∇²f, the inverse curvature). Works well for smooth, non-pathological functions where the linearization is accurate; works poorly for wiggly, ill-behaved functions. (Multivariate case: ∇f(x) = gradient vector, ∇²f(x) = matrix of 2nd derivatives; a/b = a b⁻¹, the matrix inverse.)
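A sketch of the tangent-line iteration for root finding; applying the same iteration to f′ instead of f gives the optimization variant the slide mentions:

```python
def newton(f, fprime, x, tol=1e-10, max_iter=100):
    """Newton's method for a root of f: follow the tangent at x to where
    it crosses the x-axis.  Very fast near a well-behaved root, but can
    diverge or oscillate on ill-behaved functions."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / fprime(x)   # tangent-line crossing: x' = x - f(x)/f'(x)
    return x
```

For example, a root of f(x) = x² − 2 found from x = 1.0 converges to √2 in a handful of iterations.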

Simulated annealing search. Idea: escape local maxima by allowing some "bad" moves, but gradually decrease their frequency. Improvement: track the best result found so far. Here, this slide follows Fig. 4.5 of the textbook, which is simplified.

Typical annealing schedule. Usually use a decaying exponential, with axis values scaled to fit problem characteristics. (Figure: temperature vs. time.)

Pr(accept worse successor) = e^(ΔE/T). This decreases as the temperature T decreases (accept bad moves early on) and increases as |ΔE| decreases (accept moves that are not much worse). Sometimes, the step size also decreases with T. Roughly: e^(ΔE/T) is high when T is high, medium when T is low and |ΔE| is low, and low when T is low and |ΔE| is high.
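The acceptance rule and the decaying-exponential schedule combine into the following sketch (maximization form, tracking the best result found so far; the schedule constants are arbitrary illustrative choices):

```python
import math, random

def simulated_annealing(state, value, neighbor, T0=1.0, decay=0.995,
                        steps=2000):
    """Maximize value(state).  A worse successor (dE < 0) is accepted with
    probability e^(dE/T); T decays exponentially, so bad moves are frequent
    early and rare late.  The best state ever seen is remembered."""
    best = current = state
    T = T0
    for _ in range(steps):
        nxt = neighbor(current)
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt                 # accept (always, if uphill)
        if value(current) > value(best):
            best = current                # track best result found so far
        T *= decay                        # decaying-exponential schedule
    return best
```

For example, maximizing value(x) = −(x − 5)² on the integers with neighbor x ± 1 lets the walk ratchet toward x = 5 while occasionally stepping downhill early on.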

Goal: ratchet up a jagged slope. (Illustrative cartoon of an arbitrary, fictitious search space: alternating peaks and valleys A (value 42), B (41), C (45), D (44), E (48), F (47), G (51) along the search space coordinate. Your random restart wrapper starts at A; you want to get to G. How??)

Goal: ratchet up a jagged slope. Your random restart wrapper starts at A. With T = 1, e^(−1) ≈ .37 and e^(−4) ≈ .018 (this is an illustrative cartoon): at A (value 42), ΔE(A→B) = −1, so P(A→B) ≈ .37. At B (41), ΔE(B→A) = 1 and ΔE(B→C) = 4, so P(B→A) = P(B→C) = 1. At C (45), ΔE(C→B) = −4 and ΔE(C→D) = −1, so P(C→B) ≈ .018 and P(C→D) ≈ .37. At D (44), ΔE(D→C) = 1 and ΔE(D→E) = 4, so P(D→C) = P(D→E) = 1. At E (48), ΔE(E→D) = −4 and ΔE(E→F) = −1, so P(E→D) ≈ .018 and P(E→F) ≈ .37. At F (47), ΔE(F→E) = 1 and ΔE(F→G) = 4, so P(F→E) = P(F→G) = 1. At G (51), ΔE(G→F) = −4, so P(G→F) ≈ .018. From A you will accept a move to B with P(A→B) ≈ .37. From B you are equally likely to go to A or to C. From C you are 20x more likely to go to D than to B. From D you are equally likely to go to C or to E. From E you are 20x more likely to go to F than to D. From F you are equally likely to go to E or to G. Remember the best point you ever found (G or a neighbor?).

Properties of simulated annealing. One can prove: if T decreases slowly enough, then simulated annealing search will find a global optimum with probability approaching 1. Unfortunately this can take a VERY VERY long time. Note: in any finite search space, random guessing will also find a global optimum with probability approaching 1, so ultimately this is a very weak claim. Simulated annealing often works very well in practice, but is usually VERY VERY slow. Widely used in VLSI layout, airline scheduling, etc.

Local beam search. Keep track of k states rather than just one. Start with k randomly generated states. At each iteration, all the successors of all k states are generated. If any one is a goal state, stop; else select the k best successors from the complete list and repeat. Concentrates search effort in areas believed to be fruitful, but may lose diversity as the search progresses, resulting in wasted effort.

Local beam search. (Diagram: k initial states a1, b1, ..., k1 generate children a2, b2, ..., k2.) Create k random initial states; generate their children; select the k best children; repeat indefinitely. Is it better than simply running k searches? Maybe??
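A minimal sketch of the loop above. Including the current beam in the candidate pool is a small departure from the slide that keeps the best state from ever being lost:

```python
import heapq

def local_beam_search(states, value, successors, k=4, iters=50):
    """Keep the k best states; each round, pool all successors of all k
    states and keep the k best of beam + pool.  Unlike k independent
    searches, the beam concentrates effort on the most promising states."""
    beam = heapq.nlargest(k, states, key=value)
    for _ in range(iters):
        pool = [s for st in beam for s in successors(st)]
        beam = heapq.nlargest(k, beam + pool, key=value)
    return beam[0]      # best state found
```

For example, maximizing −(x − 7)² on the integers with successors x ± 1 from starts {0, 20}, the beam quickly abandons the poor start and climbs to 7.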

Genetic algorithms. State = a string over a finite alphabet (an individual). A successor state is generated by combining two parent states. Start with k randomly generated states (the population). Evaluation function (fitness function): higher values for better states. Select individuals for the next generation based on fitness: P(individual in next generation) = individual fitness / total population fitness. Crossover: combine fit parents to yield the next generation (offspring). Mutate the offspring randomly with some low probability.

Fitness function: number of non-attacking queen pairs (min = 0, max = 8·7/2 = 28). Probability of being in the next generation = fitness_i / (Σ_i fitness_i). Here Σ_i fitness_i = 24 + 23 + 20 + 11 = 78. This is how a fitness value is converted into a probability of being in the next generation: P(pick child_1 for next gen.) = fitness_1 / (Σ_i fitness_i) = 24/78 = 31%; P(pick child_2 for next gen.) = fitness_2 / (Σ_i fitness_i) = 23/78 = 29%; etc.
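The select / crossover / mutate loop can be sketched for bit-string individuals. This is a OneMax-style toy, not the slide's queens encoding; fitness-proportional selection via `random.choices` assumes at least one individual has nonzero fitness:

```python
import random

def genetic_algorithm(population, fitness, mutate_p=0.1, generations=20):
    """One generation per loop: pick two parents with probability
    proportional to fitness, single-point crossover, then mutate each
    offspring with low probability.  Individuals are bit tuples."""
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        next_gen = []
        for _ in range(len(population)):
            mom, dad = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, len(mom))          # crossover point
            child = mom[:cut] + dad[cut:]
            if random.random() < mutate_p:               # rare mutation
                i = random.randrange(len(child))
                child = child[:i] + (1 - child[i],) + child[i + 1:]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

With fitness = number of 1-bits (OneMax), the population drifts toward the all-ones string over the generations.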

Partially observable systems. What if we don't even know what state we're in? We can reason over belief states: what worlds might we be in? This is the state estimation or filtering task, typical for probabilistic reasoning. It may become hard to represent (the "state" is now very large!). Recall: the vacuum world.

Partially observable systems (continued). Often we use an approximate belief state. (Animation: Dieter Fox, UW.) Particle filters: a population approach to state estimation. Keep a list of (many) possible states; observations increase/decrease their weights; resampling improves the density of samples in high-probability regions.

Linear Programming. CS171, Fall 2016, Introduction to Artificial Intelligence. Prof. Alexander Ihler

Linear programming. A restricted type of problem, but with efficient, optimal solutions. Problems of the form: maximize v^T x (a linear objective) subject to A x ≤ b and C x = d (linear constraints). Very efficient, off-the-shelf solvers are available for LPs and can quickly solve large problems (1000s of variables). Problems with additional special structure => solve very large systems!

Summary. Local search maintains a complete solution and seeks a consistent (also complete) solution; vs: path search maintains a consistent solution and seeks a complete one. The goal of both: a consistent & complete solution. Types: hill climbing / gradient ascent; simulated annealing and Monte Carlo methods. Population methods: beam search; genetic / evolutionary algorithms. Wrappers: random restart; tabu search. Local search often works well on large problems: it abandons optimality, but always has some answer available (the best found so far).