Artificial Intelligence Local search and agents Instructor: Fabrice Popineau [These slides adapted from Stuart Russell, Dan Klein and Pieter Abbeel @ai.berkeley.edu]
Local search algorithms In many optimization problems, path is irrelevant; the goal state is the solution Then state space = set of complete configurations; find configuration satisfying constraints, e.g., n-queens problem; or, find optimal configuration, e.g., travelling salesperson problem In such cases, can use iterative improvement algorithms; keep a single current state, try to improve it Constant space, suitable for online as well as offline search
Heuristic for n-queens problem Goal: n queens on board with no conflicts, i.e., no queen attacking another States: n queens on board, one per column Heuristic value function: number of conflicts
Hill-climbing algorithm function HILL-CLIMBING(problem) returns a state current make-node(problem.initial-state) loop do neighbor a highest-valued successor of current if neighbor.value current.value then return current.state current neighbor Like climbing Everest in thick fog with amnesia
Global and local maxima Random restarts find global optimum duh Random sideways moves Escape from shoulders Loop forever on flat local maxima
Hill-climbing on the 8-queens problem No sideways moves: Succeeds w/ prob. 0.14 Average number of moves per trial: 4 when succeeding, 3 when getting stuck Expected total number of moves needed: 3(1-p)/p + 4 =~ 22 moves Allowing 100 sideways moves: Succeeds w/ prob. 0.94 Average number of moves per trial: 21 when succeeding, 65 when getting stuck Expected total number of moves needed: 65(1-p)/p + 21 =~ 25 moves Moral: algorithms with knobs to twiddle are irritating
Hill Climbing Quiz Starting from X, where do you end up? Starting from Y, where do you end up? Starting from Z, where do you end up?
Simulated annealing Resembles the annealing process used to cool metals slowly to reach an ordered (lowenergy) state Basic idea: Allow bad moves occasionally, depending on temperature High temperature => more bad moves allowed, shake the system out of its local minimum Gradually reduce temperature according to some schedule Sounds pretty flaky, doesn t it? Theorem: simulated annealing finds the global optimum with probability 1 for a slow enough cooling schedule
Simulated annealing algorithm function SIMULATED-ANNEALING(problem,schedule) returns a state current make-node(problem.initial-state) for t = 1 to do T schedule(t) if T = 0 then return current next a randomly selected successor of current E next.value current.value if E > 0 then current next else current next only with probability e E/T
Simulated Annealing Theoretical guarantee: Stationary distribution: If T decreased slowly enough, will converge to optimal state! Is this an interesting guarantee? Sounds like magic, but reality is reality: The more downhill steps you need to escape a local optimum, the less likely you are to ever make them all in a row People think hard about ridge operators which let you jump around the space in better ways
Local beam search Basic idea: K copies of a local search algorithm, initialized randomly For each iteration Generate ALL successors from K current states Choose best K of these to be the new current states Why is this different from K local searches in parallel? The searches communicate! Come over here, the grass is greener! What other well-known algorithm does this remind you of? Evolution! Or, K chosen randomly with a bias towards good ones
Genetic Algorithms Genetic algorithms use a natural selection metaphor Keep best N hypotheses at each step (selection) based on a fitness function Also have pairwise crossover operators, with optional mutation to give variety Possibly the most misunderstood, misapplied (and even maligned) technique around
Example: N-Queens Why does crossover make sense here? When wouldn t it make sense? What would mutation be? What would a good fitness function be?
Searching in the real world Nondeterminism: actions have unpredictable effects Modified problem formulation to allow multiple outcomes Solutions are now contingency plans New algorithm to find them: AND-OR search May need plans with loops! Partial observability: percept is not the whole state New concept: belief state = set of states agent could be in Modified formulation for search in belief state space; add observation model Simple and general agent design Nondeterminism and partial observability
The erratic vacuum world If square is dirty, Suck sometimes cleans up dirt in adjacent square as well E.g., state 1 could go to 5 or 7 If square is clean, Suck may dump dirt on it by accident E.g., state 4 could go to 4 or 2
Problem formulation Results( s,a) returns a set of states Results( 1,Suck) = {5,7} Results( 4,Suck) = {2,4} Results( 1,Right) = {2} Everything else is the same as before
Contingent solutions From state 1, does [Suck] solve the problem? Not necessarily! What about [Suck,Right,Suck]? Not necessarily! [Suck; if state=5 then [Right,Suck] else []] This is a contingent solution (a.k.a. a branching or conditional plan) Great! So, how do we find such solutions?
AND-OR search trees OR-node: Agent chooses action; At least one branch must be solved AND-node: Nature chooses outcome; All branches must be solved
AND-OR search made easy AND-OR search: call OR-Search on the root node OR-search(node): succeeds if AND-search succeeds on the outcome set for any action AND-search(set of nodes): succeeds if OR-search succeeds on ALL nodes in the set
AND-OR search function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure OR-SEARCH(problem.initial-state,problem,[]) function OR-SEARCH(state,problem,path) returns a conditional plan, or failure if problem.goal-test(state) then return the empty plan if state is on path then return failure for each action in problem.actions(state) do plan AND-SEARCH(results(state,action),problem,[state path]) if plan failure then return [action plan] return failure
AND-OR search contd. function AND-SEARCH(states,problem,path) returns a conditional plan, or failure for each s i in states do plan i OR-SEARCH(s i,problem,path) if plan i = failure then return failure return [if s 1 then plan 1 else if s 2 then plan 2 else... if s n 1 then plan n 1 else plan n ]
Slippery vacuum world Sometimes movement fails There is no guaranteed contingent solution! There is a cyclic solution: [Suck, L1 : Right, if State = 5 then L1 else Suck] Here L1 is a label Modify AND-OR-GRAPH-SEARCH to add a label when it finds a repeated state, try adding a branch to the label instead of failure A cyclic plan is a cyclic solution if Every leaf is a goal state From every point in the plan there is a path to a leaf
What does nondeterminism really mean? Example: your hotel key card doesn t open the door of your room Explanation 1: you didn t put it in quite right This is nondeterminism; keep trying Explanation 2: something wrong with the key This is partial observability; get a new key Explanation 3: it isn t your room This is embarrassing; get a new brain A nondeterministic model is appropriate when outcomes don t depend deterministically on some hidden state
Partial observability
Extreme partial observability: Sensorless worlds Vacuum world with known geometry, dirt, but no sensors at all! Belief state: set of all environment states the agent could be in More generally, what the agent knows given all percepts to date Right Suck Suck Left
Sensorless problem formulation Underlying physical problem has Actions P, Result P, Cost P. Initial state: a belief state b (set of physical states s) N physical states => 2 N belief states Goal test: every element s in b satisfies Goal-Test P (s) Actions: union of Actions P (s) for each s in b This is OK if doing an illegal action has no effect Transition model: Deterministic: Result(b,a) = union of Result P (s,a) for each s in b Nondeterministic: Result(b,a) = union of Results P (s,a) for each s in b Step-Cost(b,a,b ) = Step-Cost P (s,a,s ) for any s in b Goal-Test P, and Step-
Search in sensorless belief state space Everything works exactly as before! Solutions are still action sequences! Some opportunities for improvement: If any s in b is unsolvable, b is unsolvable If b is superset of b and b is in tree, discard b If b is a superset of b and b has a solution, b has same solution
What use are sensorless problems? They correspond to many real-world robotic manipulation problems A part orientation conveyor consists of a sequence of slanted guides that orient the part correctly no matter what its initial orientation It s a lot cheaper and more reliable than using a camera and robot arm!
Partial observability: formulation Partially observable problem formulation has to say what the agent can observe: Deterministic: Percept(s) is the percept received in physical state s Nondeterministic: Percepts(s) is the set of possible percepts received in s Fully observable: Percept(s) = s Sensorless: Percept(s)=null Local sensing vacuum world Percept(s1) = [A,Dirty] Percept(s3) = [A,Dirty]
Partial observability: belief state transition model b = Predict(b,a) updates the belief state just for the action Identical to transition model for sensorless problems Possible-Percepts(b ) is the set of percepts that could come next Union of Percept(s) for every s in b Update(b,p) is the new belief state if percept p is received Just the states s in b for which p = Percept(s) Results(b,a) contains Update(Predict(b,a),p) for each p in Possible-Percepts(Predict(b,a))
Example: Results(b 0,Right) b 0 b =Predict(b 0,Right) Update(b,[B,Dirty]) Update(b,[B,Clean]) Possible-Percepts(b )
Percept is p given by the environment Repeat after me: b <- Update(Predict(b,a),p) Maintaining belief state in an agent This is the predict-update cycle Also known as monitoring, filtering, state estimation Localization and mapping are two special cases
Summary Nondeterminism requires contingent plans AND-OR search finds them Sensorless problems require ordinary plans Search in belief state space to find them General partial observability induces nondeterminism for percepts AND-OR search in belief state space Predict-Update cycle for belief state transitions
Next Time: Adversarial Search!