
CS 70  Discrete Mathematics for CS  Spring 2005
Clancy/Wagner  Notes 25

Minesweeper

Our final application of probability is to Minesweeper. We begin by discussing how to play the game optimally; this is probably infeasible, but a good approximation is to probe the safest squares. This motivates the computation of the probability that each square contains a mine, which is a nice application of what we have learned about both logic and probability.

Optimal play in Minesweeper

We have seen many cases in Minesweeper where a purely logical analysis is insufficient, because there are situations in which no move is guaranteed safe. Therefore, no Minesweeper program can win 100% of the time. We have measured performance by the proportion of wins as a function of the initial density of mines. Can we find the algorithm that plays better than all others, i.e., has at least as high a probability of winning as any other?

The first step in the argument is to move from algorithms, of which there are infinitely many, to strategies, of which there are only finitely many. Basically, a strategy says what to do at every point in the game. Notice that a strategy is specific to a particular number of mines M and total number of squares N (as well as the shape of the board).

Definition 25.1 (Strategy): A strategy for a Minesweeper game is a tree; each node is labelled with a square to be probed and each branch is labelled with the number of mines discovered to be adjacent to that node. Every node has 9 children with branches labelled 0,...,8; no node label repeats the label of an ancestor; and the tree is complete with N − M levels. Figure 1 shows an example.

[Figure 1: First two levels of a Minesweeper strategy. The root probes (1,1); its branches are labelled with the possible adjacent-mine counts 0,...,8, and the children probe further squares such as (1,2), (2,2), (3,3), and (10,1).]

Even for fixed N and M, there are a lot of possible strategies:
$$\prod_{i=0}^{N-M-1} (N-i)^{9^i}$$
(For N = 6, M = 3, this is 68507889249886074290797726533575766546371837952000000000.) Still, the number is finite.
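As a quick arithmetic check, the product above can be evaluated directly. The following Python sketch (the function name is illustrative, not from the notes) reproduces the number quoted for N = 6, M = 3.

    # Sketch: evaluate the strategy-count formula prod_{i=0}^{N-M-1} (N - i)^(9^i)
    # for N = 6, M = 3, to reproduce the number quoted in the notes.

    def num_strategies(N: int, M: int) -> int:
        """Number of Minesweeper strategy trees for N squares and M mines."""
        total = 1
        for i in range(N - M):                # levels 0 .. N-M-1
            total *= (N - i) ** (9 ** i)      # (N - i) choices of square, one for each of 9^i branch histories
        return total

    print(num_strategies(6, 3))
    # 68507889249886074290797726533575766546371837952000000000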

It is easy to see that every (terminating) algorithm for Minesweeper (given fixed N, M, and board shape) corresponds to exactly one strategy. Furthermore, with every strategy there is an associated probability of winning. Indeed, it is possible to calculate this exactly, but it is probably easier to measure it using repeated trials.

Definition 25.2 (Optimality): A strategy for Minesweeper is optimal if its probability of winning is at least as high as that of all other strategies.

Clearly, there is an optimal strategy for any given N, M, and board shape. Moreover, we can construct an optimal method of play in general (for arbitrary N, M, and board shape): enumerate every strategy for that configuration and play the optimal one. The performance profile of this algorithm will dominate that of any other algorithm. (Note that the existence of an optimal algorithm rests on the finite number of possible strategies. There are infinitely many algorithms, and it is easy to imagine problems for which any given algorithm can always be improved (e.g., by spending a little bit more computation time), so that no optimal algorithm exists.)

Unfortunately, it is probably the case that no practical optimal algorithm exists. Instead, we will try a simpler approach: pick a safe square when one is known, otherwise pick the safest square as judged by the probability that it contains a mine.

The probability space

The first step is to identify the set of random variables we need:

As in the propositional logic case, we want one Boolean variable X_ij that is true iff square (i, j) actually contains a mine. Using N to refer to the total number of squares, we can also label these variables as X_1,...,X_N, which will come in handy.

We'll also have variables D_ij corresponding to the display contents for those k squares that have been probed or marked as mines. We can also label these variables D_1,...,D_k for simplicity. The domain for each of these variables is {m, 0, 1, 2, ..., 8}. We'll call these variables the Numbers, and the corresponding X_ij variables will be called Known. (Note that Numbers includes marked mines; we assume that only logically guaranteed mines are marked.)

Finally, we'll use M to refer to the total number of mines. (Both N and M are constants.)

As an example, consider the following display (4 squares wide and 3 high, with column-row coordinates and (1,1) at the bottom left):

    MINES LEFT: 3

    -  -  -  -
    m  -  -  -
    2  -  2  -

Here N is 12, M is 4, and (1,1), (3,1), and (1,2) are Known; square (1,2) has been marked as a mine, which is why only 3 mines are shown as left.

Now we must write down our probability space, i.e., the joint distribution over all the variables. It turns out to be easiest to do this in two parts, using the chain rule:
$$P(X_1,\dots,X_N,D_1,\dots,D_k) = P(D_1,\dots,D_k \mid X_1,\dots,X_N)\,P(X_1,\dots,X_N)$$
Why this way? Because (1) the prior distribution of mines, P(X_1,...,X_N), is easy to compute because mines are scattered uniformly at random, and (2) the display variables are determined by the underlying mines, so the conditional distribution P(D_1,...,D_k | X_1,...,X_N) is also relatively easy to describe.
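To make the second factor concrete, here is a minimal Python sketch of P(D_1,...,D_k | X_1,...,X_N) as a 0/1 indicator of display consistency, using the 4×3 example board above. The names are illustrative; the notes' own implementation is minep.scm, in Scheme.

    # Sketch (illustrative names): the display variables are a deterministic function of
    # the mine configuration, so P(d_1..d_k | x_1..x_N) is just an indicator of consistency.

    WIDTH, HEIGHT = 4, 3   # the running example board

    def neighbors(sq):
        """On-board squares adjacent to sq = (col, row)."""
        c, r = sq
        return [(c + dc, r + dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1)
                if (dc, dr) != (0, 0) and 1 <= c + dc <= WIDTH and 1 <= r + dr <= HEIGHT]

    def display_likelihood(display, mines):
        """P(display | mines): 1 if the display is what Minesweeper would show, else 0.

        display -- dict mapping probed/marked squares to 'm' or an adjacent-mine count
        mines   -- set of squares that actually contain mines
        """
        for sq, val in display.items():
            if val == 'm':
                if sq not in mines:            # a marked square must really be a mine
                    return 0
            elif sq in mines or val != sum(nb in mines for nb in neighbors(sq)):
                return 0                        # a probed square is mine-free and shows its count
        return 1

    # The example display: 2 at (1,1), marked mine at (1,2), 2 at (3,1)
    print(display_likelihood({(1, 1): 2, (1, 2): 'm', (3, 1): 2},
                             {(1, 2), (2, 1), (3, 2), (2, 3)}))   # -> 1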

First, the prior P(X_1,...,X_N). Remember that the first square selected is always safe. Without loss of generality, let us call this square X_1; we know that X_1 = false. For the remaining N − 1 squares, M mines are scattered at random. There are $\binom{N-1}{M}$ ways to do this and each is equally likely, so we have
$$P(x_2,\dots,x_N) = \begin{cases} 1/\binom{N-1}{M} & \text{if } \#(x_2,\dots,x_N) = M \\ 0 & \text{otherwise} \end{cases}$$
where #(x_2,...,x_N) denotes the number of trues in the sample point x_2,...,x_N. To check, we can calculate P(X_i), the prior probability that a square X_i (i ≠ 1) contains a mine: this is given by
$$\binom{N-2}{M-1} \Big/ \binom{N-1}{M} = \frac{M}{N-1},$$
which is in accord with our expectations.

From the assumption of uniform scattering and the fact that P(X_i) = M/(N−1) for all i ≠ 1, one is tempted to write the joint distribution as
$$P(X_2,\dots,X_N) = P(X_2)\,P(X_3)\cdots P(X_N) \qquad \textbf{WRONG!}$$
But independence does not hold, because the total number of mines is fixed! For example, if the first M squares all get mines, then the next square has probability 0 of getting a mine.

Turning to the display variables, we know they are determined precisely according to the rules of Minesweeper, given the actual contents of all the squares. I.e., for all combinations of values d_1,...,d_k, x_1,...,x_N,
$$P(d_1,\dots,d_k \mid x_1,\dots,x_N) = \begin{cases} 1 & \text{if } d_1,\dots,d_k \text{ correctly displays } x_1,\dots,x_N \\ 0 & \text{otherwise} \end{cases}$$

Finding safer squares

Now we need to compute, for each square (i, j), the quantity P(X_ij | known, numbers). We develop a simple and computable expression for this probability in a series of steps.

First, we need to massage the expression into a form containing the terms that we know about: the prior over all the X-variables, and the conditional probability of the display variables given those variables. The expression P(X_ij | known, numbers) is missing the variables other than X_ij; call these Unknown. For example, in the 4×3 display above, X_ij might be X_{2,1} and there are 8 squares in Unknown. We introduce them by summing over them (the standard summation over constituent sample points for an event), writing e for the evidence known, numbers:
$$P(X_{ij} \mid e) = P(X_{ij} \mid known, numbers) = \sum_{unknown} P(X_{ij}, unknown \mid known, numbers)$$
For example, with 8 squares, this is a sum of 2^8 = 256 terms. We'd like to have an expression with numbers conditioned on X-variables, so we apply Bayes' rule:
$$\begin{aligned}
P(X_{ij} \mid e) &= \alpha \sum_{unknown} P(numbers \mid known, X_{ij}, unknown)\, P(X_{ij}, unknown \mid known) \\
&= \alpha \sum_{unknown} P(numbers \mid known, X_{ij}, unknown)\, P(X_{ij}, unknown, known)/P(known) \\
&= \alpha \sum_{unknown} P(numbers \mid known, X_{ij}, unknown)\, P(X_{ij}, unknown, known)
\end{aligned}$$
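Before continuing with the derivation, here is a quick brute-force check of the prior just defined, on a tiny instance chosen purely for illustration (N = 6, M = 3, not from the notes): it confirms that P(X_i) = M/(N−1) and that the X_i are not independent.

    # Sketch: brute-force check of the prior on a tiny instance. Mines are scattered
    # uniformly among squares 2..N (square 1 is the safe first probe).

    from itertools import combinations
    from fractions import Fraction

    N, M = 6, 3
    placements = list(combinations(range(2, N + 1), M))   # equally likely mine sets

    p_mine = Fraction(sum(2 in p for p in placements), len(placements))
    print(p_mine, "should equal M/(N-1) =", Fraction(M, N - 1))   # 3/5 == 3/5

    # Joint P(X_2 and X_3) differs from P(X_2) P(X_3), so independence fails.
    p_joint = Fraction(sum(2 in p and 3 in p for p in placements), len(placements))
    print(p_joint, "vs", p_mine * p_mine)                          # 3/10 vs 9/25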

So far, so good; the variables known, X_ij, unknown constitute all the X-variables, so we have terms here that we already defined in our probability space given earlier. The only problem is that we have too many of them! The number of variables could be as large as N, so the summation is over O(2^N) cases.

The solution to this problem is to identify a subset of these variables that affects the display and to simplify the expressions using conditional independence so that the summation covers only this subset. Let Fringe denote those variables (not including X_ij) that are adjacent to numbered squares. For example, if X_ij refers to the square (2,1), then Fringe contains (2,2), (3,2), (4,2), (4,1). The idea is that the Numbers are completely determined just by Known, Fringe, and X_ij (if it is adjacent to a number). Given these, the Numbers are conditionally independent of the remaining variables, which we call Background. In our example, Background consists of the top row: (1,3), (2,3), (3,3), (4,3).

There are actually two slightly different cases to deal with, depending on whether X_ij is adjacent to a number.

Case 1: X_ij is adjacent to a number.
$$\begin{aligned}
P(X_{ij} \mid e) &= \alpha \sum_{fringe,\, background} P(numbers \mid known, fringe, background, X_{ij})\, P(known, fringe, background, X_{ij}) && \text{replacing } unknown \text{ by } fringe, background \\
&= \alpha \sum_{fringe,\, background} P(numbers \mid known, fringe, X_{ij})\, P(known, fringe, background, X_{ij}) && \text{by conditional independence} \\
&= \alpha \sum_{fringe} P(numbers \mid known, fringe, X_{ij}) \sum_{background} P(known, fringe, background, X_{ij}) && \text{because the first term doesn't depend on } background
\end{aligned}$$

Now, in this last expression, the term P(numbers | known, fringe, X_ij) is 0 unless, to use our logical terminology, the assignment denoted by fringe, X_ij is a model of the CNF expression implied by the evidence. If it is a model, then the probability is 1. So the summation over fringe reduces to a simpler summation over the models of the evidence:
$$P(X_{ij} \mid e) = \alpha \sum_{\{fringe\,:\; fringe, X_{ij} \,\in\, models(e)\}} \;\sum_{background} P(known, fringe, background, X_{ij})$$

The sum over the background variables, which may still have a huge number of cases, can be simplified because the prior probability term P(known, fringe, background, X_ij) is $1/\binom{N-1}{M}$ or 0, depending on whether #(known, fringe, background, x_ij) = M. Therefore, we just have to count the number of cases where the background has the right number of mines. This is given by $\binom{B}{M-L}$, where B is the size of the background and L is #(known, fringe, x_ij), i.e., the number of mines not in the background. Finally, we obtain
$$P(X_{ij} \mid e) = \alpha \sum_{\{fringe\,:\; fringe, X_{ij} \,\in\, models(e)\}} \binom{B}{M-L} \Big/ \binom{N-1}{M}
= \alpha \sum_{\{fringe\,:\; fringe, X_{ij} \,\in\, models(e)\}} \binom{B}{M-L} \qquad (1)$$
which is simple to compute provided we can enumerate the models for the fringe variables. This costs O(2^{|Fringe|}), roughly the same as the logical algorithm. Notice that the term $1/\binom{N-1}{M}$ disappears into the normalizing constant because it does not depend on fringe.
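Equation (1) translates directly into code. The sketch below (in Python for illustration; the notes' own implementation is minep.scm, in Scheme, and the function and parameter names here are invented) enumerates assignments to the fringe, keeps those that are models of the evidence, and weights each by $\binom{B}{M-L}$. Applied to the running example, it reproduces the value computed in the next section.

    # Sketch of equation (1): sum over fringe models, weight each by C(B, M - L), normalize.

    from itertools import product
    from math import comb
    from fractions import Fraction

    def neighbors(sq, width=4, height=3):
        """Adjacent on-board squares; board dimensions default to the running 4x3 example."""
        c, r = sq
        return [(c + dc, r + dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1)
                if (dc, dr) != (0, 0) and 1 <= c + dc <= width and 1 <= r + dr <= height]

    def mine_probability(query, fringe, numbers, known_mines, B, M):
        """P(query contains a mine | evidence), per equation (1).

        query       -- the square X_ij being assessed (adjacent to a number)
        fringe      -- unknown squares, other than query, adjacent to numbered squares
        numbers     -- dict: numbered square -> displayed adjacent-mine count
        known_mines -- squares already marked as mines
        B           -- number of background squares
        M           -- total number of mines
        """
        weight = {False: 0, True: 0}
        for q_val in (False, True):
            for bits in product((False, True), repeat=len(fringe)):
                mines = set(known_mines) | {s for s, b in zip(fringe, bits) if b}
                if q_val:
                    mines.add(query)
                # model check: every displayed number must match its adjacent mine count
                if all(sum(nb in mines for nb in neighbors(sq)) == n for sq, n in numbers.items()):
                    L = len(mines)                       # mines outside the background
                    if L <= M:
                        weight[q_val] += comb(B, M - L)  # ways to place the remaining mines in the background
        return Fraction(weight[True], weight[True] + weight[False])

    # Running example: query (2,1), fringe {(2,2),(3,2),(4,2),(4,1)}, background size 4.
    print(mine_probability((2, 1), [(2, 2), (3, 2), (4, 2), (4, 1)],
                           {(1, 1): 2, (3, 1): 2}, {(1, 2)}, B=4, M=4))   # -> 1/2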

Now we apply this formula to compute P(X_{2,1} | e) in our example. First, let's enumerate the models. When X_{2,1} = true, the square (2,2) must be mine-free (otherwise (1,1), which already has the marked mine (1,2) adjacent to it, would display more than 2), and the 2 displayed at (3,1) then requires exactly one mine among (3,2), (4,2), (4,1); this gives three fringe models. When X_{2,1} = false, (2,2) must contain a mine, and again exactly one of (3,2), (4,2), (4,1) contains a mine, giving three more fringe models.

For each of these models, $\binom{B}{M-L} = \binom{4}{1} = 4$, so we have
$$P(X_{2,1} \mid e) = \alpha\,\langle 3 \cdot 4,\; 3 \cdot 4\rangle = \langle 1/2,\, 1/2\rangle$$
(This is in accord with the intuitive argument that says there is exactly one mine in (2,1) or (2,2), and it is equally likely to be in either.) We can show similarly that P(X_{3,2} | e) = ⟨1/3, 2/3⟩ (i.e., a 1/3 probability of a mine).

Case 2: X_ij is not adjacent to a number. The derivation is very similar, but X_ij acts like a background variable rather than a fringe variable:
$$\begin{aligned}
P(X_{ij} \mid e) &= \alpha \sum_{fringe,\, background} P(numbers \mid known, fringe, background, X_{ij})\, P(known, fringe, background, X_{ij}) \\
&= \alpha \sum_{fringe,\, background} P(numbers \mid known, fringe)\, P(known, fringe, background, X_{ij}) \\
&= \alpha \sum_{fringe} P(numbers \mid known, fringe) \sum_{background} P(known, fringe, background, X_{ij})
\end{aligned}$$
Now the fringe variables constitute the entire model:
$$P(X_{ij} \mid e) = \alpha \sum_{fringe \,\in\, models(e)} \;\sum_{background} P(known, fringe, background, X_{ij})$$
and the final expression is very similar:
$$P(X_{ij} \mid e) = \alpha \sum_{fringe \,\in\, models(e)} \binom{B}{M-L} \qquad (2)$$
It is clear that this expression is the same for all X_ij that are not adjacent to a number, so we need only do this once to get the probability of safety for background squares. (Each fringe square must be evaluated separately.)¹

¹ Note: the meaning of Fringe and Background differs in Case 1 and Case 2. In the first, Fringe includes all number-adjacent variables except X_ij, while Background includes all other variables. In the second, Fringe includes all number-adjacent variables, while Background includes all other variables except X_ij. This makes for somewhat simpler mathematical expressions. The code in minep.scm uses the opposite convention: the fringe is all number-adjacent variables, and the background is all other variables.
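As an independent check on the worked example (a brute-force sketch, not the notes' minep.scm), one can enumerate every placement of the three remaining mines among the nine unprobed squares, discard the placements inconsistent with the display, and tally each square's mine frequency; the frequencies should reproduce the 1/2, 1/3, and 1/4 values derived in this section.

    # Brute-force check of the worked example: place the 3 hidden mines among the 9
    # unprobed squares of the 4x3 board, keep placements consistent with the display
    # (2 at (1,1), 2 at (3,1), marked mine at (1,2)), and tally mine probabilities.

    from itertools import combinations, product
    from fractions import Fraction

    WIDTH, HEIGHT, M = 4, 3, 4
    numbers = {(1, 1): 2, (3, 1): 2}           # displayed counts
    marked = {(1, 2)}                           # marked mine
    unknown = [sq for sq in product(range(1, WIDTH + 1), range(1, HEIGHT + 1))
               if sq not in numbers and sq not in marked]

    def neighbors(sq):
        c, r = sq
        return [(c + dc, r + dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1)
                if (dc, dr) != (0, 0) and 1 <= c + dc <= WIDTH and 1 <= r + dr <= HEIGHT]

    consistent = []
    for placement in combinations(unknown, M - len(marked)):      # 3 hidden mines
        mines = set(placement) | marked
        if all(sum(nb in mines for nb in neighbors(sq)) == n for sq, n in numbers.items()):
            consistent.append(mines)

    for sq in sorted(unknown):
        p = Fraction(sum(sq in mines for mines in consistent), len(consistent))
        print(sq, p)
    # Expected: 1/2 for (2,1) and (2,2); 1/3 for (3,2), (4,1), (4,2); 1/4 for the top row.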

Let us calculate, say, P(X_{1,3} | e) in our example. We have B = 3 and M = 4. When X_{1,3} = true, L = 4, so $\binom{B}{M-L} = \binom{3}{0} = 1$. When X_{1,3} = false, L = 3, so $\binom{B}{M-L} = \binom{3}{1} = 3$. There are 6 fringe models, so
$$P(X_{1,3} \mid e) = \alpha\,\langle 6 \cdot 1,\; 6 \cdot 3\rangle = \langle 1/4,\, 3/4\rangle$$
That is, each background square has a 1/4 probability of being a mine. So the safest move is in a background square.

We can check the theorem we proved in the linearity-of-expectation homework: that the sum of probabilities equals the number of mines left. The sum of probabilities is
$$\tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{3} + \tfrac{1}{3} + \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4} = 3.$$

We can also check the work by calculating the probability of a mine after the first move (before finding out the number in the square). In that case, all the remaining squares besides X_ij and the initial square are background squares (B = N − 2), and L = 1 when X_ij = true and L = 0 when X_ij = false. There are no variables in the fringe, so there is exactly one (empty) model. Hence
$$P(X_{ij} \mid e) = \alpha\,\left\langle \binom{N-2}{M-1},\; \binom{N-2}{M}\right\rangle$$
so the probability of a mine is
$$\begin{aligned}
\frac{\binom{N-2}{M-1}}{\binom{N-2}{M-1} + \binom{N-2}{M}}
&= \frac{\binom{N-2}{M-1}}{\binom{N-1}{M}} && \text{by Pascal's identity } \tbinom{n+1}{k} = \tbinom{n}{k-1} + \tbinom{n}{k} \\
&= \frac{(N-2)!}{(M-1)!\,(N-M-1)!} \cdot \frac{M!\,(N-M-1)!}{(N-1)!} = \frac{M}{N-1},
\end{aligned}$$
which agrees with our earlier calculation.

[Figure 2 (total mines: 6): Example of a situation in which probabilistic inference distinguishes between very safe squares (2,2) and (3,2) and very unsafe squares (2,1) and (4,2). Essentially, sharing a mine between the two squares numbered 2 would mean forcing 3 mines into the three background squares (1,1), (4,3), and (4,4), which can happen in only one way and hence is unlikely.]

Figure 2 shows a case where probability really helps in deciding what to do. Figure 3 shows the performance of all four methods for Minesweeper. Each improvement in the algorithm gives an improvement in play across all mine densities, but the improvement is most pronounced for the more difficult cases.
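The first-move calculation is easy to confirm numerically as well. This small sketch (illustrative, not from the notes) checks that the weights $\binom{N-2}{M-1}$ and $\binom{N-2}{M}$ normalize to M/(N−1) for a few values of N and M.

    # Numeric check of the first-move probability: with no numbers revealed, equation (2)
    # gives weights C(N-2, M-1) and C(N-2, M) for "mine" and "no mine".

    from fractions import Fraction
    from math import comb

    for N, M in [(12, 4), (16, 6), (100, 20)]:
        w_mine, w_safe = comb(N - 2, M - 1), comb(N - 2, M)
        p = Fraction(w_mine, w_mine + w_safe)
        assert p == Fraction(M, N - 1)
        print(N, M, p)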

[Figure 3: Performance of the brain-dead, Mark II, Mark III, and probabilistic algorithms on a 4×4 board, averaged over 100 trials. The plot shows the fraction of wins (vertical axis, "% wins", from 0 to 1) against the number of mines (horizontal axis, 0 to 16).]