Playing Abstract games with Hidden States (Spatial and Non-Spatial).


Playing Abstract games with Hidden States (Spatial and Non-Spatial). Gregory Calbert, Hing-Wah Kwok, Peter Smet, Jason Scholz, Michael Webb. VE Group, C2D, DSTO.

Report Documentation Page (Standard Form 298, Rev. 8-98)
- Report date: 01 OCT 2003
- Report type: N/A
- Title and subtitle: Playing Abstract games with Hidden States (Spatial and Non-Spatial)
- Performing organization: Defence Science And Technology Organisation
- Distribution/availability statement: Approved for public release, distribution unlimited
- Supplementary notes: See also ADM001929. Proceedings, held in Sydney, Australia on July 8-10, 2003. The original document contains color images.
- Security classification: report unclassified; abstract unclassified; this page unclassified
- Limitation of abstract: UU
- Number of pages: 20

Outline
- Our domain of research.
- The mathematics of strategically complex game playing: move evaluation, min-max depth search, temporal difference learning.
- Application to our network checkers game.
- The HARD PROBLEM: hidden spatial states. Information-theoretic advisors, combined with TD(0).
- Open research questions.

Our domain of research
- Broadly described as medium-resolution wargaming: maneuver of forces, hidden forces, ambiguously defined end-states.
- Use machine learning techniques to develop strategies.
- Also use machine learning to capture and generalize the expertise of human players.
- Monte Carlo simulation to develop risk analysis.
- Currently looking at chess/checkers variants: networked agents, hidden agents.

The Mathematics of Strategically Complex Gaming
- Classical game theory often won't work, notably because of the tyranny of dimension: too many states and strategic paths over time.
- Solving tabular stochastic dynamic games is impossible.
- Checkers (a game at the lower end of complexity) has maneuver, materiel diversity and complex strategy. The slide's count of possible strategic paths over the course of a game survives only as the garbled figures "8 14 50 10".
- Move to the theatre chess game (operational war-game).
- Gaming can still be viewed as approximating Bellman's state-action equation (for the optimal sequence of actions at each state and time $s_t, s_{t+1}, \ldots, s_{t+T}$) with various methods.

Evaluation
- Bellman's equation for the optimal policy $\pi^*$ satisfies
  $$Q^{\pi^*}(s_t, a_t) = E\left[ R(s_t, a_t) + \max_{a_{t+1}} Q^{\pi^*}(s_{t+1}, a_{t+1}) \right]$$
- The function $Q(s, a)$ is called the action-value function, specifying the total future reward of taking action $a$ from state $s$.
- Games like chess and checkers have no intermediate reward, only a terminal reward.
- Game playing approximates the action-value function by a function of a linear sum of weighted features:
  $$Q^{\pi^*}(s) \approx f\left( \sum_i w_i \phi_i(s) \right)$$
- The weights $w_i$ are called the advisors to the features $\phi_i(s)$ of the state (for example, force balance).

Min-Max depth search
- Essentially you project into the future to a limited horizon.
- The optimal action is chosen to be
  $$\arg\max_a \left\{ \min\left( \max\left( \cdots \left( \min Q(s'''', a'''') \right) \right) \right) \right\}$$
- The best way to explain this is tree search, with the values at the leaves backed up to the root node through mini-max search.
- Limited by the branching factor of the move (roughly 8 for checkers, 36 for chess, 80 for Go).
[Figure: mini-max search tree with alternating "Your moves" (max) and "Opponent's moves" (min) levels. Root value -4 (max); min-level values -5 and -4; max-level values -5, 9 and -4; leaf values -10, -5, 8, 9 and -4.]
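The backing-up of leaf values described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tree is given as nested lists whose leaves stand for the evaluation $Q$ at the search horizon, and the tree shape below is an assumption chosen to be consistent with the values that survive from the slide's figure.

```python
def minimax(node, maximizing=True):
    """Back up leaf values to the root of a tree given as nested lists.

    A leaf is a number (the evaluation at the search horizon); an inner
    node is a list of child nodes. Levels alternate between max and min.
    """
    if not isinstance(node, list):
        return node  # leaf: return its evaluation unchanged
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_root_move(root):
    """Return (index of the best root move, backed-up value of each root move)."""
    scores = [minimax(child, maximizing=False) for child in root]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Hypothetical tree: root (max) -> opponent (min) -> self (max) -> leaves.
tree = [[[-10, -5], [8, 9]], [[-4]]]
root_value = minimax(tree)
choice = best_root_move(tree)
```

With this tree the first root move backs up to -5 and the second to -4, so the second move is chosen and the root value is -4.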

Temporal difference learning
- Remembering Bellman's equation, at the optimal policy with no intermediate rewards the difference satisfies
  $$E^{\pi^*}\left[ Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] = 0$$
- In temporal difference learning you learn by shifting the past approximation $Q(s_t, a_t)$ towards the future approximation $Q(s_{t+1}, a_{t+1})$.
- This incrementally minimises the temporal difference, as is required by Bellman.
- Done by setting (if you have state numbers)
  $$Q_{new}(s_t, a_t) = \alpha\, Q_{old}(s_{t+1}, a_{t+1}) + (1 - \alpha)\, Q_{old}(s_t, a_t)$$
- The learning rate $\alpha$ must satisfy some simple stochastic convergence conditions.
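The tabular update on this slide can be written out directly. The sketch below assumes a dictionary keyed by hypothetical (state, action) labels and a fixed learning rate; both are illustrative choices, not part of the original talk.

```python
from collections import defaultdict


def td_update(q, sa_t, sa_next, alpha):
    """Shift Q(s_t, a_t) towards Q(s_{t+1}, a_{t+1}).

    Implements Q_new(s_t, a_t) = alpha * Q_old(s_{t+1}, a_{t+1})
                                 + (1 - alpha) * Q_old(s_t, a_t).
    """
    q[sa_t] = alpha * q[sa_next] + (1.0 - alpha) * q[sa_t]
    return q[sa_t]


# Hypothetical two-step episode: ("s1", "a1") leads to a win (value 1).
q = defaultdict(float)          # unseen state-action pairs start at 0
q[("s1", "a1")] = 1.0
first = td_update(q, ("s0", "a0"), ("s1", "a1"), alpha=0.5)
second = td_update(q, ("s0", "a0"), ("s1", "a1"), alpha=0.5)
```

Each update moves the earlier estimate halfway towards the later one, so the value of ("s0", "a0") goes from 0 to 0.5 and then to 0.75.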

Temporal Difference in Game Playing
- Too many states to solve, so do TD learning on the advisors $w_i$ so that we can generalize to novel states.
- Here we use gradient descent, incrementing the vector of advisors by
  $$\Delta \vec{w} = \alpha \left( Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right) \nabla_{\vec{w}} Q(s_t, a_t)$$
- Changes in advisor weights can be made on-line (as the game is played) or off-line (at the end of each game).
- Here $Q(s_t, a_t) = \Pr(\text{winning game} \mid s_t, a_t)$. Equivalently, there is a terminal reward of 1 for a win and 0 for a loss.
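The weight increment on this slide can be sketched as a single function. It is a minimal sketch that assumes the caller supplies the two successive evaluations and the gradient of $Q(s_t, a_t)$ with respect to the weights; the function name and the numbers in the example are hypothetical.

```python
def td_weight_step(w, q_t, q_next, grad_q_t, alpha):
    """Return w + alpha * (Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)) * grad_w Q(s_t, a_t)."""
    td_error = q_next - q_t
    return [wi + alpha * td_error * gi for wi, gi in zip(w, grad_q_t)]


# Hypothetical step: the current position was valued at 0.5 and the next
# position is a win (value 1.0), so weights move along the gradient.
new_w = td_weight_step(
    w=[0.0, 0.0], q_t=0.5, q_next=1.0, grad_q_t=[0.25, -0.5], alpha=0.1
)
```

Applying this after every move corresponds to the on-line variant; accumulating the increments and applying them at the end of a game corresponds to the off-line variant.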

Function Approximation
- We approximate this probability by
  $$Q(s, a) = \frac{1}{1 + \exp\left( -\sum_i w_i \phi_i(s) \right)}$$
- At a terminal state, $Q(s_T, \cdot)$ is 1 for a win, 1/2 for a draw and 0 for a loss.
- The advisors usually reflect the importance of balance in pieces, mobility and other features of the game. For example,
  $$\sum_i w_i \phi_i(s) = w_1 (N_1 - N_2) + \text{other terms}$$
  where $N_i$ is the number of pieces of side $i$.
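A logistic evaluation of this kind, together with the gradient needed for the temporal difference weight update, might look as follows. This is a sketch under assumptions: the sign convention is the standard logistic one (a positive weighted sum gives a win probability above 1/2), and the feature vector in the example (piece balance plus one other feature) is hypothetical.

```python
import math


def q_value(w, phi):
    """Logistic squashing of the weighted feature sum: 1 / (1 + exp(-sum_i w_i * phi_i))."""
    z = sum(wi * fi for wi, fi in zip(w, phi))
    return 1.0 / (1.0 + math.exp(-z))


def q_gradient(w, phi):
    """Gradient of q_value with respect to the weights: Q * (1 - Q) * phi_i."""
    q = q_value(w, phi)
    return [q * (1.0 - q) * fi for fi in phi]


# Hypothetical features: phi[0] = N1 - N2 (piece balance), phi[1] = another feature.
phi = [2.0, 1.0]
neutral = q_value([0.0, 0.0], phi)        # zero weights: no opinion, so 0.5
grad = q_gradient([0.0, 0.0], phi)
ahead = q_value([0.5, 0.0], phi)          # positive weight on a material lead
```

With zero weights the estimate is exactly 1/2; once the piece-balance advisor has a positive weight, a material lead pushes the estimated win probability above 1/2.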

An example in an imperfect information game: network checkers
- Network vulnerability through dynamic games.
- Pieces are connected by a network of varying topology.
- Only pieces in the largest connected sub-graph exhibit mobility; isolated pieces don't.
- Each side is aware of materiel and of the largest sub-graph size.
- Network details are hidden (distribution of degree, etc.).

Example
[Figure: two panels, labelled "Largest connected sub-graph" and "All in largest connected sub-graph".]

Results of Learning Advisor Weights
[Figure: advisor weights (vertical axis, -0.2 to 1.4) against games played (horizontal axis, 1 to 2501). Series: unkinged (20), kings (20), subgraph (20), unkinged (66), kings (66), subgraph (66). Annotations: "Edge poor" and "Edge rich".]

Hidden Spatial States
- We have begun research into hidden spatial states.
- This is difficult for the following reasons.
- Vastly increased branching factor of possible states. For example, suppose we have one invisible piece with positive probability of being in $j$ squares. Then the new branching factor = the old branching factor × $j$.
- Have to construct a distribution over the opponent's probable states, that is, the states $s$ with $\Pr(\text{opponent state is } s) > 0$.
- Pruning of the estimated states is risky (the opponent could exploit this).

Our approach
- Start with a small number of invisible pieces.
- Use the theorem of total probability and conditioning on events to develop a Markov chain for the probability of hidden pieces being in some state.
- Generate $\Pr(\text{board}_1), \ldots, \Pr(\text{board}_n)$.
- Really a non-Markov problem (the opponent's strategies will be history dependent).
- We know when pieces are taken, including hidden pieces.
- If you run into a hidden piece you lose a turn (but gain information on the hidden piece's location).
- If you try to take a hidden piece and it's not there you also lose a turn, but gain information on its location.

Estimation example: null move by opponent
$$\Pr\nolimits_{new}(\text{hidden at } (3,3)) = \tfrac{1}{2} \Pr\nolimits_{old}(\text{hidden at } (4,4))$$
$$\Pr\nolimits_{new}(\text{hidden at } (3,5)) = \tfrac{1}{2} \Pr\nolimits_{old}(\text{hidden at } (4,4))$$
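The null-move example can be reproduced with a one-step Markov chain update on the belief over the hidden piece's square. The sketch below assumes the hidden piece moves with equal probability to each successor square, and uses a hypothetical `forward_diagonals` move generator that ignores board edges and blocking pieces; neither detail is stated in the source.

```python
from collections import defaultdict


def propagate(belief, successors):
    """Spread each square's probability uniformly over its successor squares."""
    new_belief = defaultdict(float)
    for square, p in belief.items():
        targets = successors(square)
        if not targets:
            new_belief[square] += p  # assumption: a piece with no move stays put
            continue
        for target in targets:
            new_belief[target] += p / len(targets)
    return dict(new_belief)


def forward_diagonals(square):
    """Hypothetical move generator: the two forward diagonal squares."""
    row, col = square
    return [(row - 1, col - 1), (row - 1, col + 1)]


old_belief = {(4, 4): 1.0}
new_belief = propagate(old_belief, forward_diagonals)
```

Starting from certainty that the piece is at (4,4), the updated belief puts probability 1/2 on each of (3,3) and (3,5), as in the slide.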

Estimation example: reconnaissance move by self
$$\Pr\nolimits_{new}(\text{hidden at } x) = \Pr\nolimits_{old}(\text{hidden at } x \mid \text{not at } (4,2))$$
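Conditioning on a failed reconnaissance amounts to zeroing the probed square and renormalising the rest. A minimal sketch, with a hypothetical starting belief:

```python
def condition_not_at(belief, square):
    """Return Pr(hidden at x | not at `square`) for every remaining square x."""
    remaining = {sq: p for sq, p in belief.items() if sq != square}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("all probability mass was on the excluded square")
    return {sq: p / total for sq, p in remaining.items()}


# Hypothetical belief before probing square (4, 2) and finding it empty.
before = {(4, 2): 0.5, (3, 3): 0.25, (3, 5): 0.25}
after = condition_not_at(before, (4, 2))
```

The two surviving squares each rise from 0.25 to 0.5, which is the conditional probability given that the piece is not at (4,2).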

Information theoretic advisor
- The opponent's movement of hidden pieces increases uncertainty of state.
- Reconnaissance moves decrease uncertainty.
- Entropy is the best way to model this. If the opponent's states have probabilities $p_1, p_2, \ldots, p_n$, then the entropy is
  $$H = -\sum_{i} p_i \log_2 p_i$$
- We incorporate the value of reconnaissance moves through a term in the evaluation function that takes the entropy into account.
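The entropy of a belief over opponent states is straightforward to compute. The sketch below uses the standard Shannon entropy in bits; the example beliefs are hypothetical and illustrate how a reconnaissance move that rules out one state lowers the entropy.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum_i p_i * log2(p_i), skipping zero-probability states."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical beliefs before and after a reconnaissance move rules out one state.
h_before = entropy_bits([0.5, 0.25, 0.25])
h_after = entropy_bits([0.5, 0.5])
h_uniform4 = entropy_bits([0.25, 0.25, 0.25, 0.25])
```

Here the entropy drops from 1.5 bits to 1 bit after the reconnaissance move, while a uniform belief over four states carries 2 bits.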

Evaluation of Moves
- Have to calculate the expectation over the possible board states. Consider moves that are legal for a particular opposition board state (with probability > 0), or a reconnaissance move.
- Reconnaissance move: find out where the opponent is or isn't.
- Intend to use temporal difference learning to find the advisor weights.
- Depth of search is nearly impossible, since we have to carry on the same estimation/evaluation cycle.
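Taking the expectation over possible board states can be sketched as a probability-weighted sum of per-board evaluations. Everything concrete below is an assumption for illustration: the board labels, the move names and the table of values are hypothetical, and the sketch leaves out the entropy term and the restriction to moves that are legal on each board.

```python
def expected_move_value(move, board_probs, q):
    """Expectation of q(board, move) under the belief over opponent boards."""
    return sum(p * q(board, move) for board, p in board_probs.items())


def best_move(moves, board_probs, q):
    """Pick the move with the highest expected value."""
    return max(moves, key=lambda m: expected_move_value(m, board_probs, q))


# Hypothetical belief over two opponent boards and a toy value table.
board_probs = {"board_1": 0.5, "board_2": 0.5}
values = {
    ("board_1", "advance"): 0.75, ("board_2", "advance"): 0.25,
    ("board_1", "recon"): 0.5, ("board_2", "recon"): 0.75,
}


def q(board, move):
    return values[(board, move)]


advance_value = expected_move_value("advance", board_probs, q)
recon_value = expected_move_value("recon", board_probs, q)
chosen = best_move(["advance", "recon"], board_probs, q)
```

In this toy case the reconnaissance move has the higher expected value (0.625 against 0.5), so it is selected.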

Current research questions
- If you don't see an opponent move, was it a reconnaissance move or a hidden move? Have to learn this through Bayesian methods.
- Want to look at entropy balance. We therefore have to estimate our opponent's estimate of our state probabilities.
- Pruning: what happens when we prune boards with extremely low probability?
- Are there fast and frugal heuristics that generate strategies equal to or better than the computationally expensive way?

Questions?