Better Generalization with Forecasts. Tom Schaul, Mark Ring

Which Representations? Type: feature-based representations (state = feature vector). Quality 1: usefulness for linear policies. Quality 2: generalization. Better Generalization with Forecasts, Tom Schaul, 29/04/2013. 2/20

Outline: ✓ Motivation. Representation: Predictive State Representations; General Value Functions, aka Forecasts; a simplified subclass of Forecasts. Evaluating Generalization. Results. 3/20

Predictive State Representations. Question/test: Will I hit the wall if I take a step left and then a step back? Expected answer = feature. Defined as a set of testable predictions: observable quantities (wall sensor), conditional on an action sequence (step left, step back), open-loop (ends at t+2). 4/20
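
In symbols, a PSR feature is the predicted outcome of such a test given the current history; a sketch for the example above (standard PSR-style notation, not taken from the slides):

$$ \phi_i(h_t) \;=\; \Pr\big(\text{wall sensor fires at } t{+}2 \ \big|\ h_t,\ a_{t+1}=\text{step left},\ a_{t+2}=\text{step back}\big) $$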

General Value Functions. Example question: After how many steps will I encounter a door if I head to the wall in front of me and follow it clockwise? General Value Functions admit more general questions; they are closed-loop, allowing arbitrary-length sequences. We call them Forecasts. 5/20

Forecast Components. Conditional on an option: following a policy (straight + clockwise) until termination (door). I: states of interest. β(s): termination probability. π: policy. Target value (expectation): any function of the state. Cumulative c(s): accumulated value before termination. z(s): final value upon termination in s. 6/20
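
To make these components concrete, one conventional way to write the value such a forecast predicts, using standard GVF/option notation with I, β, π, c and z as above (a sketch; the paper's exact formulation may differ):

$$ f(s) \;=\; \mathbb{E}_{\pi}\!\left[\ \sum_{k=0}^{T-1} c(S_k) \;+\; z(S_T)\ \middle|\ S_0 = s \in I \right], $$

where T is the random termination time: at every step k the option terminates with probability β(S_k).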

Simplified Forecasts. Constant components: c(s) = 0, I = all states. Defined by only a target set of states T. z(s): 1 if s is in T, 0 elsewhere. β(s): 1 if s is in T, 1−γ elsewhere. π: implicitly defined, it maximizes the expected z. → Only one free parameter: T. → Output: feature vector φ(s). 7/20
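
A minimal sketch of such a simplified forecast, assuming a tabular state space. The class name, the constant gamma, and the exact off-target termination value are illustrative assumptions (the original symbols were lost in extraction), not definitions taken from the paper:

```python
# Illustrative sketch of a simplified forecast; names and the off-target
# termination value (1 - gamma) are assumptions, not taken from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class SimplifiedForecast:
    target_set: frozenset      # T: the only free parameter
    gamma: float = 0.97        # assumed discount-like constant

    def c(self, s) -> float:   # cumulative component: constant 0
        return 0.0

    def z(self, s) -> float:   # final value: 1 inside T, 0 elsewhere
        return 1.0 if s in self.target_set else 0.0

    def beta(self, s) -> float:  # termination probability
        return 1.0 if s in self.target_set else 1.0 - self.gamma

    # The policy pi is left implicit: it is chosen to maximize the expected
    # final value z, i.e. it heads toward the target set T.
```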

Outline. ✓ Motivation. ✓ Representations. Evaluating Generalization: ideal vs. estimated forecast values; canonical forecast ordering; quality measures. Results. 8/20

Forecast Values. Distinguish: the forecast definition (the "question"), given by the target set T; the ideal forecast value (the true "answer"); and the forecast value estimate (an approximation), which can be learned. The focus here is the quality of the representation, so we use ideal forecast values as features and can ignore the learning issues (i.e., we can cheat!). Namely: policy iteration with the transition model. 9/20
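
To illustrate how ideal forecast values can be computed once the transition model is available, here is a minimal value-iteration-style sketch for the simplified forecasts above (the slide mentions policy iteration; both reach the same fixed point in this discounted setting). The function name, the update, and the termination form are assumptions:

```python
# Hypothetical sketch: ideal forecast values for a simplified forecast,
# computed from a known tabular transition model.
import numpy as np

def ideal_forecast_values(P, target_set, gamma=0.97, n_iters=10_000, tol=1e-8):
    """P[a, s, s2] = transition probability; target_set = T (state indices)."""
    n_actions, n_states, _ = P.shape
    in_T = np.zeros(n_states, dtype=bool)
    in_T[list(target_set)] = True
    z = in_T.astype(float)                    # final value: 1 inside T
    beta = np.where(in_T, 1.0, 1.0 - gamma)   # termination prob. (assumed form)
    f = np.zeros(n_states)
    for _ in range(n_iters):
        # pi is implicit: at every state, pick the action maximizing E[f(s')]
        continuation = np.max(P @ f, axis=0)  # max_a sum_s2 P[a, s, s2] * f[s2]
        f_new = beta * z + (1.0 - beta) * continuation
        done = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if done:
            break
    return f
```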

Forecast Generation (1). Canonical (breadth-first) exhaustive generation: the first layer is based on the observations; forecasts can build upon other forecasts; unique ordering (lexicographic tie-breaking). 10/20

Forecast Generation (2). 1. Initially: the observations define target sets T. 2. Compute ideal forecast values from T (Cheat 1: transition model, i.e. infinite experience; Cheat 2: knowledge of state). 3. Threshold the values for new candidate T sets. 4. (Ignore redundant sets.) 5. Go to step 2, until no T is left. 11/20
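
A rough sketch of this breadth-first loop. The helper forecast_values(T) is assumed to return the vector of ideal values for a target set T (for instance, the value-iteration sketch above); the canonical lexicographic ordering and the exact thresholding rule are simplified here:

```python
# Illustrative sketch of the exhaustive, breadth-first forecast generation;
# names, the threshold, and the depth cap are assumptions.
import numpy as np

def generate_forecasts(observation_sets, forecast_values, threshold=0.5, max_layers=3):
    """observation_sets: the initial target sets T defined by the observations."""
    features = []                                 # (target set, value vector) pairs
    seen = set()                                  # step 4: ignore redundant sets
    frontier = [frozenset(T) for T in observation_sets]   # step 1
    for _ in range(max_layers):
        next_frontier = []
        for T in frontier:
            if not T or T in seen:
                continue
            seen.add(T)
            f = forecast_values(T)                # step 2: ideal values (the "cheats")
            features.append((T, f))
            # step 3: thresholding the values proposes a new candidate target set
            candidate = frozenset(np.flatnonzero(f > threshold).tolist())
            if candidate not in seen:
                next_frontier.append(candidate)
        if not next_frontier:                     # step 5: stop when no T is left
            break
        frontier = next_frontier
    return features
```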

Evaluation of Feature Sets. Quality measures: 1. Optimal policy using linear function approximation (LFA) on the features (LSTD+PI). 2. Distance (MSE) between the estimated external value V (using LFA on the features) and the true V*. Generalization: consider a random subset of the states (e.g. 50%), train the LFA on this limited experience, and see how that V generalizes to the remaining states. 12/20
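
A minimal sketch of generalization measure 2 under the 50% split described above; phi (the per-state matrix of ideal forecast features) and v_star (the true value function) are hypothetical inputs, and measure 1 (LSTD+PI control) is not shown:

```python
# Illustrative sketch: fit a linear value estimate on a random subset of states
# and report the mean squared error on the held-out states.
import numpy as np

def generalization_mse(phi, v_star, train_fraction=0.5, seed=0):
    """phi: (n_states, n_features) feature matrix; v_star: true values V*."""
    rng = np.random.default_rng(seed)
    train = rng.random(phi.shape[0]) < train_fraction   # random subset of states
    # Least-squares fit of the LFA weights on the training states only
    w, *_ = np.linalg.lstsq(phi[train], v_star[train], rcond=None)
    v_hat = phi @ w                                     # generalize to all states
    return float(np.mean((v_hat[~train] - v_star[~train]) ** 2))
```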

Outline. ✓ Motivation. ✓ Representations. ✓ Evaluating Generalization. Results: two mazes; a simplistic agent with 1 binary observation (wall sensor) and 1 binary action (forward/rotate left); comparison to PSRs as the baseline. 13/20

4-Arm-Maze: PSRs (results figure). 14/20

4-Arm-Maze: Forecasts (results figure). 15/20

7-Rooms-Maze: PSRs (results figure). 16/20

7-Rooms-Maze: Forecasts (results figure). 17/20

PSR Features (figure). 18/20

Forecast Features (figure). 19/20

Summary. Forecasts as representations: temporal abstraction (closed-loop); simple structure; useful features for linear evaluation and control; good generalization with linear FA. Future work: 1. Removing the cheats: using learned estimates instead of ideal forecast values. 2. Smarter forecast generation than exhaustive. 20/20