Resource Allocation in Trait Introgression: A Markov Decision Process Approach


1 Resource Allocation in Trait Introgression: A Markov Decision Process Approach. Ye Han, Iowa State University (yeh@iastate.edu). November 29, 2016.

2 Acknowledgements. Collaborators: Lizhi Wang, William D. Beavis, John N. Cameron. Partially funded by the Plant Sciences Institute.

3 Introgression Goal

4 Outline. 1. Trait Introgression as an Engineering Process. 2. Dynamic Programming Approach. 3. Case Study and Results. 4. Conclusions.

6 Flowchart of Trait Introgression Process. [Figure: flowchart of the trait introgression process.]

7 Start. The starting point: an elite recipient and a donor.

8 Success. The success point: the ideal individual.

9 Selection. Selection step: select breeding parents according to a metric.

10 Resource Allocation. Design an efficient resource allocation plan to improve the process.

12 Outline (next section: Dynamic Programming Approach).

13 Dynamic Programming Structure. The project has a budget and a deadline; producing progeny incurs a cost, and harvesting the target ideal individual earns revenue. The decision is how many progeny to produce in each generation; the objective is to maximize the expected net present value.

14 Markov Decision Process Model. Major components: decision epochs, states, actions, transition probabilities, and rewards. Backward induction is applied to derive the optimal policy.

15 Decision Epochs. Finite horizon; a decision is made at the beginning of each generation. Denoted $\{1, 2, \ldots, T\}$.

16 States. Denoted $S = \{(m_k, b)\} \cup \{\text{failure}\} \cup \{\text{success}\}$, where $k \in \{1, 2, \ldots, T-1\}$ and $b \in \{B, B-1, \ldots, 1\}$. Here $m_k$ is the genotype status indicator and $b$ is the budget indicator.

17 Actions. Produce a number of progeny, denoted $A = \{0, 1, 2, \ldots, a_{\max}\}$.

18 Transition Probabilities. Under action $a$, the probability of moving from one state to another is given by the block matrix

$$
M_a =
\begin{array}{c|cccccc}
 & S_{B-a} & S_{B-a-1} & \cdots & S_1 & \text{failure} & \text{success} \\
\hline
S_B & W_a' & 0 & \cdots & 0 & 0 & \hat{W}_a \\
S_{B-1} & 0 & W_a' & \cdots & 0 & 0 & \hat{W}_a \\
\vdots & & & \ddots & & \vdots & \vdots \\
S_{a+1} & 0 & 0 & \cdots & W_a' & 0 & \hat{W}_a \\
S_a, \ldots, S_1 & 0 & 0 & \cdots & 0 & \mathbf{1} - \hat{W}_a & \hat{W}_a \\
\text{failure} & 0 & 0 & \cdots & 0 & 1 & 0 \\
\text{success} & 0 & 0 & \cdots & 0 & 0 & 1
\end{array}
$$

Producing $a$ progeny moves the process from budget level $S_b$ to $S_{b-a}$; once the remaining budget cannot cover another generation, the process ends in failure unless success has been reached. Failure and success are absorbing.

19 Rewards. Reward = revenue − cost, denoted $r_t(a, s, T) = R_t(s, T) - C(a)$, where $R_t(s, T)$ is the revenue at generation $t$ in state $s$ given deadline $T$ (decreasing in $t$), and $C(a)$ is the cost of action $a$ (increasing in $a$).

20 Objective. Maximize the expected net present value:

$$
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T} \lambda^{t}\, r_t(a, s, T) \right]
$$

where $\lambda$ is the discount factor.
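
Since the horizon is finite, the optimal policy can be obtained by backward induction on the value function, $V_t(s) = \max_{a}\{\, r_t(a, s, T) + \lambda \sum_{s'} M_a(s, s')\, V_{t+1}(s') \,\}$ with $V_{T+1} \equiv 0$. The Python sketch below illustrates this recursion; the genotype dynamics (a per-progeny success probability plus a one-level drift) are toy assumptions standing in for the simulated $W_a$ matrices, as are $K$, $B$, $a_{\max}$, and $\lambda$. Only the $10 progeny cost, the 8-generation deadline, and the revenue parameters $R$ and $q$ are taken from the case study that follows.

```python
import numpy as np

# Backward induction for the trait introgression MDP: a minimal sketch.
# The genotype model below is a toy stand-in for the simulated W_a
# matrices; K, B, A_MAX, LAM and p_success are illustrative assumptions.

T = 8        # deadline: decision epochs 1..T (from the case study)
K = 5        # toy number of genotype status levels m_1..m_K
B = 40       # budget in units of one progeny's $10 cost (assumption)
A_MAX = 10   # max progeny per generation (toy; the talk uses 1000)
LAM = 0.9    # discount factor lambda (assumption)
R, Q = 2_000_000, 100_000  # revenue parameters R and q (case study)

def p_success(k: int, a: int) -> float:
    """Toy chance that at least one of a progeny is the ideal individual."""
    per_progeny = 0.002 * (k + 1)           # better genotype, easier success
    return 1.0 - (1.0 - per_progeny) ** a

# V[t, k, b] = optimal expected NPV from epoch t in state (m_k, b);
# V[T + 1] = 0 terminates the recursion (no revenue past the deadline).
V = np.zeros((T + 2, K, B + 1))

for t in range(T, 0, -1):                   # backward over generations
    for k in range(K):
        for b in range(B + 1):
            best = 0.0                      # a = 0: stop breeding
            for a in range(1, min(A_MAX, b) + 1):
                ps = p_success(k, a)
                k_up = min(k + 1, K - 1)    # toy drift: improve one level
                cont = 0.5 * (V[t + 1, k, b - a] + V[t + 1, k_up, b - a])
                # r_t = revenue if success, minus cost of a progeny at $10
                best = max(best, -10.0 * a
                           + LAM * (ps * (R - Q * t) + (1.0 - ps) * cont))
            V[t, k, b] = best

print(f"Expected NPV at generation 1, worst genotype, full budget: "
      f"${V[1, 0, B]:,.0f}")
```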

21 Outline (next section: Case Study and Results).

22 Case Study. Parental selection uses the PCV (predicted cross value) metric. Maximum progeny per generation: 1000. Cost of producing one progeny: $10. Deadline: 8 generations. Revenue:

$$R_t(s, T) = (R - qt)\,\mathbb{I}(s = \text{success})\,\mathbb{I}(t \le T),$$

where $R = \$2{,}000{,}000$, $q = \$100{,}000$ per generation, and $T = 8$.
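
As a quick check on these numbers: if the ideal individual is first harvested in generation $t = 5$, the revenue is

$$R_5 = R - 5q = \$2{,}000{,}000 - \$500{,}000 = \$1{,}500{,}000,$$

while even a maximal generation of 1000 progeny costs only $1000 \times \$10 = \$10{,}000$. Each generation of delay therefore forgoes ten times more revenue ($q = \$100{,}000$) than the most expensive possible generation of breeding.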

23 A Random Simulation with Fixed Progeny Amount. [Figure: population size and $\log_{10}$ PCV by generation.]

24 A Random Simulation with Dynamic Progeny Amount. [Figure: population size and $\log_{10}$ PCV by generation.]

25 Results: budget, time, and probability of success.

26 Results: comparison with fixed-budget strategies (total budget: $32,000).

27 Results: budget allocation.

28 Results: revenue.

29 Results: optimal budget.

30 Outline (next section: Conclusions).

31 Contributions. Defined the resource allocation problem for the trait introgression process. Proposed a Markov decision process model for better breeding strategy design. Improved the breeding strategy in terms of time, budget, and probability of success compared with fixed-budget strategies.

32 Q & A. Thank you!

33 Transition Probabilities (Appendix). Transition between genotype intervals under action $a$ ($m_T$ denotes success or failure):

$$
W_a =
\begin{array}{c|ccccccc}
 & m_1 & m_2 & m_3 & \cdots & m_{T-2} & m_{T-1} & m_T \\
\hline
m_1 & w_{1,1} & w_{1,2} & w_{1,3} & \cdots & w_{1,T-2} & w_{1,T-1} & w_{1,T} \\
m_2 & 0 & w_{2,2} & w_{2,3} & \cdots & w_{2,T-2} & w_{2,T-1} & w_{2,T} \\
m_3 & 0 & 0 & w_{3,3} & \cdots & w_{3,T-2} & w_{3,T-1} & w_{3,T} \\
\vdots & & & & \ddots & \vdots & \vdots & \vdots \\
m_{T-2} & 0 & 0 & 0 & \cdots & w_{T-2,T-2} & w_{T-2,T-1} & w_{T-2,T} \\
m_{T-1} & 0 & 0 & 0 & \cdots & 0 & w_{T-1,T-1} & w_{T-1,T} \\
m_T & 0 & 0 & 0 & \cdots & 0 & 0 & w_{T,T}
\end{array}
$$

34 Transition Probabilities. Transition probability matrix under action $a$:

$$
M_a =
\begin{array}{c|cccccc}
 & S_{B-a} & S_{B-a-1} & \cdots & S_1 & \text{failure} & \text{success} \\
\hline
S_B & W_a' & 0 & \cdots & 0 & 0 & \hat{W}_a \\
S_{B-1} & 0 & W_a' & \cdots & 0 & 0 & \hat{W}_a \\
\vdots & & & \ddots & & \vdots & \vdots \\
S_{a+1} & 0 & 0 & \cdots & W_a' & 0 & \hat{W}_a \\
S_a, \ldots, S_1 & 0 & 0 & \cdots & 0 & \mathbf{1} - \hat{W}_a & \hat{W}_a \\
\text{failure} & 0 & 0 & \cdots & 0 & 1 & 0 \\
\text{success} & 0 & 0 & \cdots & 0 & 0 & 1
\end{array}
$$

35 Transition Probabilities. where

$$
W_a' = W_a(1{:}T{-}1,\; 1{:}T{-}1) =
\begin{array}{c|ccccc}
 & m_1 & m_2 & m_3 & \cdots & m_{T-1} \\
\hline
m_1 & w_{1,1} & w_{1,2} & w_{1,3} & \cdots & w_{1,T-1} \\
m_2 & 0 & w_{2,2} & w_{2,3} & \cdots & w_{2,T-1} \\
m_3 & 0 & 0 & w_{3,3} & \cdots & w_{3,T-1} \\
\vdots & & & & \ddots & \vdots \\
m_{T-1} & 0 & 0 & 0 & \cdots & w_{T-1,T-1}
\end{array},
$$

$$
\hat{W}_a = W_a(1{:}T{-}1,\; T) = \big( w_{1,T},\; w_{2,T},\; w_{3,T},\; \ldots,\; w_{T-2,T},\; w_{T-1,T} \big)^{\top},
$$

and

$$
S_b = \{(m_1, b),\; (m_2, b),\; (m_3, b),\; \ldots,\; (m_{T-2}, b),\; (m_{T-1}, b)\}.
$$
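
To see how these blocks fit together, here is a minimal numpy sketch that assembles $M_a$ from $W_a'$ and $\hat{W}_a$ as defined above. The sizes, the randomly generated upper-triangular $W_a$, and the unit budget decrement per generation are illustrative assumptions, not the talk's actual genetic model.

```python
import numpy as np

# Minimal sketch: assemble the block transition matrix M_a from W'_a and
# W_hat_a as defined above. T, B, a, and the random W_a are toy values.

T = 5    # genotype intervals m_1..m_T (m_T = ideal genotype)
B = 4    # budget levels S_B..S_1
a = 1    # budget units consumed by this generation's action (assumption)

rng = np.random.default_rng(0)
W = np.triu(rng.random((T, T)))          # toy upper-triangular W_a
W /= W.sum(axis=1, keepdims=True)        # normalize rows to probabilities

W_prime = W[:T - 1, :T - 1]              # W'_a = W_a(1:T-1, 1:T-1)
W_hat = W[:T - 1, T - 1]                 # W_hat_a = W_a(1:T-1, T)

n = T - 1                                # genotype states per budget level
N = B * n + 2                            # (m_k, b) states + failure + success
FAIL, SUCC = N - 2, N - 1
M = np.zeros((N, N))

for b in range(B, 0, -1):                # budget levels S_B down to S_1
    row = (B - b) * n                    # first row of block S_b
    M[row:row + n, SUCC] = W_hat         # success reached this generation
    if b - a >= 1:                       # budget left for another generation
        col = (B - (b - a)) * n          # column block of S_{b-a}
        M[row:row + n, col:col + n] = W_prime
    else:                                # budget exhausted without success
        M[row:row + n, FAIL] = 1.0 - W_hat
M[FAIL, FAIL] = M[SUCC, SUCC] = 1.0      # absorbing states

assert np.allclose(M.sum(axis=1), 1.0)   # every row is a distribution
```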
