Resource Allocation In Trait Introgression A Markov Decision Process Approach
1 Resource Allocation in Trait Introgression: A Markov Decision Process Approach. Ye Han, Iowa State University, yeh@iastate.edu. Nov 29, 2016. Ye Han (ISU), MDP in Trait Introgression, 27 slides
2 Acknowledgements Collaborators: Lizhi Wang, William D. Beavis, John N. Cameron. Partially funded by the Plant Sciences Institute.
3 Introgression Goal
4 Outline 1. Trait Introgression as an Engineering Process 2. Dynamic Programming Approach 3. Case Study and Results 4. Conclusions
6 Flowchart of Trait Introgression Process
7 Start The start point: the elite recipient and the donor.
8 Success The success point: the ideal individual.
9 Selection Selection step: select breeding parents according to a metric.
10 Resource Allocation Design an efficient resource allocation plan to improve the process.
12 Outline Section 2: Dynamic Programming Approach
13 Dynamic Programming Structures Budget and deadline for the project; cost for producing progeny; revenue for harvesting the target ideal individual. Decision: the number of progeny to produce in each generation. Objective: maximize the expected net present value.
14 Markov Decision Process Model Major components: decision epochs, states, actions, transition probabilities, rewards. Apply backward induction to derive the optimal policy.
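A minimal sketch of the backward induction the slides refer to, written against a generic finite-horizon MDP. The state space, transition table, and rewards below are hypothetical stand-ins, not the deck's actual genotype/budget model (that model is defined on the following slides):

```python
# Finite-horizon backward induction for a generic MDP (sketch).
# P[(s, a)] -> list of (next_state, prob); r[(t, s, a)] -> reward.

def backward_induction(states, actions, P, r, T, lam=1.0):
    """Return value function V[t][s] and policy pi[t][s]."""
    V = {T: {s: 0.0 for s in states}}      # terminal values are zero
    pi = {}
    for t in range(T - 1, -1, -1):         # work backwards in time
        V[t], pi[t] = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                q = r[(t, s, a)] + lam * sum(
                    p * V[t + 1][s2] for s2, p in P[(s, a)])
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], pi[t][s] = best_q, best_a
    return V, pi
```

On a toy two-state example (an absorbing "success" state reachable with probability 0.5 at a cost of 1), the recursion correctly pays the cost early only when the remaining horizon makes it worthwhile.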
15 Decision Epochs Finite horizon; the beginning of each generation. Denoted as {1, 2, ..., T}.
16 States Denoted as S = {(m_k, b)} ∪ {failure} ∪ {success}, with k ∈ {1, 2, ..., T-1} and b ∈ {B, B-1, ..., 1}. m_k: genotype status indicator. b: budget indicator.
17 Actions Produce a number of progeny. Denoted A = {0, 1, 2, ..., a_max}.
18 Transition Probabilities Under action a: the probability of moving from one state to another. The matrix M_a is block structured over the states S_B, S_{B-1}, ..., S_1, failure, success: from a budget level S_b with b > a, the block W̄_a carries the genotype distribution into S_{b-a} and the column Ŵ_a carries it into success; from S_b with b ≤ a the non-success mass goes to failure; failure and success are absorbing.
19 Rewards Rewards = Revenue - Cost. Denoted as r_t(a, s, T) = R_t(s, T) - C(a). R_t(s, T): revenue at generation t and state s given deadline T, decreasing in t. C(a): cost of action a, increasing in a.
20 Objective Maximize the expected net present value: max_π E^π { Σ_{t=0}^{T} λ^t r_t(a, s, T) }
21 Outline Section 3: Case Study and Results
22 Case Study PCV metric for parental selection. Maximum progeny number for one generation: 1000. Cost of producing one progeny: $10. Deadline: 8 generations. R_t(s, T) = (R - q t) I(s = success) I(t ≤ T), where R = $2,000,000, q = $100,000/generation, and T = 8.
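The revenue schedule and reward stated on this slide transcribe directly into code; the only assumption added here is that the cost is linear in the action, C(a) = $10 × a, which is our reading of the per-progeny cost:

```python
# Case-study parameters as stated on the slide.
R = 2_000_000          # revenue for the ideal individual ($)
q = 100_000            # revenue decay per generation ($/generation)
T = 8                  # deadline (generations)
COST_PER_PROGENY = 10  # $/progeny

def revenue(t, state, deadline=T):
    """R_t(s, T) = (R - q*t) * I(s == success) * I(t <= T)."""
    if state == 'success' and t <= deadline:
        return R - q * t
    return 0

def reward(t, state, a, deadline=T):
    """r_t(a, s, T) = R_t(s, T) - C(a), with assumed C(a) = 10*a."""
    return revenue(t, state, deadline) - COST_PER_PROGENY * a
```

For example, succeeding at the deadline (t = 8) is worth $2,000,000 - 8 × $100,000 = $1,200,000, and succeeding after the deadline is worth nothing.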
23 A Random Simulation with Fixed Progeny Amount (figure: population and log10 PCV by generation)
24 A Random Simulation with Dynamic Progeny Amount (figure: population and log10 PCV by generation)
25 Results Budget, time and probability of success
26 Results Comparison with fixed budget strategies (Total budget: $32,000)
27 Results Budget allocation
28 Results Revenue
29 Results Optimal budget
30 Outline Section 4: Conclusions
31 Contributions Defined the resource allocation problem for the trait introgression process. Proposed a Markov decision process model for better breeding strategy design. Improved the breeding strategy in terms of time, budget, and probability of success compared with fixed budget strategies.
32 Q & A Thank you!
33 Transition Probabilities Transition between genotype intervals under action a (m_T for success or failure): W_a is an upper-triangular T × T matrix whose entry w_{i,j} (j ≥ i) is the probability of moving from interval m_i to interval m_j, with zeros below the diagonal, so the genotype status never regresses.
34 Transition Probabilities Transition probability matrix under action a: M_a is block structured over the states S_B, S_{B-1}, ..., S_1, failure, success. Each budget row S_b with b > a has the block W̄_a in the columns of S_{b-a} and the column Ŵ_a in the success column; rows with b ≤ a place the non-success mass in the failure column; the failure and success rows are absorbing.
35 Transition Probabilities where W̄_a = W_a(1:T-1, 1:T-1) is the upper-left block of W_a (transitions among the non-terminal intervals m_1, ..., m_{T-1}), Ŵ_a = W_a(1:T-1, T) is its last column (transitions into success), and S_b = {(m_1, b), (m_2, b), ..., (m_{T-1}, b)}.
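The block assembly described on slides 18 and 33-35 can be sketched mechanically. The dynamics encoded here (action a spends a budget units; once the remaining budget cannot fund another action, non-success mass goes to failure) are our reading of the matrix structure, and the small W used in the example is invented, not data from the talk:

```python
def assemble_M(W, B, a):
    """Assemble the block transition matrix M_a (sketch).

    W: T x T upper-triangular genotype transition matrix under action a
       (list of lists); its last column is the success interval m_T.
    States: (interval k, budget b) for k in 0..T-2 and b in 1..B, plus
    absorbing 'failure' and 'success' appended at the end.
    """
    T = len(W)
    n = T - 1                                  # intervals m_1..m_{T-1}
    idx = {(k, b): (b - 1) * n + k
           for b in range(1, B + 1) for k in range(n)}
    FAIL, SUCC = n * B, n * B + 1
    size = n * B + 2
    M = [[0.0] * size for _ in range(size)]
    M[FAIL][FAIL] = M[SUCC][SUCC] = 1.0        # absorbing states
    for b in range(1, B + 1):
        for k in range(n):
            i = idx[(k, b)]
            M[i][SUCC] = W[k][T - 1]           # hat-W: into success
            if b - a >= 1:                     # budget left to act again
                for j in range(n):             # bar-W: into S_{b-a}
                    M[i][idx[(j, b - a)]] = W[k][j]
            else:                              # budget exhausted
                M[i][FAIL] = sum(W[k][:n])
    return M
```

Every row of the assembled matrix sums to one, which is a quick sanity check that the W̄_a / Ŵ_a split partitions the probability mass correctly.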