Micro-Randomized Trials & mhealth. S.A. Murphy NRC 8/2014

Size: px
Start display at page:

Download "Micro-Randomized Trials & mhealth. S.A. Murphy NRC 8/2014"

Transcription

1 Micro-Randomized Trials & mhealth S.A. Murphy NRC 8/2014

2 mhealth Goal: Design a Continually Learning Mobile Health Intervention: HeartSteps + Designing a Micro-Randomized Trial 2

3 Data from wearable devices that sense and provide treatments State at j th decision time (high dimensional) Action at j th decision time (treatment) Y j : Response (time-varying response) 3

4 Examples 1) Decision Times (Times at which a treatment can be provided.) 1) Regular intervals in time (e.g. every 10 minutes) 2) At user demand HeartSteps includes two sets of decision times 1) Momentary: Approximately every hours 2) Daily: Each evening at user specified time. 4

5 Examples 2) State S j 1) Passively collected (location, weather, busyness of calendar, social context, activity on device) 2) Actively collected (answers to questions) HeartSteps includes activity recognition (walking, driving, standing/sitting), weather, location, calendar, adherence, step count, self-report (usefulness, burden, self-efficacy, etc.), whether momentary intervention is on. 5

6 Examples 3) Actions A j 1) Treatments that can be provided at decision time 2) Whether to provide a treatment HeartSteps includes two types of treatments 1) Momentary Lock Screen Recommendation 2) Daily Activity Planning 6

7 Examples 3) Actions A j 1) Treatments that can be provided at decision time 2) Whether to provide a treatment HeartSteps includes two types of treatments 1) Momentary Lock Screen Recommendation, 2) Daily Activity Planning 7

8 Daily Activity Planning 8

9 Momentary Lock Screen Recommendation 9

10 Examples 4) Proximal Response Y j HeartSteps: Activity (step count) between decision times or daily activity. 10

11 Scientific Goals 1) Assess if there are proximal causal effects of the actions on the response. 2) Assess if there are delayed causal effects; assess if the proximal/delayed causal effects vary by particular state variables. 3) Construct a treatment policy, aka Just-in-Time Adaptive Intervention that inputs state and outputs actions. 4) Construct an online training algorithm that will result in a Continually Updating Just-in-Time Adaptive Intervention 11

12 Today s Focus 1) Assess if there are proximal causal effects of the actions on the reward. 2) Assess if there are delayed causal effects; assess if the proximal/delayed causal effects vary by particular state variables. 3) Construct a treatment policy, aka Just-in-Time Adaptive Intervention that inputs state and outputs actions. 4) Construct an online training algorithm that will result in a Continually Updating Just-in-Time Adaptive Intervention 12

13 Proposed Experimental Design: Micro-Randomized Trial Randomize between actions at decision times Each person may be randomized 100 s of times. These are sequential, full factorial, designs. 13

14 Why Micro-Randomization? Factorial designs are the gold standard when collecting data to build a treatment involving many components Actions are often intended to have a proximal effect. randomization is the gold standard in providing data to assess a causal effect Sequential randomization will enhance quality of many interesting subsequent data analyses. 14

15 Justifying the Sample Size for a Micro-Randomized Trial Focus on whether to provide a Momentary Lock Screen Recommendation, e.g. A j {0,1} Randomization in HeartSteps P[A j =1]=.4 Size to Detect a Proximal Causal Effect 15

16 Proximal Causal Effect Recall that Y j is the proximal response recorded after action A j A j is only delivered if the momentary intervention is on at time j. Set R j =1 if the momentary intervention is on at time j, otherwise R j =0 16

17 Proximal Causal Effect Define Ā j ={A 1,A 2,...,A j },ā j ={a 1,a 2,...,a j } Define Y j (ā j ) to be the observed response, Y j if Ā j =ā j, e.g., Y j =Y j (Ā j ) Define R j (ā j 1 ) to be the observed intervention on indicator if Ā j 1 =ā j 1 17

18 Proximal Causal Effect Define the Proximal Causal Effect at time j as E [ Y j (Āj 1,1) Y j (Āj 1,0) R j (Āj 1)=1 ] What does this estimand mean? 18

19 Proximal Causal Effect The randomization implies that E [ Y j (Āj 1,1) Y j (Āj 1,0) R j (Āj 1)=1 ] = E [ Y j R j =1,A j =1 ] E [ Y j R j =1,A j =0 ] Put β j =E [ Y j R j =1,A j =1 ] E [ Y j R j =1,A j =0 ] 19

20 Change Notation t th time on j th day: β tj =E [ Y tj R tj =1,A tj =1 ] E [ Y tj R tj =1,A tj =0 ] 20

21 Test for Sample Size Calculation We construct a test statistic for H 0 :β tj =0, t,j A simple approach is parameterize β tj =β 0 +β 1 (j 1)+β 2 (j 1) 2 and test H 0 :β i =0,i=0,1,2 21

22 Alternative for Sample Size Calculation One calculates a sample size to detect a given alternative with a given power. Alternative: H 1 :β i =d i σ,i=0,1,2 σ 2 where is the residual variance. 22

23 Test Statistic for Sample Size Calculation Residual variance is σ 2 =VAR ( Y tj E[Y tj R tj =1] ) 23

24 Specify Alternative for Sample Size Calculation Scientist specifies standardized d i s initial proximal treatment effect: d 0, average proximal effect over trial duration: 1 5J J j=1 5 t=1( d0 +d 1 (j 1)+d 2 (j 1) 2), and day of maximal proximal effect: We solve for d i s. d 1 2d 2 24

25 Test Statistic for Sample Size Calculation The model E[Y tj R tj =1]= α tj +γ tj R tj +β tj R tj (A tj p tj ) where p tj is the randomization probability p tj is.4 25

26 Test Statistic for Sample Size Calculation Use GEE to fit model E[Y tj R tj =1]= α tj +γ tj R tj +β tj R tj (A tj p tj ) where β tj =β 0 +β 1 (j 1)+β 2 (j 1) 2 You select parameterization of α tj, γ tj 26

27 Test Statistic for Sample Size Put Calculation Y =(Y 11,...,Y 51,Y 12,...,Y 5J ) T p is the total number of parameters (p >3); X is the associated design matrix (5J by p) N is sample size Last 3 columns of X are R tj (A tj p tj ),R tj (A tj p tj )(j 1), R tj (A tj p tj )(j 1) 2 27

28 Test Statistic for Sample Size Calculation You choose your favorite correlation matrix associated with Var(Y X). ˆΣ= ( N ˆσ 2 N X T i X i ) 1 { N X T i }( Corr(Y N i X i )X i X T i X i ) 1 i=1 i=1 i=1 K is 3 by p matrix picking out β coefficients 28

29 Test Statistic for Sample Size Calculation GEE test statistic is Nˆβ T (KˆΣK T ) 1ˆβ The asymptotic distribution is a Chi- Squared on 3 degrees of freedom with non-centrality parameter: d T (Σ β ) 1 d 29

30 Test Statistic for Sample Size Calculation The asymptotic distribution is a Chi- Squared on 3 degrees of freedom with non-centrality parameter: d T (Σ β ) 1 d Σ β only depends on polynomials in (j-1), R tj the distribution of and on the variance of the sequential randomizations. 30

31 Test Statistic for Sample Size Calculation The asymptotic distribution of the test statistic does not depend on the form of α tj, γ tj aaaaaaa or on the correlation matrix. The asymptotic distribution does depend on the distribution of R tj 31

32 Standardized d i s initial proximal effect: d 0 =0 average proximal effect: 1 5J J j=1 5 t=1 Example ( d0 +d 1 (j 1)+d 2 (j 1) 2) day of maximal proximal effect: d 1 2d 2 =28 P[R tj =1 A jt,j 1 =0]=.9, P[R tj =1 A t,j 1 =1]=.8 32

33 Average Proximal Effect P[R tj =1 A jt,j 1 =0]=.9, P[R tj =1 A t,j 1 =1]=.8 33

34 Average Proximal Effect P[R tj =1 A jt,j 1 =0]=.7, P[R tj =1 A t,j 1 =1]=.5 34

35 Scientific Goals 1) Assess if there are proximal causal effects of the actions on the response. 2) Assess if there are delayed causal effects; assess if the proximal/delayed causal effects vary by particular state variables. 3) Construct a treatment policy, aka Just-in-Time Adaptive Intervention that inputs state and outputs actions. 4) Construct an online training algorithm that will result in a Continually Updating Just-in-Time Adaptive Intervention 35

36 Collaborators: P. Liao, P. Klasnja, A. Tewari if you have questions! 36

Assessing Time-Varying Causal Effect. Moderation

Assessing Time-Varying Causal Effect. Moderation Assessing Time-Varying Causal Effect JITAIs Moderation HeartSteps JITAI JITAIs Susan Murphy 11.07.16 JITAIs Outline Introduction to mobile health Causal Treatment Effects (aka Causal Excursions) (A wonderfully

More information

Time-Varying Causal. Treatment Effects

Time-Varying Causal. Treatment Effects Time-Varying Causal JOOLHEALTH Treatment Effects Bar-Fit Susan A Murphy 12.14.17 HeartSteps SARA Sense 2 Stop Disclosures Consulted with Sanofi on mobile health engagement and adherence. 2 Outline Introduction

More information

Randomization-Based Inference With Complex Data Need Not Be Complex!

Randomization-Based Inference With Complex Data Need Not Be Complex! Randomization-Based Inference With Complex Data Need Not Be Complex! JITAIs JITAIs Susan Murphy 07.18.17 HeartSteps JITAI JITAIs Sequential Decision Making Use data to inform science and construct decision

More information

An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention

An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention by Huitian Lei A dissertation submitted in partial fulfillment of the requirements for the degree of

More information

Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect

Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect Robustness of the Contextual Bandit Algorithm to A Physical Activity Motivation Effect Xige Zhang April 10, 2016 1 Introduction Technological advances in mobile devices have seen a growing popularity in

More information

HHS Public Access Author manuscript Stat Med. Author manuscript; available in PMC 2017 May 30.

HHS Public Access Author manuscript Stat Med. Author manuscript; available in PMC 2017 May 30. Sample Size Calculations for Micro-randomized Trials in mhealth Peng Liao a,*,, Predrag Klasnja b, Ambuj Tewari a, and Susan A. Murphy a a Department of Statistics, University of Michigan, Ann Arbor, MI

More information

Assessing Time-Varying Causal Effect Moderation in Mobile Health

Assessing Time-Varying Causal Effect Moderation in Mobile Health Assessing Time-Varying Causal Effect Moderation in Mobile Health arxiv:1601.00237v3 [stat.me] 17 Aug 2016 Audrey Boruvka 1, Daniel Almirall 2, Katie Witkiewitz 3, and Susan A. Murphy 1,2 1 Department of

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari

More information

Properties of the least squares estimates

Properties of the least squares estimates Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares

More information

Reinforcement Learning and Control

Reinforcement Learning and Control CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Estimating Optimal Dynamic Treatment Regimes from Clustered Data

Estimating Optimal Dynamic Treatment Regimes from Clustered Data Estimating Optimal Dynamic Treatment Regimes from Clustered Data Bibhas Chakraborty Department of Biostatistics, Columbia University bc2425@columbia.edu Society for Clinical Trials Annual Meetings Boston,

More information

Classification Logistic Regression

Classification Logistic Regression Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits An Experimental Evaluation of High-Dimensional Multi-Armed Bandits Naoki Egami Romain Ferrali Kosuke Imai Princeton University Talk at Political Data Science Conference Washington University, St. Louis

More information

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing Feng Li Department of Statistics, Stockholm University What we have learned last time... 1 Estimating

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

United States Multi-Hazard Early Warning System

United States Multi-Hazard Early Warning System United States Multi-Hazard Early Warning System Saving Lives Through Partnership Lynn Maximuk National Weather Service Director, Central Region Kansas City, Missouri America s s Weather Enterprise: Protecting

More information

From Ads to Interventions: Contextual Bandits in Mobile Health

From Ads to Interventions: Contextual Bandits in Mobile Health From Ads to Interventions: Contextual Bandits in Mobile Health Ambuj Tewari and Susan A. Murphy Abstract The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of the American

More information

Integrated Electricity Demand and Price Forecasting

Integrated Electricity Demand and Price Forecasting Integrated Electricity Demand and Price Forecasting Create and Evaluate Forecasting Models The many interrelated factors which influence demand for electricity cannot be directly modeled by closed-form

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 10 Verification and Validation of Simulation Models. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Chapter 10 Verification and Validation of Simulation Models Banks, Carson, Nelson & Nicol Discrete-Event System Simulation The Black Box [Bank Example: Validate I-O Transformation] A model was developed

More information

Adaptive Trial Designs

Adaptive Trial Designs Adaptive Trial Designs Wenjing Zheng, Ph.D. Methods Core Seminar Center for AIDS Prevention Studies University of California, San Francisco Nov. 17 th, 2015 Trial Design! Ethical:!eg.! Safety!! Efficacy!

More information

Bayesian Social Learning with Random Decision Making in Sequential Systems

Bayesian Social Learning with Random Decision Making in Sequential Systems Bayesian Social Learning with Random Decision Making in Sequential Systems Yunlong Wang supervised by Petar M. Djurić Department of Electrical and Computer Engineering Stony Brook University Stony Brook,

More information

Set-valued dynamic treatment regimes for competing outcomes

Set-valued dynamic treatment regimes for competing outcomes Set-valued dynamic treatment regimes for competing outcomes Eric B. Laber Department of Statistics, North Carolina State University JSM, Montreal, QC, August 5, 2013 Acknowledgments Zhiqiang Tan Jamie

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and

More information

Comparing Adaptive Interventions Using Data Arising from a SMART: With Application to Autism, ADHD, and Mood Disorders

Comparing Adaptive Interventions Using Data Arising from a SMART: With Application to Autism, ADHD, and Mood Disorders Comparing Adaptive Interventions Using Data Arising from a SMART: With Application to Autism, ADHD, and Mood Disorders Daniel Almirall, Xi Lu, Connie Kasari, Inbal N-Shani, Univ. of Michigan, Univ. of

More information

Core Courses for Students Who Enrolled Prior to Fall 2018

Core Courses for Students Who Enrolled Prior to Fall 2018 Biostatistics and Applied Data Analysis Students must take one of the following two sequences: Sequence 1 Biostatistics and Data Analysis I (PHP 2507) This course, the first in a year long, two-course

More information

Switching-state Dynamical Modeling of Daily Behavioral Data

Switching-state Dynamical Modeling of Daily Behavioral Data Switching-state Dynamical Modeling of Daily Behavioral Data Randy Ardywibowo Shuai Huang Cao Xiao Shupeng Gui Yu Cheng Ji Liu Xiaoning Qian Texas A&M University University of Washington IBM T.J. Watson

More information

Reinforcement Learning and NLP

Reinforcement Learning and NLP 1 Reinforcement Learning and NLP Kapil Thadani kapil@cs.columbia.edu RESEARCH Outline 2 Model-free RL Markov decision processes (MDPs) Derivative-free optimization Policy gradients Variance reduction Value

More information

Extracting mobility behavior from cell phone data DATA SIM Summer School 2013

Extracting mobility behavior from cell phone data DATA SIM Summer School 2013 Extracting mobility behavior from cell phone data DATA SIM Summer School 2013 PETER WIDHALM Mobility Department Dynamic Transportation Systems T +43(0) 50550-6655 F +43(0) 50550-6439 peter.widhalm@ait.ac.at

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out

More information

Outline

Outline 2559 Outline cvonck@111zeelandnet.nl 1. Review of analysis of variance (ANOVA), simple regression analysis (SRA), and path analysis (PA) 1.1 Similarities and differences between MRA with dummy variables

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Agronomy at scale Principles and approaches with examples from

Agronomy at scale Principles and approaches with examples from Agronomy at scale Principles and approaches with examples from Pieter Pypers, 22-11-2016 What is agronomy at scale? Definition: agronomy = the science of soil management and crop production at scale =

More information

Planning in Markov Decision Processes

Planning in Markov Decision Processes Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

geographic patterns and processes are captured and represented using computer technologies

geographic patterns and processes are captured and represented using computer technologies Proposed Certificate in Geographic Information Science Department of Geographical and Sustainability Sciences Submitted: November 9, 2016 Geographic information systems (GIS) capture the complex spatial

More information

A Causal Agent Quantum Ontology

A Causal Agent Quantum Ontology A Causal Agent Quantum Ontology Kathryn Blackmond Laskey Department of Systems Engineering and Operations Research George Mason University Presented at Quantum Interaction 2008 Overview Need for foundational

More information

Lecture 13: Covariance. Lisa Yan July 25, 2018

Lecture 13: Covariance. Lisa Yan July 25, 2018 Lecture 13: Covariance Lisa Yan July 25, 2018 Announcements Hooray midterm Grades (hopefully) by Monday Problem Set #3 Should be graded by Monday as well (instead of Friday) Quick note about Piazza 2 Goals

More information

Chapter 6: Temporal Difference Learning

Chapter 6: Temporal Difference Learning Chapter 6: emporal Difference Learning Objectives of this chapter: Introduce emporal Difference (D) learning Focus first on policy evaluation, or prediction, methods Compare efficiency of D learning with

More information

Time Planning and Control. Time-Scaled Network

Time Planning and Control. Time-Scaled Network Time Planning and Control Time-Scaled Network Processes of Time Planning and Control 1.Visualize and define the activities..sequence the activities (Job Logic). 3.stimate the activity duration. 4.Schedule

More information

Counterfactual Model for Learning Systems

Counterfactual Model for Learning Systems Counterfactual Model for Learning Systems CS 7792 - Fall 28 Thorsten Joachims Department of Computer Science & Department of Information Science Cornell University Imbens, Rubin, Causal Inference for Statistical

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

CSC321 Lecture 22: Q-Learning

CSC321 Lecture 22: Q-Learning CSC321 Lecture 22: Q-Learning Roger Grosse Roger Grosse CSC321 Lecture 22: Q-Learning 1 / 21 Overview Second of 3 lectures on reinforcement learning Last time: policy gradient (e.g. REINFORCE) Optimize

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds

More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

Sign Changes By Bart Hopkins Jr. READ ONLINE

Sign Changes By Bart Hopkins Jr. READ ONLINE Sign Changes By Bart Hopkins Jr. READ ONLINE New Treatments For Depression. Helping you find available treatments for depression and anxiety 9/22/2017 Get DeJ Loaf's "Changes" here: Sign in to make your

More information

Chapter 6: Temporal Difference Learning

Chapter 6: Temporal Difference Learning Chapter 6: emporal Difference Learning Objectives of this chapter: Introduce emporal Difference (D) learning Focus first on policy evaluation, or prediction, methods hen extend to control methods R. S.

More information

Reinforcement Learning Wrap-up

Reinforcement Learning Wrap-up Reinforcement Learning Wrap-up Slides courtesy of Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

Linear regression COMS 4771

Linear regression COMS 4771 Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between

More information

Timing Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL

Timing Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL T F T I G E O R G A I N S T I T U T E O H E O F E A L P R O G R ESS S A N D 1 8 8 5 S E R V L O G Y I C E E C H N O Timing Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL Aravind R. Nayak

More information

An Analytic Solution to Discrete Bayesian Reinforcement Learning

An Analytic Solution to Discrete Bayesian Reinforcement Learning An Analytic Solution to Discrete Bayesian Reinforcement Learning Pascal Poupart (U of Waterloo) Nikos Vlassis (U of Amsterdam) Jesse Hoey (U of Toronto) Kevin Regan (U of Waterloo) 1 Motivation Automated

More information

Introduction to Reinforcement Learning

Introduction to Reinforcement Learning CSCI-699: Advanced Topics in Deep Learning 01/16/2019 Nitin Kamra Spring 2019 Introduction to Reinforcement Learning 1 What is Reinforcement Learning? So far we have seen unsupervised and supervised learning.

More information

A Gate-keeping Approach for Selecting Adaptive Interventions under General SMART Designs

A Gate-keeping Approach for Selecting Adaptive Interventions under General SMART Designs 1 / 32 A Gate-keeping Approach for Selecting Adaptive Interventions under General SMART Designs Tony Zhong, DrPH Icahn School of Medicine at Mount Sinai (Feb 20, 2019) Workshop on Design of mhealth Intervention

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Autonomous Helicopter Flight via Reinforcement Learning

Autonomous Helicopter Flight via Reinforcement Learning Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy

More information

Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics

Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics Sohayla Pruitt, MA Senior Geospatial Scientist Duke Medicine DUHS DHTS EIM HIRS Page 1 Institute of Medicine, World Health Organization,

More information

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while

More information

ARTIFICIAL INTELLIGENCE. Reinforcement learning

ARTIFICIAL INTELLIGENCE. Reinforcement learning INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Module 8 Linear Programming. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

Module 8 Linear Programming. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Module 8 Linear Programming CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Policy Optimization Value and policy iteration Iterative algorithms that implicitly solve

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 5. Introduction to Econometrics. Hypothesis testing LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive:

Machine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive: Simple Variable Selection LASSO: Sparse Regression Machine Learning CSE546 Carlos Guestrin University of Washington October 7, 2013 1 Sparsity Vector w is sparse, if many entries are zero: Very useful

More information

Hypothesis Tests and Estimation for Population Variances. Copyright 2014 Pearson Education, Inc.

Hypothesis Tests and Estimation for Population Variances. Copyright 2014 Pearson Education, Inc. Hypothesis Tests and Estimation for Population Variances 11-1 Learning Outcomes Outcome 1. Formulate and carry out hypothesis tests for a single population variance. Outcome 2. Develop and interpret confidence

More information

Reinforcement Learning Active Learning

Reinforcement Learning Active Learning Reinforcement Learning Active Learning Alan Fern * Based in part on slides by Daniel Weld 1 Active Reinforcement Learning So far, we ve assumed agent has a policy We just learned how good it is Now, suppose

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Introduction to Reinforcement Learning. Part 5: Temporal-Difference Learning

Introduction to Reinforcement Learning. Part 5: Temporal-Difference Learning Introduction to Reinforcement Learning Part 5: emporal-difference Learning What everybody should know about emporal-difference (D) learning Used to learn value functions without human input Learns a guess

More information

Reinforcement learning an introduction

Reinforcement learning an introduction Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

REINFORCEMENT LEARNING

REINFORCEMENT LEARNING REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

L2: Two-variable regression model

L2: Two-variable regression model L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...

More information

Large-scale Information Processing, Summer Recommender Systems (part 2)

Large-scale Information Processing, Summer Recommender Systems (part 2) Large-scale Information Processing, Summer 2015 5 th Exercise Recommender Systems (part 2) Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment SVM question When a

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

More information

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010 Lecture 25: Learning 4 Victor R. Lesser CMPSCI 683 Fall 2010 Final Exam Information Final EXAM on Th 12/16 at 4:00pm in Lederle Grad Res Ctr Rm A301 2 Hours but obviously you can leave early! Open Book

More information

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016 Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

More information

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about:

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about: Short Course: Multiagent Systems Lecture 1: Basics Agents Environments Reinforcement Learning Multiagent Systems This course is about: Agents: Sensing, reasoning, acting Multiagent Systems: Distributed

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information