Doubly Robust Policy Evaluation and Learning

Size: px
Start display at page:

Download "Doubly Robust Policy Evaluation and Learning"

Transcription

1 Doubly Robust Policy Evaluation and Learning Miroslav Dudik, John Langford and Lihong Li Yahoo! Research Discussed by Miao Liu October 9, 2011 October 9, / 17

2 1 Introduction 2 Problem Definition and Approaches Two Common Solutions Doubly Robust (DR) Estimator 3 Experiments Multiclass Classification with Bandit Feedback Estimating Average User Visits October 9, / 17

3 This paper is about decision making in environment where feedback is received only for chosen action (medical studies, content recommendation etc). These problems are instance of contextual bandits. Definition (Contextual bandit problem) In a contextual bandits problem, there is a distribution D over pairs (x, r ) where x X is the context (feature vector), a A {1,...,k} is one of the k arms to be pulled, and r [0, 1] A is a vector of rewards. The problem is a repeated game: on each round, a sample (x, r ) is drawn from D, the context x is announced, and then for precisely one arm a chosen by the player, its reward r a is revealed. Definition (Contextual bandit algorithm) A contextual bandits algorithm B determines an arm a A {1,...,k} to pull at each time step t, based on previous observation sequence (x 1, a 1, r a,1 ),..., (x t 1, a t 1, r a,t 1 ), and the current context x t. October 9, / 17

4 Definition The input data generation A new example (x, r ) D. Policy a p(a x, h), where h is the history of previous observations, i.e. the concatenation of all preceding contexts, actions and observed rewards. Reward r a is revealed, whereas other rewards r a with a a are not observed Note the distribution D and policy p are unknown. Given a data set S = (x, h, a, r a ), the tasks of interest are 1 Policy evaluation V π = E (x, r ) D [r π(x) ] 2 Policy optimization π = argmax π V π Better evaluation generally leads to better optimization. The key challenge here: only partial information about the reward is available which prohibits direct policy evaluation on data S. October 9, / 17

5 Two common solutions I. Direct method (DM) Notes: Forms an estimate ˆρ a (x) of the expected reward conditioned on the context and action. Estimates the policy value by ˆV DM π = 1 ˆρ S π(x) (x). (1) x S If ˆρ a (x) is a good approximation of the true expected reward, defined as ρ a (x) = E (x, r ) D [r a x], then (1) is a close to V π. If ˆρ a (x) is unbiased, ˆV π DM is an unbiased estimate of V π. A problem with DM: without knowing π, the estimate ˆρ is biased. October 9, / 17

6 Two common solutions II. Inverse propensity score (IPS) Forms an approximation ˆp(a x, h) of p(a x, h). Corrects for the shift in action proportions between the old, data collection policy and the new policy: ˆV π IPS = 1 S (x,h,a,r a) S r a I(π(x) = a) ˆp(a x, h) (2) Notes: where I( ) is an indicator function. If ˆp(a x, h) p(a x, h), then (2) will approximately be an unbiased estimate of V π (from importance sampling theory). Since data collection policy p(a x, h) is typically well understood, it is often easier to obtain a good estimate ˆp. However, IPS typically has a much larger variance, due to the range of random variable increasing (Especially when p(a x, h) gets smaller). October 9, / 17

7 Doubly Robust (DR) Estimator DR estimators take advantage of both the estimate of expected reward ˆρ a (x) and estimate of action probabilities ˆp(a x, h). DR estimators was first suggested by Cassel et al. (1976) for regression, but not studied for policy learning: [ ˆV DR π = 1 (ra ˆρ a (x))i ( π(x) = a ) ] + ˆρ S ˆp(a x, h) π(x) (x) (x,h,a,r a) S (3) Informally, (3) uses ˆρ as a baseline and if there is data available, a correction is applied. The estimator (3) is accurate if at least one the estimators, ˆρ and ˆp, is accurate. A basic question How does (3) perform as the estimates ˆρ and ˆp deviate from the truth? October 9, / 17

8 Bias Analysis Denote the additive deviation of ˆρ from ρ and δ a multiplicative deviation of ˆp from p: (a, x) = ˆρ a (x) ρ a (x), δ(a, x, h) = 1 p(a x, h)/ˆp(a x, h) (4) To remove clutter, use shorthands ρ a for ρ a (x), I for I(π(x) = a), p for p(π(x) x, h), ˆp for ˆp(π(x) x, h), and δ for δ(π(x), x, h). To evaluate E[ ˆV DR π ], it suffices to focus on a single term in Eq.(3), conditioning on h; [ ] (ra ˆρ a )I E (x, r ) D,a p( x,h) + ˆρ ˆp π(x) [ ] (ra ρ a )I = E x, r,a h + ρ ˆp π(x) + (5) [ ] (ρa ρ a )I = E x,a h + (1 I/ˆp) + E x [ρ π (x)] ˆp = E x h [ (1 p/ˆp)] + V π = E x h [ δ] + V π. October 9, / 17

9 Theorem Let and δ be defined as above. Then, the bias of the doubly robust estimator is [ E S [ ˆV DR π ] V π = 1 S E S δ] (x,h) S If the past policy and the past policy estimate are stationary (i.e., independent of h), the expression simplifies to E S [ ˆV π DR ] V π = E x [ δ]. In contrast (assume stationarity): E S [ ˆV π DM ] V π = E x [ ] E S [ ˆV π IPS ] V π = E x [ρ π(x) δ], (6) IPS is a special case of DR for ˆρ a (x) 0 If either 0, or δ 0, ˆV DR π V π, while DM requires 0 and IPS requires δ 0. October 9, / 17

10 Variance Analysis Define ɛ = (r a ρ a )I/ˆp and note E[ɛ x, a] = 0, we have [( )] (ra ˆρ a )I E x, r,a + ˆρ p π(x) =E x, r,a [ɛ 2 ] + E x [ρ 2 π(x) ] + 2E x,a[ρ π(x) (1 I/ˆp)] + E x,a [ 2 (1 I/ˆp) 2 ] =E x, r,a [ɛ 2 ] + E x [ρ 2 π(x) ] + 2E x[ρ π(x) δ] + E x [ 2 (1 2p/ˆp + p/ˆp 2 )] =E x, r,a [ɛ 2 ] + E x [ρ 2 π(x) ] + 2E x[ρ π(x) δ] + E x [ 2 (1 2p/ˆp + p 2 /ˆp 2 + p(1 p)/ˆp 2 )] =E x, r,a [ɛ 2 ] + E x [(ρ π(x)+ δ ) 2 ] + E x [ 2 p(1 p)/ˆp 2 ] =E x, r,a [ɛ 2 ] + E x [(ρ π(x)+ δ ) 2 ] + E x [ 1 p p 2 (1 δ) 2 ] Summing across all terms in Eq.3 and combining with Theorem3, the variance is obtained October 9, / 17

11 Theorem Let, δ and ɛ be defined as above. If the past policy and the policy estimate are stationary, then the variance of the doubly robust estimator is ( Var[ ˆV DR] π = 1 [ ] ) E x, r,a [ɛ 2 1 p ] + Var x[ρ π(x) (x) + δ] + E x 2 (1 δ) 2. S p E x, r,a [ɛ 2 ] accounts for the randomness in rewards. Var x[ρ π(x) + δ] is the variance of the estimator due to the randomness in x. The last term can be viewed as the importance weighting penalty. Similar expression can be derived for the IPS estimator ( Var[ ˆV IPS] π = 1 [ E x, r,a [ɛ 2 1 p ]+Var x[ρ π(x) (x) ρ π(x) δ]+e x S p The variance of the estimator for the direct method Var[ ˆV π DM] = 1 S Varx[ρ π(x)(x) + ] ρ π(x) 2 (1 δ) 2 ] ). which does not have terms depending either on the past policy or the randomness in the rewards. October 9, / 17

12 Data Setup Assume data are drawn IID from a fixed distribution: (x, c) D, where x X is the feature vector and c {1, 2,, k} is the class label. A typical goal is to find a classifer π : X {1, 2,, k} minimizing the classification error: e(π) = E (x,c) D [I(π(x) c)] A classifier π may be interpreted as an action-selection policy, i.e. the data point (x, c) can be turned into a cost-sensitive classification example (x, l 1,, l k ), where l a = I(a c) is the loss for predicting a. A partially labeled data is constructed by randomly selecting a label a UNIF(1,, k) and revealing the component l a. The final data is in the form of (x, a, l a ). p(a x) = 1/k is assumed to be known. October 9, / 17

13 Policy Evaluation For each data set: V π = E (x, r ) D [r π(x) ] 1 Randomly split data into training and test sets of roughly the same size; 2 Obtain a classifier π by running a direct loss minimization (DLM) algorithm of McAllester et al. (2011) on the training set with fully revealed losses; 3 Compute the classification error on fully observed test data (ground truth); 4 Obtain a partially labeled test set. Both DM and DR require estimating the expected loss denoted as l(x, a) for given (x, a). Use a liner loss model: ˆl(x, a) = w a x, parameterized by k weighted vectors {w a } a {1,,k}, and use least-squares ridge regression to fit w a. Both IPS and DR are unbiased since p(a h) = 1/k is accurate. October 9, / 17

14 Policy Optimization π = argmax π V π Repeat the following steps 30 times: 1 Randomly split data into tranining (70%) and test (30%) sets; 2 Obtain a partially labeled training data set; 3 Use IPS and DR estimators to impute unrevealed losses in the training data; 4 Two cost-sensitive multiclass classification algorithms are used to learn a classifier from the losses completed by either IPS or DR: the first is DLM (McAllester et al., 2011), the other is the Filter Tree reduction of Beygelzimer et al. (2008). 5 Evaluate the learned classifiers on the test data. October 9, / 17

15 Performance Comparisons October 9, / 17

16 Estimating Average User Visits The dataset: D = {(b i, x i, v i )} i=1,,n, where N = , b i is the ith(unique) bcookie 1, x i {0, 1} 5000 is the corresponding (sparse) binary feature vector, and v i is the number of visits. A sample mean might be a biased estimator if the uniform sampling scheme is not ensured. This is known as covariate shift 2, a special case of contextual bandit problem with k = 2 arms. The partially labeled data consists of tuples (x i, a i, r i ), where a i {0, 1} indicates whether bcookie b i is sampled, r i = p i v i is the observed number of visits, and p i is the probability that a i = 1. DR estimator requires building a reward model ˆρ(x) = ωx The sampling proabilities p i of the b i is defined as (Gretton et al. 208)p i = min{n ( x i, x ), 1}, where x is the first principle component of all features {x i }. 1 A bcookie is unique string that identifies a user. 2 October 9, / 17

17 Performance Comparisons Figure: 3. Comparison of IPS and DR: rmse(left), bias(right). The ground truth value is October 9, / 17

Doubly Robust Policy Evaluation and Learning

Doubly Robust Policy Evaluation and Learning Miroslav Dudík John Langford Yahoo! Research, New York, NY, USA 10018 Lihong Li Yahoo! Research, Santa Clara, CA, USA 95054 MDUDIK@YAHOO-INC.COM JL@YAHOO-INC.COM LIHONG@YAHOO-INC.COM Abstract We study

More information

Learning with Exploration

Learning with Exploration Learning with Exploration John Langford (Yahoo!) { With help from many } Austin, March 24, 2011 Yahoo! wants to interactively choose content and use the observed feedback to improve future content choices.

More information

Doubly Robust Policy Evaluation and Optimization 1

Doubly Robust Policy Evaluation and Optimization 1 Statistical Science 2014, Vol. 29, No. 4, 485 511 DOI: 10.1214/14-STS500 Institute of Mathematical Statistics, 2014 Doubly Robust Policy valuation and Optimization 1 Miroslav Dudík, Dumitru rhan, John

More information

Doubly Robust Policy Evaluation and Optimization 1

Doubly Robust Policy Evaluation and Optimization 1 Statistical Science 2014, Vol. 29, No. 4, 485 511 DOI: 10.1214/14-STS500 c Institute of Mathematical Statistics, 2014 Doubly Robust Policy Evaluation and Optimization 1 Miroslav Dudík, Dumitru Erhan, John

More information

Reducing contextual bandits to supervised learning

Reducing contextual bandits to supervised learning Reducing contextual bandits to supervised learning Daniel Hsu Columbia University Based on joint work with A. Agarwal, S. Kale, J. Langford, L. Li, and R. Schapire 1 Learning to interact: example #1 Practicing

More information

Counterfactual Evaluation and Learning

Counterfactual Evaluation and Learning SIGIR 26 Tutorial Counterfactual Evaluation and Learning Adith Swaminathan, Thorsten Joachims Department of Computer Science & Department of Information Science Cornell University Website: http://www.cs.cornell.edu/~adith/cfactsigir26/

More information

Outline. Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators. Case Studies & Demo Summary

Outline. Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators. Case Studies & Demo Summary Outline Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators 1. Self-Normalized Estimator 2. Doubly Robust Estimator 3. Slates Estimator Case Studies & Demo Summary IPS: Issues

More information

Estimation Considerations in Contextual Bandits

Estimation Considerations in Contextual Bandits Estimation Considerations in Contextual Bandits Maria Dimakopoulou Zhengyuan Zhou Susan Athey Guido Imbens arxiv:1711.07077v4 [stat.ml] 16 Dec 2018 Abstract Contextual bandit algorithms are sensitive to

More information

Learning for Contextual Bandits

Learning for Contextual Bandits Learning for Contextual Bandits Alina Beygelzimer 1 John Langford 2 IBM Research 1 Yahoo! Research 2 NYC ML Meetup, Sept 21, 2010 Example of Learning through Exploration Repeatedly: 1. A user comes to

More information

Slides at:

Slides at: Learning to Interact John Langford @ Microsoft Research (with help from many) Slides at: http://hunch.net/~jl/interact.pdf For demo: Raw RCV1 CCAT-or-not: http://hunch.net/~jl/vw_raw.tar.gz Simple converter:

More information

Counterfactual Model for Learning Systems

Counterfactual Model for Learning Systems Counterfactual Model for Learning Systems CS 7792 - Fall 28 Thorsten Joachims Department of Computer Science & Department of Information Science Cornell University Imbens, Rubin, Causal Inference for Statistical

More information

Off-policy Evaluation and Learning

Off-policy Evaluation and Learning COMS E6998.001: Bandits and Reinforcement Learning Fall 2017 Off-policy Evaluation and Learning Lecturer: Alekh Agarwal Lecture 7, October 11 Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Learning from Logged Implicit Exploration Data

Learning from Logged Implicit Exploration Data Learning from Logged Implicit Exploration Data Alexander L. Strehl Facebook Inc. 1601 S California Ave Palo Alto, CA 94304 astrehl@facebook.com Lihong Li Yahoo! Research 4401 Great America Parkway Santa

More information

1 A Support Vector Machine without Support Vectors

1 A Support Vector Machine without Support Vectors CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online

More information

Bayesian Contextual Multi-armed Bandits

Bayesian Contextual Multi-armed Bandits Bayesian Contextual Multi-armed Bandits Xiaoting Zhao Joint Work with Peter I. Frazier School of Operations Research and Information Engineering Cornell University October 22, 2012 1 / 33 Outline 1 Motivating

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Linear and Logistic Regression. Dr. Xiaowei Huang

Linear and Logistic Regression. Dr. Xiaowei Huang Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics

More information

1 [15 points] Frequent Itemsets Generation With Map-Reduce

1 [15 points] Frequent Itemsets Generation With Map-Reduce Data Mining Learning from Large Data Sets Final Exam Date: 15 August 2013 Time limit: 120 minutes Number of pages: 11 Maximum score: 100 points You can use the back of the pages if you run out of space.

More information

cxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c

cxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Adaptive Crowdsourcing via EM with Prior

Adaptive Crowdsourcing via EM with Prior Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and

More information

Reinforcement Learning as Classification Leveraging Modern Classifiers

Reinforcement Learning as Classification Leveraging Modern Classifiers Reinforcement Learning as Classification Leveraging Modern Classifiers Michail G. Lagoudakis and Ronald Parr Department of Computer Science Duke University Durham, NC 27708 Machine Learning Reductions

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University

Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Lecture 10: Contextual Bandits

Lecture 10: Contextual Bandits CSE599i: Online and Adaptive Machine Learning Winter 208 Lecturer: Lalit Jain Lecture 0: Contextual Bandits Scribes: Neeraja Abhyankar, Joshua Fan, Kunhui Zhang Disclaimer: These notes have not been subjected

More information

Chapter 7: Model Assessment and Selection

Chapter 7: Model Assessment and Selection Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Diagnostics. Gad Kimmel

Diagnostics. Gad Kimmel Diagnostics Gad Kimmel Outline Introduction. Bootstrap method. Cross validation. ROC plot. Introduction Motivation Estimating properties of an estimator. Given data samples say the average. x 1, x 2,...,

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

The Naïve Bayes Classifier. Machine Learning Fall 2017

The Naïve Bayes Classifier. Machine Learning Fall 2017 The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

More information

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal, Daniel Hsu 2, Satyen Kale 3, John Langford, Lihong Li, and Robert E. Schapire 4 Microsoft Research 2 Columbia University

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

(Deep) Reinforcement Learning

(Deep) Reinforcement Learning Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 17 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

CS 598 Statistical Reinforcement Learning. Nan Jiang

CS 598 Statistical Reinforcement Learning. Nan Jiang CS 598 Statistical Reinforcement Learning Nan Jiang Overview What s this course about? A grad-level seminar course on theory of RL 3 What s this course about? A grad-level seminar course on theory of RL

More information

Lecture 10 : Contextual Bandits

Lecture 10 : Contextual Bandits CMSC 858G: Bandits, Experts and Games 11/07/16 Lecture 10 : Contextual Bandits Instructor: Alex Slivkins Scribed by: Guowei Sun and Cheng Jie 1 Problem statement and examples In this lecture, we will be

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Decision Trees CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Classification without Models Well, partially without a model } Today: Decision Trees 2015 Bruno Ribeiro 2 3 Why Trees? } interpretable/intuitive,

More information

Introduction and Models

Introduction and Models CSE522, Winter 2011, Learning Theory Lecture 1 and 2-01/04/2011, 01/06/2011 Lecturer: Ofer Dekel Introduction and Models Scribe: Jessica Chang Machine learning algorithms have emerged as the dominant and

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information

Be able to define the following terms and answer basic questions about them:

Be able to define the following terms and answer basic questions about them: CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

Large-scale Information Processing, Summer Recommender Systems (part 2)

Large-scale Information Processing, Summer Recommender Systems (part 2) Large-scale Information Processing, Summer 2015 5 th Exercise Recommender Systems (part 2) Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment SVM question When a

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Machine Learning Basics

Machine Learning Basics Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each

More information

Online Advertising is Big Business

Online Advertising is Big Business Online Advertising Online Advertising is Big Business Multiple billion dollar industry $43B in 2013 in USA, 17% increase over 2012 [PWC, Internet Advertising Bureau, April 2013] Higher revenue in USA

More information

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015 Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and

More information

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

More information

Active Learning and Optimized Information Gathering

Active Learning and Optimized Information Gathering Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office

More information

Matching. Quiz 2. Matching. Quiz 2. Exact Matching. Estimand 2/25/14

Matching. Quiz 2. Matching. Quiz 2. Exact Matching. Estimand 2/25/14 STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University Frequency 0 2 4 6 8 Quiz 2 Histogram of Quiz2 10 12 14 16 18 20 Quiz2

More information

Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Mingyan Liu (Joint work with Yang Liu) Department of Electrical Engineering and Computer Science University of Michigan,

More information

SF2930 Regression Analysis

SF2930 Regression Analysis SF2930 Regression Analysis Alexandre Chotard Tree-based regression and classication 20 February 2017 1 / 30 Idag Overview Regression trees Pruning Bagging, random forests 2 / 30 Today Overview Regression

More information

arxiv: v1 [stat.ml] 12 Feb 2018

arxiv: v1 [stat.ml] 12 Feb 2018 Practical Evaluation and Optimization of Contextual Bandit Algorithms arxiv:1802.04064v1 [stat.ml] 12 Feb 2018 Alberto Bietti Inria, MSR-Inria Joint Center alberto.bietti@inria.fr Alekh Agarwal Microsoft

More information

Statistical Learning. Philipp Koehn. 10 November 2015

Statistical Learning. Philipp Koehn. 10 November 2015 Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum

More information

Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018 From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction

More information

Bounded Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design

Bounded Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design Bounded Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design William Hoiles University of California, Los Angeles Mihaela van der Schaar University of California, Los

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome. Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),

More information

10-701/ Machine Learning - Midterm Exam, Fall 2010

10-701/ Machine Learning - Midterm Exam, Fall 2010 10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

ESS2222. Lecture 4 Linear model

ESS2222. Lecture 4 Linear model ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)

More information

Methods for Interpretable Treatment Regimes

Methods for Interpretable Treatment Regimes Methods for Interpretable Treatment Regimes John Sperger University of North Carolina at Chapel Hill May 4 th 2018 Sperger (UNC) Interpretable Treatment Regimes May 4 th 2018 1 / 16 Why Interpretable Methods?

More information

Reinforcement Learning and NLP

Reinforcement Learning and NLP 1 Reinforcement Learning and NLP Kapil Thadani kapil@cs.columbia.edu RESEARCH Outline 2 Model-free RL Markov decision processes (MDPs) Derivative-free optimization Policy gradients Variance reduction Value

More information

A Magiv CV Theory for Large-Margin Classifiers

A Magiv CV Theory for Large-Margin Classifiers A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector

More information

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits John Langford and Tong Zhang

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits John Langford and Tong Zhang The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits John Langford and Tong Zhang Presentation by Terry Lam 02/2011 Outline The Contextual Bandit Problem Prior Works The Epoch Greedy Algorithm

More information

Empirical Evaluation (Ch 5)

Empirical Evaluation (Ch 5) Empirical Evaluation (Ch 5) how accurate is a hypothesis/model/dec.tree? given 2 hypotheses, which is better? accuracy on training set is biased error: error train (h) = #misclassifications/ S train error

More information

Decoupled Collaborative Ranking

Decoupled Collaborative Ranking Decoupled Collaborative Ranking Jun Hu, Ping Li April 24, 2017 Jun Hu, Ping Li WWW2017 April 24, 2017 1 / 36 Recommender Systems Recommendation system is an information filtering technique, which provides

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information