Learning with Decision-Dependent Uncertainty

Size: px
Start display at page:

Download "Learning with Decision-Dependent Uncertainty"

Transcription

1 Learning with Decision-Dependent Uncertainty Tito Homem-de-Mello Department of Mechanical and Industrial Engineering University of Illinois at Chicago NSF Workshop, University of Maryland, May 2010 Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

2 Learning and deciding Many decision-making problems fit the following framework: Collect data Estimate parameters Make decision Make new decision Update parameter estimates Collect more data Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

3 Convergence Does this process converge? This is clearly the case when the observed data are i.i.d. observations from a given distribution F. In that case, ˆF n is the empirical distribution corresponding to samples from F, so under mild conditions ˆx n x. But what if the demand depends on the decisions? Many cases (e.g. some revenue management problems) fall into that framework. In that case, the true distribution is F = F (x, ). If the dependence is ignored, the forecast/decision process converges to x satisfying x = argmin E F (x, )[G(x, Y )], which can be suboptimal. (Cooper+HM+Kleywegt 2006, Lee+HM+Kleywegt 2009) Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

4 Convergence Does this process converge? This is clearly the case when the observed data are i.i.d. observations from a given distribution F. In that case, ˆF n is the empirical distribution corresponding to samples from F, so under mild conditions ˆx n x. But what if the demand depends on the decisions? Many cases (e.g. some revenue management problems) fall into that framework. In that case, the true distribution is F = F (x, ). If the dependence is ignored, the forecast/decision process converges to x satisfying x = argmin E F (x, )[G(x, Y )], which can be suboptimal. (Cooper+HM+Kleywegt 2006, Lee+HM+Kleywegt 2009) Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

5 Convergence Does this process converge? This is clearly the case when the observed data are i.i.d. observations from a given distribution F. In that case, ˆF n is the empirical distribution corresponding to samples from F, so under mild conditions ˆx n x. But what if the demand depends on the decisions? Many cases (e.g. some revenue management problems) fall into that framework. In that case, the true distribution is F = F (x, ). If the dependence is ignored, the forecast/decision process converges to x satisfying x = argmin E F (x, )[G(x, Y )], which can be suboptimal. (Cooper+HM+Kleywegt 2006, Lee+HM+Kleywegt 2009) Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

6 Convergence Does this process converge? This is clearly the case when the observed data are i.i.d. observations from a given distribution F. In that case, ˆF n is the empirical distribution corresponding to samples from F, so under mild conditions ˆx n x. But what if the demand depends on the decisions? Many cases (e.g. some revenue management problems) fall into that framework. In that case, the true distribution is F = F (x, ). If the dependence is ignored, the forecast/decision process converges to x satisfying x = argmin E F (x, )[G(x, Y )], which can be suboptimal. (Cooper+HM+Kleywegt 2006, Lee+HM+Kleywegt 2009) Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

7 Learning the dependence Can we learn the structure of the dependence of F on x, and simultaneously optimize the objective function? For example, consider the simple case where Y n = β 0 + β 1 x + ε n, where β 0 and β 1 are unknown constants and {ε n } is i.i.d. G(x, Y ) = xy. At iteration n, Estimate β 0 and β 1 from ˆx 1,..., ˆx n using least-squares; Compute ˆx n+1 = ˆβ 0 n 2 ˆβ. 1 n This kind of procedure is called certainty equivalent control in the control and econometrics literature. Very hard to show convergence! Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

8 Learning the dependence Can we learn the structure of the dependence of F on x, and simultaneously optimize the objective function? For example, consider the simple case where Y n = β 0 + β 1 x + ε n, where β 0 and β 1 are unknown constants and {ε n } is i.i.d. G(x, Y ) = xy. At iteration n, Estimate β 0 and β 1 from ˆx 1,..., ˆx n using least-squares; Compute ˆx n+1 = ˆβ 0 n 2 ˆβ. 1 n This kind of procedure is called certainty equivalent control in the control and econometrics literature. Very hard to show convergence! Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

9 Learning the dependence Can we learn the structure of the dependence of F on x, and simultaneously optimize the objective function? For example, consider the simple case where Y n = β 0 + β 1 x + ε n, where β 0 and β 1 are unknown constants and {ε n } is i.i.d. G(x, Y ) = xy. At iteration n, Estimate β 0 and β 1 from ˆx 1,..., ˆx n using least-squares; Compute ˆx n+1 = ˆβ 0 n 2 ˆβ. 1 n This kind of procedure is called certainty equivalent control in the control and econometrics literature. Very hard to show convergence! Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

10 Learning the dependence Can we learn the structure of the dependence of F on x, and simultaneously optimize the objective function? For example, consider the simple case where Y n = β 0 + β 1 x + ε n, where β 0 and β 1 are unknown constants and {ε n } is i.i.d. G(x, Y ) = xy. At iteration n, Estimate β 0 and β 1 from ˆx 1,..., ˆx n using least-squares; Compute ˆx n+1 = ˆβ 0 n 2 ˆβ. 1 n This kind of procedure is called certainty equivalent control in the control and econometrics literature. Very hard to show convergence! Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

11 Can we use stochastic approximation? We could apply an SA-type procedure to the function g(x) = E[G(x, Y )]. We cannot get derivatives estimators if we don t know how Y depends on x. Even if we know the dependence form, it may be hard to prove convergence. Again, suppose Y = β 0 + β 1 x + ε, G(x, Y ) = xy. Then, g (x) = β 0 + 2β 1 x. We can estimate β 0 and β 1 from regression, but does it converge? We could use finite differences, but convergence is slow. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

12 Can we use stochastic approximation? We could apply an SA-type procedure to the function g(x) = E[G(x, Y )]. We cannot get derivatives estimators if we don t know how Y depends on x. Even if we know the dependence form, it may be hard to prove convergence. Again, suppose Y = β 0 + β 1 x + ε, G(x, Y ) = xy. Then, g (x) = β 0 + 2β 1 x. We can estimate β 0 and β 1 from regression, but does it converge? We could use finite differences, but convergence is slow. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

13 Can we use stochastic approximation? We could apply an SA-type procedure to the function g(x) = E[G(x, Y )]. We cannot get derivatives estimators if we don t know how Y depends on x. Even if we know the dependence form, it may be hard to prove convergence. Again, suppose Y = β 0 + β 1 x + ε, G(x, Y ) = xy. Then, g (x) = β 0 + 2β 1 x. We can estimate β 0 and β 1 from regression, but does it converge? We could use finite differences, but convergence is slow. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

14 Can we use stochastic approximation? We could apply an SA-type procedure to the function g(x) = E[G(x, Y )]. We cannot get derivatives estimators if we don t know how Y depends on x. Even if we know the dependence form, it may be hard to prove convergence. Again, suppose Y = β 0 + β 1 x + ε, G(x, Y ) = xy. Then, g (x) = β 0 + 2β 1 x. We can estimate β 0 and β 1 from regression, but does it converge? We could use finite differences, but convergence is slow. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

15 Comparing the methods β 0 = 5, β 1 = 1, ε =Normal(0,1) x n CEC SA FD SA reg Opt n x 10 4 Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

16 Issues Is it better to first learn the dependence structure, and then optimize? (e.g., Besbes and Zeevi 2010 in the context of demand functions) Alternatively, we can view the problem as a black-box system where we only observe outputs (in this case, revenues) in terms of the inputs (the decisions). We want to optimize however, in the context we are interested in, function evaluations are expensive. We are looking at methods that choose decision points randomly according to some distribution, which is updated from iteration to iteration. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

17 Issues Is it better to first learn the dependence structure, and then optimize? (e.g., Besbes and Zeevi 2010 in the context of demand functions) Alternatively, we can view the problem as a black-box system where we only observe outputs (in this case, revenues) in terms of the inputs (the decisions). We want to optimize however, in the context we are interested in, function evaluations are expensive. We are looking at methods that choose decision points randomly according to some distribution, which is updated from iteration to iteration. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

18 Issues Is it better to first learn the dependence structure, and then optimize? (e.g., Besbes and Zeevi 2010 in the context of demand functions) Alternatively, we can view the problem as a black-box system where we only observe outputs (in this case, revenues) in terms of the inputs (the decisions). We want to optimize however, in the context we are interested in, function evaluations are expensive. We are looking at methods that choose decision points randomly according to some distribution, which is updated from iteration to iteration. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

19 Issues Is it better to first learn the dependence structure, and then optimize? (e.g., Besbes and Zeevi 2010 in the context of demand functions) Alternatively, we can view the problem as a black-box system where we only observe outputs (in this case, revenues) in terms of the inputs (the decisions). We want to optimize however, in the context we are interested in, function evaluations are expensive. We are looking at methods that choose decision points randomly according to some distribution, which is updated from iteration to iteration. Tito Homem-de-Mello (UIC) Learning with Decision-Dependent Uncertainty NSF Workshop May / 7

The Perceptron algorithm

The Perceptron algorithm The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following

More information

Lasso, Ridge, and Elastic Net

Lasso, Ridge, and Elastic Net Lasso, Ridge, and Elastic Net David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 14 A Very Simple Model Suppose we have one feature

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Stephen Boyd and Almir Mutapcic Notes for EE364b, Stanford University, Winter 26-7 April 13, 28 1 Noisy unbiased subgradient Suppose f : R n R is a convex function. We say

More information

Scenario Generation and Sampling Methods

Scenario Generation and Sampling Methods Scenario Generation and Sampling Methods Güzin Bayraksan Tito Homem-de-Mello SVAN 2016 IMPA May 11th, 2016 Bayraksan (OSU) & Homem-de-Mello (UAI) Scenario Generation and Sampling SVAN IMPA May 11 1 / 33

More information

Complexity of two and multi-stage stochastic programming problems

Complexity of two and multi-stage stochastic programming problems Complexity of two and multi-stage stochastic programming problems A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA The concept

More information

Neural Networks: Backpropagation

Neural Networks: Backpropagation Neural Networks: Backpropagation Seung-Hoon Na 1 1 Department of Computer Science Chonbuk National University 2018.10.25 eung-hoon Na (Chonbuk National University) Neural Networks: Backpropagation 2018.10.25

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

More information

Linear classifiers: Overfitting and regularization

Linear classifiers: Overfitting and regularization Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

CSC 411: Lecture 04: Logistic Regression

CSC 411: Lecture 04: Logistic Regression CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic

More information

ORIE 4741: Learning with Big Messy Data. Generalization

ORIE 4741: Learning with Big Messy Data. Generalization ORIE 4741: Learning with Big Messy Data Generalization Professor Udell Operations Research and Information Engineering Cornell September 23, 2017 1 / 21 Announcements midterm 10/5 makeup exam 10/2, by

More information

Fast Rates for Exp-concave Empirial Risk Minimization

Fast Rates for Exp-concave Empirial Risk Minimization Fast Rates for Exp-concave Empirial Risk Minimization Tomer Koren, Kfir Y. Levy Presenter: Zhe Li September 16, 2016 High Level Idea Why could we use Regularized Empirical Risk Minimization? Learning theory

More information

Communication by Regression: Achieving Shannon Capacity

Communication by Regression: Achieving Shannon Capacity Communication by Regression: Practical Achievement of Shannon Capacity Department of Statistics Yale University Workshop Infusing Statistics and Engineering Harvard University, June 5-6, 2011 Practical

More information

ECON 3150/4150 spring term 2014: Exercise set for the first seminar and DIY exercises for the first few weeks of the course

ECON 3150/4150 spring term 2014: Exercise set for the first seminar and DIY exercises for the first few weeks of the course ECON 3150/4150 spring term 2014: Exercise set for the first seminar and DIY exercises for the first few weeks of the course Ragnar Nymoen 13 January 2014. Exercise set to seminar 1 (week 6, 3-7 Feb) This

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information

Kernel Learning via Random Fourier Representations

Kernel Learning via Random Fourier Representations Kernel Learning via Random Fourier Representations L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Module 5: Machine Learning L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Kernel Learning via Random

More information

Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms

Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms university-logo Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms Andrew Barron Cong Huang Xi Luo Department of Statistics Yale University 2008 Workshop on Sparsity in High Dimensional

More information

Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation

Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation CS234: RL Emma Brunskill Winter 2018 Material builds on structure from David SIlver s Lecture 4: Model-Free

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic

More information

Sampling Bounds for Stochastic Optimization

Sampling Bounds for Stochastic Optimization Sampling Bounds for Stochastic Optimization Moses Charikar 1, Chandra Chekuri 2, and Martin Pál 2 1 Computer Science Dept., Princeton University, Princeton, NJ 08544. 2 Lucent Bell Labs, 600 Mountain Avenue,

More information

CS60021: Scalable Data Mining. Large Scale Machine Learning

CS60021: Scalable Data Mining. Large Scale Machine Learning J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Large Scale Machine Learning Sourangshu Bhattacharya Example: Spam filtering Instance

More information

CPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017

CPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017 CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code

More information

10-1: Composite and Inverse Functions

10-1: Composite and Inverse Functions Math 95 10-1: Composite and Inverse Functions Functions are a key component in the applications of algebra. When working with functions, we can perform many useful operations such as addition and multiplication

More information

Bootstrap & Confidence/Prediction intervals

Bootstrap & Confidence/Prediction intervals Bootstrap & Confidence/Prediction intervals Olivier Roustant Mines Saint-Étienne 2017/11 Olivier Roustant (EMSE) Bootstrap & Confidence/Prediction intervals 2017/11 1 / 9 Framework Consider a model with

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018 Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)

More information

Bilinear Programming: Applications in the Supply Chain Management

Bilinear Programming: Applications in the Supply Chain Management Bilinear Programming: Applications in the Supply Chain Management Artyom G. Nahapetyan Center for Applied Optimization Industrial and Systems Engineering Department University of Florida Gainesville, Florida

More information

Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016

Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016 Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Find the maximum likelihood estimate of θ where θ is a parameter

More information

Introduction to Economic Time Series

Introduction to Economic Time Series Econometrics II Introduction to Economic Time Series Morten Nyboe Tabor Learning Goals 1 Give an account for the important differences between (independent) cross-sectional data and time series data. 2

More information

Stochastic Gradient Descent. CS 584: Big Data Analytics

Stochastic Gradient Descent. CS 584: Big Data Analytics Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Predictive Models Predictive Models 1 / 34 Outline

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell

More information

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method ORIE 4741: Learning with Big Messy Data Proximal Gradient Method Professor Udell Operations Research and Information Engineering Cornell November 13, 2017 1 / 31 Announcements Be a TA for CS/ORIE 1380:

More information

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning Active Policy Iteration: fficient xploration through Active Learning for Value Function Approximation in Reinforcement Learning Takayuki Akiyama, Hirotaka Hachiya, and Masashi Sugiyama Department of Computer

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!

Regression and Classification with Linear Models CMPSCI 383 Nov 15, 2011! Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient

More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018 Lecture 1 Stochastic Optimization: Introduction January 8, 2018 Optimization Concerned with mininmization/maximization of mathematical functions Often subject to constraints Euler (1707-1783): Nothing

More information

Algorithmic Stability and Generalization Christoph Lampert

Algorithmic Stability and Generalization Christoph Lampert Algorithmic Stability and Generalization Christoph Lampert November 28, 2018 1 / 32 IST Austria (Institute of Science and Technology Austria) institute for basic research opened in 2009 located in outskirts

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

An Evolving Gradient Resampling Method for Machine Learning. Jorge Nocedal

An Evolving Gradient Resampling Method for Machine Learning. Jorge Nocedal An Evolving Gradient Resampling Method for Machine Learning Jorge Nocedal Northwestern University NIPS, Montreal 2015 1 Collaborators Figen Oztoprak Stefan Solntsev Richard Byrd 2 Outline 1. How to improve

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information

A Mixed Efficient Global Optimization (m- EGO) Based Time-Dependent Reliability Analysis Method

A Mixed Efficient Global Optimization (m- EGO) Based Time-Dependent Reliability Analysis Method A Mixed Efficient Global Optimization (m- EGO) Based Time-Dependent Reliability Analysis Method ASME 2014 IDETC/CIE 2014 Paper number: DETC2014-34281 Zhen Hu, Ph.D. Candidate Advisor: Dr. Xiaoping Du Department

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training

More information

The regression model with one stochastic regressor.

The regression model with one stochastic regressor. The regression model with one stochastic regressor. 3150/4150 Lecture 6 Ragnar Nymoen 30 January 2012 We are now on Lecture topic 4 The main goal in this lecture is to extend the results of the regression

More information

A Geometric Approach to Statistical Learning Theory

A Geometric Approach to Statistical Learning Theory A Geometric Approach to Statistical Learning Theory Shahar Mendelson Centre for Mathematics and its Applications The Australian National University Canberra, Australia What is a learning problem A class

More information

Lecture 10 - Planning under Uncertainty (III)

Lecture 10 - Planning under Uncertainty (III) Lecture 10 - Planning under Uncertainty (III) Jesse Hoey School of Computer Science University of Waterloo March 27, 2018 Readings: Poole & Mackworth (2nd ed.)chapter 12.1,12.3-12.9 1/ 34 Reinforcement

More information

Quantifying Stochastic Model Errors via Robust Optimization

Quantifying Stochastic Model Errors via Robust Optimization Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Decision trees COMS 4771

Decision trees COMS 4771 Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).

More information

Decision Tree Learning Lecture 2

Decision Tree Learning Lecture 2 Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Linear regression COMS 4771

Linear regression COMS 4771 Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between

More information

Stochastic optimization in Hilbert spaces

Stochastic optimization in Hilbert spaces Stochastic optimization in Hilbert spaces Aymeric Dieuleveut Aymeric Dieuleveut Stochastic optimization Hilbert spaces 1 / 48 Outline Learning vs Statistics Aymeric Dieuleveut Stochastic optimization Hilbert

More information

ECONOMETRIC THEORY. MODULE VI Lecture 19 Regression Analysis Under Linear Restrictions

ECONOMETRIC THEORY. MODULE VI Lecture 19 Regression Analysis Under Linear Restrictions ECONOMETRIC THEORY MODULE VI Lecture 9 Regression Analysis Under Linear Restrictions Dr Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur One of the basic objectives

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Latent Tree Approximation in Linear Model

Latent Tree Approximation in Linear Model Latent Tree Approximation in Linear Model Navid Tafaghodi Khajavi Dept. of Electrical Engineering, University of Hawaii, Honolulu, HI 96822 Email: navidt@hawaii.edu ariv:1710.01838v1 [cs.it] 5 Oct 2017

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Geoff Hollinger Sequential Decision Making in Robotics Spring, 2011 *Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and

More information

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010 Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

Stochastic Gradient Descent in Continuous Time

Stochastic Gradient Descent in Continuous Time Stochastic Gradient Descent in Continuous Time Justin Sirignano University of Illinois at Urbana Champaign with Konstantinos Spiliopoulos (Boston University) 1 / 27 We consider a diffusion X t X = R m

More information

Adaptive Modulation with Smoothed Flow Utility

Adaptive Modulation with Smoothed Flow Utility Adaptive Modulation with Smoothed Flow Utility Ekine Akuiyibo Stephen Boyd Electrical Engineering Department, Stanford University ITMANET 05/24/10 The problem choose transmit power(s) and flow rate(s)

More information

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Stochastic Approximation: Mini-Batches, Optimistic Rates and Acceleration

Stochastic Approximation: Mini-Batches, Optimistic Rates and Acceleration Stochastic Approximation: Mini-Batches, Optimistic Rates and Acceleration Nati Srebro Toyota Technological Institute at Chicago a philanthropically endowed academic computer science institute dedicated

More information

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Fumiya Akashi Research Associate Department of Applied Mathematics Waseda University

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Machine Learning Linear Models

Machine Learning Linear Models Machine Learning Linear Models Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method

More information

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)

More information

Statistics 135: Fall 2004 Final Exam

Statistics 135: Fall 2004 Final Exam Name: SID#: Statistics 135: Fall 2004 Final Exam There are 10 problems and the number of points for each is shown in parentheses. There is a normal table at the end. Show your work. 1. The designer of

More information

Gaussian Processes 1. Schedule

Gaussian Processes 1. Schedule 1 Schedule 17 Jan: Gaussian processes (Jo Eidsvik) 24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups) 31 Jan: Latent Gaussian models and INLA (Jo Eidsvik) 7 Feb: Hands-on project

More information

The Randomized Newton Method for Convex Optimization

The Randomized Newton Method for Convex Optimization The Randomized Newton Method for Convex Optimization Vaden Masrani UBC MLRG April 3rd, 2018 Introduction We have some unconstrained, twice-differentiable convex function f : R d R that we want to minimize:

More information

Monte Carlo Sampling-Based Methods for Stochastic Optimization

Monte Carlo Sampling-Based Methods for Stochastic Optimization Monte Carlo Sampling-Based Methods for Stochastic Optimization Tito Homem-de-Mello School of Business Universidad Adolfo Ibañez Santiago, Chile tito.hmello@uai.cl Güzin Bayraksan Integrated Systems Engineering

More information

Review of the role of uncertainties in room acoustics

Review of the role of uncertainties in room acoustics Review of the role of uncertainties in room acoustics Ralph T. Muehleisen, Ph.D. PE, FASA, INCE Board Certified Principal Building Scientist and BEDTR Technical Lead Division of Decision and Information

More information

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Please answer all of the questions and show your work Clearly indicate your final answer to each question If you think a question is

More information

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Gabriel Y. Weintraub, Lanier Benkard, and Benjamin Van Roy Stanford University {gweintra,lanierb,bvr}@stanford.edu Abstract

More information

Improving the Convergence of Back-Propogation Learning with Second Order Methods

Improving the Convergence of Back-Propogation Learning with Second Order Methods the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible

More information

Empirical Risk Minimization

Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

Algorithms other than SGD. CS6787 Lecture 10 Fall 2017

Algorithms other than SGD. CS6787 Lecture 10 Fall 2017 Algorithms other than SGD CS6787 Lecture 10 Fall 2017 Machine learning is not just SGD Once a model is trained, we need to use it to classify new examples This inference task is not computed with SGD There

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Bias and variance (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 49 Our plan today We saw in last lecture that model scoring methods seem to be trading off two different

More information