Mixed Integer Programming Approaches for Experimental Design


1 Mixed Integer Programming Approaches for Experimental Design. Juan Pablo Vielma, Massachusetts Institute of Technology. Joint work with Denis Saure. DRO brown bag lunch seminars, Columbia Business School, New York, NY, October 2016.

2 Motivation: (Custom) Product Recommendations. [Slide shows camera comparison tables: SX530 vs RX100 (Zoom: 50x vs 3.6x; Weight: … vs 7.5 ounces), TG-4 vs G9 (Waterproof: Yes vs No; Weight: 7.36 lb vs 7.5 lb), and the recommended pair TG-4 vs Galaxy 2 (Waterproof: Yes vs No; Viewfinder: Electronic vs Optical).]

3 Motivation: (Custom) Product Recommendations. [Same camera comparison tables and recommendation as the previous slide.] Toubia, Hauser and Dahan (2003); Toubia, Hauser, and Simester (2004).

4 Towards Optimal Product Recommendation. Find enough information about preferences to recommend. [Same camera comparison tables and recommendation.] How do I pick the next (1st) question to obtain the largest reduction of uncertainty or variance on preferences? Compensatory model estimation (part-worths), not just assortment.

5 Next Question To Reduce Variance: Bayesian. Prior distribution of β → Bayesian update (via MCMC) → posterior distribution; each further question triggers another Bayesian update and a new posterior. With a black-box objective, question selection means enumeration; here, question selection is done by Mixed Integer Programming (MIP).

6 Traveling Salesman Problem (TSP): Visit Cities Fast. [Slide quotes Noson S. Yanofsky, "Paradoxes, Contradictions, and the Limits of Science": "Many research results define boundaries of what cannot be known, predicted, or described. Classifying these limitations shows us the structure of science and reason. … A computer would have to check all these possible routes to find the shortest one."]

7 MIP = Avoid Enumeration. Number of tours for 49 cities: 48!/2. Assuming one floating point operation per tour, even the fastest supercomputer would need many times the age of the universe! How long does it take on an iPhone? Less than a second: 4 iterations of a cutting plane method. Dantzig, Fulkerson and Johnson (1954) did it by hand! For more info see the tutorial in the Concorde TSP app. Cutting planes are the key to effectively solving (even NP-hard) MIP problems in practice.

8 50+ Years of MIP = Significant Solver Speedups. Algorithmic improvements (machine independent): CPLEX v1.2 (1991) to v11 (2007): 29,000x speedup; Gurobi v1 (2009) to v6.5 (2015): 48.7x speedup. Commercial, but free for academic use. (Reasonably) effective free / open source solvers: GLPK, COIN-OR (CBC) and SCIP (the latter only for non-commercial use). Easy to use, fast and versatile modeling languages: the Julia-based JuMP modeling language (a minimal sketch follows). Linear MIP solvers are very mature and effective; convex nonlinear MIP is getting there (even MI-SDP!), and quadratic is nearly there.
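To give a sense of how light the modeling layer is, here is a minimal JuMP sketch of a toy knapsack MIP; the data is hypothetical, and GLPK is used as one of the free solvers named above.

# Toy knapsack MIP in JuMP (hypothetical data; GLPK as a free solver).
using JuMP, GLPK

v, w, cap = [6, 5, 4], [2, 3, 4], 5        # values, weights, capacity
model = Model(GLPK.Optimizer)
@variable(model, x[1:3], Bin)              # pick item i or not
@objective(model, Max, v' * x)             # maximize total value
@constraint(model, w' * x <= cap)          # stay within capacity
optimize!(model)
value.(x)                                  # selected items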

9 Choice-based Conjoint Analysis (CBCA). Example question: Chewbacca vs BB-8 — Wookiee: Yes / No; Droid: No / Yes; Blaster: Yes / No. The product profiles encode these answers as binary vectors, e.g. $x^1 = (1, 0, 1)^T$ and $x^2 = (0, 1, 0)^T$, and the respondent states "I would buy toy 1" or "I would buy toy 2".

10 MNL Preference Model. Utilities for 2 products, n features (e.g. n = 12): $U^1 = \beta \cdot x^1 = \sum_{i=1}^n \beta_i x^1_i + \epsilon^1$ and $U^2 = \beta \cdot x^2 = \sum_{i=1}^n \beta_i x^2_i + \epsilon^2$, with part-worths $\beta$, product profiles $x^1, x^2$, and Gumbel noise $\epsilon^1, \epsilon^2$. Utility-maximizing customer: $x^1 \succeq x^2 \iff U^1 \ge U^2$. Noise can result in response error: $\mathcal{L}(x^1 \succeq x^2) = P(x^1 \succeq x^2) = e^{\beta \cdot x^1} / (e^{\beta \cdot x^1} + e^{\beta \cdot x^2})$.
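As a quick numeric check of the choice probability above, this sketch evaluates the logit formula for the toy profiles of the previous slide; the part-worths are made up.

# MNL choice probability for two binary profiles (made-up part-worths).
β = [0.8, -0.3, 0.5]               # part-worths
x1 = [1, 0, 1]; x2 = [0, 1, 0]     # product profiles from the CBCA slide
u1, u2 = β' * x1, β' * x2          # deterministic utilities β·x
p1 = exp(u1) / (exp(u1) + exp(u2)) # P(x1 preferred to x2) ≈ 0.83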

11 Next Question To Reduce Variance: Bayesian. [Same prior → Bayesian update (MCMC) → posterior diagram as slide 5.] The MNL preference model is a logistic regression: $x^1 \succeq x^2 \iff \beta \cdot z \ge 0$, where $z = x^1 - x^2$.

12 Linear Experimental Design. Unknown $\beta \in \mathbb{R}^n$. Questions: $Z = (z^1, \ldots, z^q)^T \in \mathbb{R}^{q \times n}$. Answers: $Y = (y^1, \ldots, y^q)^T$. Model: $P(Y \mid \beta, Z) = \mathcal{L}(Y \mid \beta, Z) = \prod_{i=1}^q h(y_i, \beta \cdot z^i)$. Objective: choose Z to learn $\beta$ fast.

13 Bayesian Framework. Prior distribution: $\beta \sim N(\mu, \Sigma)$. Answer likelihood: $\mathcal{L}(Y \mid \beta, Z)$. Posterior distribution: $g(\beta \mid Y, Z) = \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z) \big/ \int \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z)\, d\beta$. "Fast" = minimize posterior variance.

14 Goal = Minimize Expected Posterior Variance. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$. Possible solution approaches: work with $g(\beta \mid Y, Z) \propto \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z)$ and the information matrix $I(\beta \mid Y, Z) = -\nabla^2_\beta \ln g(\beta \mid Y, Z)$; approximate $\operatorname{cov}(\beta \mid Y, Z) \approx I(\hat\beta \mid Y, Z)^{-1}$ or $E_{\beta \sim N(\mu, \Sigma)}\{I(\beta \mid Y, Z)^{-1}\}$; then solve $\max_Z E_Y\{\det I(\hat\beta \mid Y, Z)\}^{1/m}$?

15 A Really Good Case = Linear Regression. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$ with $y_i = \beta \cdot z^i + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Then $g(\beta \mid Y, Z) = \phi(\beta; \mu', \Sigma')$ with $\Sigma' = \operatorname{cov}(\beta \mid Y, Z) = (\Sigma^{-1} + Z^T Z)^{-1}$, so $\min_Z f(Z) = \max_Z \det(Z^T Z + \Sigma^{-1})^{1/m}$. For discrete Z this is a MISDP, or a MISOCP for m = n.
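For completeness, the covariance formula above is the standard Gaussian completion of squares (unit noise variance, as on the slide):

g(\beta \mid Y, Z) \propto \exp\!\Big(-\tfrac{1}{2}(\beta - \mu)^\top \Sigma^{-1} (\beta - \mu)\Big)\, \exp\!\Big(-\tfrac{1}{2}\lVert Y - Z\beta \rVert_2^2\Big) \propto \exp\!\Big(-\tfrac{1}{2}(\beta - \mu')^\top (\Sigma')^{-1} (\beta - \mu')\Big), \qquad (\Sigma')^{-1} = \Sigma^{-1} + Z^\top Z.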

16 A Relatively Good Case = Few Questions. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$ with $Z = \{z^i\}_{i=1}^q \subseteq \mathbb{R}^n$ and $q \ll n$. The posterior moments then depend on the questions only through low-dimensional statistics: $E(\beta \mid Y, Z) = m\big(Y, \{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$ and $\operatorname{cov}(\beta \mid Y, Z) = M\big(Y, \{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$, so that $f(Z) = f\big(\{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$.

17 A Relatively Good Case = Few Questions (continued). The same reduction as on the previous slide holds under mild requirements only: $\beta \sim N(\mu, \Sigma)$ and $\mathcal{L}(Y \mid \beta, Z) = \prod_{i=1}^q h(y_i, \beta \cdot z^i)$ — e.g., logistic regression with small $q \ll n$.

18 Question Selection for CBCA. Ask $x^1$ vs. $x^2$ under the prior $\beta \sim N(\mu, \Sigma)$: the answer $x^1 \succeq x^2$ yields posterior covariance $\operatorname{cov}(\beta \mid \cdot) = \Sigma_1$, and $x^2 \succeq x^1$ yields $\Sigma_2$. Variance = D-efficiency: $f(x^1, x^2) := E\{\det(\Sigma_i)^{1/p}\}$, the expectation taken over which answer is given — a non-convex function. Without the previous slide, even evaluating it requires $\dim(\beta)$-dimensional integration.

19 D-efficiency Simplification for CBCA. D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu \cdot (x^1 - x^2)$ and the variance $v := (x^1 - x^2)^T \Sigma (x^1 - x^2)$ of the question. $f(d, v)$ can be evaluated with a 1-dimensional integral.

20 Simplification = Trade-off for Known Criteria. On the ellipsoid $(\beta - \mu)^T \Sigma^{-1} (\beta - \mu) \le r$: choice balance = minimize the distance to the center, $|\mu \cdot (x^1 - x^2)|$; postchoice symmetry = maximize the variance of the question, $(x^1 - x^2)^T \Sigma (x^1 - x^2)$.

21 Optimization Model. $\min f(d, v)$ s.t. $\mu \cdot (x^1 - x^2) = d$, $(x^1 - x^2)^T \Sigma (x^1 - x^2) = v$, $A^1 x^1 + A^2 x^2 \le b$, $x^1 \ne x^2$, $x^1, x^2 \in \{0, 1\}^n$; linearize the products $x^k_i x^l_j$.
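The variance constraint introduces products of binary variables; below is a minimal JuMP sketch of the standard linearization of such a product (variable names and sizes are illustrative, not the talk's).

# Standard linearization of binary products x1[i]*x2[j], as in the
# "linearize x_i^k x_j^l" step; names and sizes are illustrative only.
using JuMP, GLPK

n = 3
model = Model(GLPK.Optimizer)
@variable(model, x1[1:n], Bin)             # profile x^1
@variable(model, x2[1:n], Bin)             # profile x^2
@variable(model, 0 <= y[1:n, 1:n] <= 1)    # y[i,j] stands in for x1[i]*x2[j]
@constraint(model, [i = 1:n, j = 1:n], y[i, j] <= x1[i])
@constraint(model, [i = 1:n, j = 1:n], y[i, j] <= x2[j])
@constraint(model, [i = 1:n, j = 1:n], y[i, j] >= x1[i] + x2[j] - 1)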

22 Technique 2: Piecewise Linear Functions. As before, D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu \cdot (x^1 - x^2)$ and the variance $v := (x^1 - x^2)^T \Sigma (x^1 - x^2)$, and $f(d, v)$ can be evaluated with a 1-dimensional integral. Piecewise linear interpolation then yields a MIP formulation; a sketch follows.
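Here is a minimal sketch of one classic MIP formulation (the convex-combination model) of a 1-dimensional piecewise-linear interpolant; the breakpoints and function values are made up, and the talk's actual formulation may differ.

# Convex-combination MIP formulation of a piecewise-linear interpolant
# of f(d); breakpoints and values are hypothetical.
using JuMP, GLPK

d_pts = [0.0, 0.5, 1.0, 2.0]      # breakpoints of d
f_pts = [1.0, 0.7, 0.55, 0.5]     # f at the breakpoints (made up)
n = length(d_pts)
model = Model(GLPK.Optimizer)
@variable(model, λ[1:n] >= 0)     # interpolation weights
@variable(model, y[1:n-1], Bin)   # y[s] = 1 if d lies in segment s
@constraint(model, sum(λ) == 1)
@constraint(model, sum(y) == 1)
@constraint(model, λ[1] <= y[1])                         # only the weights
@constraint(model, [k = 2:n-1], λ[k] <= y[k-1] + y[k])   # adjacent to the chosen
@constraint(model, λ[n] <= y[n-1])                       # segment may be positive
@expression(model, d, sum(d_pts[k] * λ[k] for k in 1:n))
@expression(model, fapx, sum(f_pts[k] * λ[k] for k in 1:n))
@objective(model, Min, fapx)
optimize!(model)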

23 MIP-based Adaptive Questionnaires. [Same camera comparison tables and recommendation as slide 2.] Objective: $E_{Y, X^1, X^2}\{\operatorname{cov}(\beta \mid Y, X^1, X^2)\}$. An optimal one-step look-ahead, moment-matching, approximate Bayesian approach.

24 Optimal One-Step Look-Ahead. Given the prior distribution $\beta \sim N(\mu_i, \Sigma_i)$, choose the next question by solving $\min_{x^1, x^2} f(x^1, x^2)$ with the MIP formulation. [Camera comparison tables shown as the candidate question.]

25 Moment-Matching Approximate Bayesian Update. The prior distribution $N(\mu_i, \Sigma_i)$ and the answer likelihood give a posterior that is approximated by $N(\mu_{i+1}, \Sigma_{i+1})$, where $\mu_{i+1} = E(\beta \mid y, x^1, x^2)$ and $\Sigma_{i+1} = \operatorname{cov}(\beta \mid y, x^1, x^2)$ — each computable via a 1-dimensional integral.
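To make the 1-dimensional integrals concrete, here is a sketch of the moment computations for the projected utility difference $w = \beta \cdot (x^1 - x^2)$, whose prior is $N(d, v)$; this is my illustration of the idea, not necessarily the paper's exact update.

# 1-dim moment integrals for w = β·(x¹ − x²) with prior N(d, v) and a
# logistic answer likelihood; illustrative only.
using QuadGK, Distributions

logistic(t) = 1 / (1 + exp(-t))
function projected_posterior_moments(d, v; y = 1)
    prior = Normal(d, sqrt(v))
    lik(w) = y == 1 ? logistic(w) : 1 - logistic(w)                  # P(answer y | w)
    f0 = quadgk(w -> pdf(prior, w) * lik(w), -Inf, Inf)[1]           # normalizer
    f1 = quadgk(w -> w * pdf(prior, w) * lik(w), -Inf, Inf)[1] / f0  # E[w | y]
    f2 = quadgk(w -> w^2 * pdf(prior, w) * lik(w), -Inf, Inf)[1] / f0
    return f1, f2 - f1^2   # posterior mean and variance of w
end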

26 Computational Experiments. 16 questions, 2 options, 12 and 24 features; simulate MNL responses with known $\beta$. Question selection: MIP-based using CPLEX and the open-source COIN-OR solver, vs. the knapsack-based geometric heuristic of Toubia et al.; time limits of 1 s and 10 s. Metrics: estimator variance $= \det(\operatorname{cov}(\beta \mid Y, X^1, X^2))^{1/2}$ and estimator distance (expected $\ell_2$ distance between the estimate and the known $\beta$), both computed for the true posterior with MCMC.

27 Results for 12 Features, 1 s Time Limit. [Plots of estimator variance and estimator distance vs. number of questions.] Heuristic (avg. 0.04 s, max 0.61 s); COIN-OR (avg. 0.93 s, max 1 s); CPLEX (avg. 0.21 s, max 0.48 s).

28 Does it Scale? Results for 24 Features. [Plots of estimator variance and estimator distance vs. number of questions.] Heuristic (avg. 0.19 s, max 3 s); CPLEX 1 s (avg. 1 s, max 1 s); CPLEX 10 s (avg. 7.7 s, max 10 s).

29 Some Improvements for 24 Features. [Plots of estimator variance and estimator distance, as on the previous slide.] Heuristic (avg. 0.19 s, max 3 s); CPLEX 1 s (avg. 1 s, max 1 s); CPLEX 10 s (avg. 7.7 s, max 10 s).

30 Summary and Main Messages. Always choose Chewbacca! MIP can now solve challenging problems in practice, even in near-real time. Appropriate domain expertise can be crucial for "MIP-ing" a problem. Commercial solvers are best, but free solvers are reasonable; integration into complex systems is easy with JuMP. Some scalability: get the most out of small data. Adaptive choice-based conjoint analysis improves on existing geometric methods.
