Computational Oracle Inequalities for Large Scale Model Selection Problems
|
|
- Edmund French
- 6 years ago
- Views:
Transcription
1 for Large Scale Model Selection Problems University of California at Berkeley Queensland University of Technology ETH Zürich, September 2011 Joint work with Alekh Agarwal, John Duchi and Clément Levrard.
2 Large Scale Data Analysis Observation: For many prediction problems, the amount of data available is effectively unlimited.
3 Large Scale Data Analysis Observation: For many prediction problems, the amount of data available is effectively unlimited. Information retrieval: Web search 10 8 websites pages queries/day.
4 Large Scale Data Analysis Observation: For many prediction problems, the amount of data available is effectively unlimited. Natural language processing: Spelling correction Google Linguistics Data Consortium n-gram corpus: sentences.
5 Large Scale Data Analysis Observation: For many prediction problems, the amount of data available is effectively unlimited. Computer vision: Captions Facebook: photos.
6 Large Scale Data Analysis Observation: For many prediction problems, the amount of data available is effectively unlimited. Information retrieval: Web search Natural language processing: Spelling correction Computer vision: Captions
7 Large Scale Data Analysis Observation: For many prediction problems, performance is limited by computational resources, not sample size. Information retrieval: Web search Natural language processing: Spelling correction Computer vision: Captions
8 Large Scale Data Analysis Example: Peter Norvig, Internet-Scale Data Analysis : On a spelling correction problem, trivial prediction rules, estimated with a massive dataset perform much better than complex prediction rules (which allow only a dataset of modest size). Given a limited computational budget, what is the best trade-off? That is, should we spend our computation on gathering more data, or on estimating richer prediction rules?
9 1 Computation is precious, not sample size Model selection Oracle inequalities 2 Problem formulation 3 4
10 Prediction Problem Model selection Oracle inequalities i.i.d. Z 1, Z 2,..., Z n, Z from Z. Use data Z 1,..., Z n to choose f n : Z A from a class F. Aim to ensure f n has small risk: L(f ) = El(f, Z), where l : F Z is a loss function.
11 Model selection Oracle inequalities Approximation-Estimation Trade-Off Define the Bayes risk, L = inf f L(f ), where the infimum is over measurable f. We can decompose the excess risk as ( ) L(f n ) L = L(f n ) inf L(f ) f F }{{} estimation error ( + ) inf L(f ) L. f F }{{} approximation error Model selection: automatically choose F to optimize this trade-off.
12 The Model Selection Problem Model selection Oracle inequalities Given function classes F 1,..., F K, use the data Z 1,..., Z n to choose f n K i=1 F i that gives a good trade-off between the approximation error and the estimation error. Example: Complexity-penalized model selection. f i n = arg min f F i L n (f ), f n = minimizer of L n (f i n) + γ i (n), where γ i (n) is a complexity penalty and L n is the empirical risk: L n (f ) = 1 n n l(f, Z i ). i=1
13 A Simple Model selection Oracle inequalities Theorem Suppose that we have risk bounds for each F i : w.p. 1 δ, log 1/δ sup L(f ) L n (f ) γ i (n) + c. f F i n If f n is chosen via complexity regularization: f i n = arg min f F i L n (f ), f n = minimizer of L n (f i n) + γ i (n), then with probability 1 δ, ( L(f n ) min i inf L(f ) + 2γ i (n) + c f F i ) log 1/δ + log i. n
14 A Simple Model selection Oracle inequalities Notice that, for each F i satisfying log 1/δ sup L(f ) L n (f ) γ i (n) + c, f F i n we have log 1/δ L(fn) i inf L(f ) + 2γ i (n) + c. f F i n But complexity regularization gives f n satisfying ( ) log 1/δ + log i L(f n ) min inf L(f ) + 2γ i (n) + c. i f F i n Thus, f n gives a near-optimal trade-off between the approximation error and the (bound on) estimation error, with only a log i penalty.
15 Computation versus sample size Model selection Oracle inequalities Complexity regularization involves computation of the empirical risk minimizer for each F i : f i n = arg min f F i L n (f ), f n = minimizer of L n (f i n) + γ i (n), So computation typically grows linearly with K. The oracle inequality gives the best trade-off for a given sample size: ( ) log 1/δ + log i L(f n ) min inf L(f ) + 2γ i (n) + c. i f F i n
16 Model selection Oracle inequalities 1 Computation is precious, not sample size Model selection Oracle inequalities 2 Problem formulation 3 4
17 Problem formulation Linear sample size scaling with computation Assumption For class F i : It takes unit time to use a sample of size n i to choose f ni F i. It takes time t to use a sample of size tn i to choose f tni F i. Our goal: A computational oracle inequality: f n compares favorably with each model, estimated using the entire computational budget. L(f n ) min i inf L(f ) + γ i (Tn i ) f F } i {{} c.f. estimate f using the entire budget +O log 1/δ Tn i.
18 Problem formulation Linear sample size scaling with computation Assumption For class F i : It takes unit time to use a sample of size n i to choose f ni F i. It takes time t to use a sample of size tn i to choose f tni F i. Our goal: A computational oracle inequality: f n compares favorably with each model, estimated using the entire computational budget. L(f n ) min i inf L(f ) + γ i (Tn i ) f F } i {{} c.f. estimate f using the entire budget +O log i/δ Tn i.
19 Naïve solution: grid search Problem formulation Allocate budget T /K to each model. Use a sample of size Tn i /K for F i. Choose f i Tn i /K = arg min f F i L Tni /K (f ), f n = minimizer of L Tni /K (f i Tn i /K ) + γ i Satisfies oracle inequality L(f n ) min i ( inf f F i L(f ) + γ i ( ) ( Tni + O K ( ) Tni. K K Tn i )).
20 Problem formulation Model selection from nested classes Suppose that the models are ordered by inclusion: F 1 F 2 F K. Examples: F i = { θ R d : θ r i }, r1 r 2 r K. F i = { θ R d i : θ 1 }, d 1 d 2 d K. Suppose that we have risk bounds for each F i : w.p. 1 δ, log 1/δ sup L(f ) L n (f ) γ i (n) + c. f F i n
21 Problem formulation Exploiting structure of nested classes Want to exploit monotonicity of risks and penalties Excess risk, R i = inf f Fi L(f ) L : Penalty, γ i (n): R i γi(n) i i
22 Coarse grid sets Problem formulation Want to spend computation on only few classes. Use monotonicity to interpolate for the rest. Partition based on penalty values. (1 + λ) (j+1) (1 + λ) j γi(n) (1 + λ) λ 1 F1F2 Fj Fj+1 i Coarse Grid
23 Coarse grids for model selection Problem formulation Definition (Coarse grid) A set S {1,..., K} is a coarse grid of size s if S = s and for each i {1,..., K} there is an index j S such that ( ) ( ) ( ) Tni Tnj Tni γ i γ j 2γ i. s s s Include a new class only after penalty increases sufficiently S = O(log K) is typical.
24 Problem formulation Complexity regularization on a coarse grid Given a coarse grid set S with cardinality s: 1 Allocate budget T /s to each class in S. 2 Choose f i Tn i /s = arg min f F i L Tni /s(f ) ( Tnj f = arg min L Tnj /s(f ) + γ j f {f j Tn j /s :j S} s ).
25 Problem formulation Complexity regularization on a coarse grid Theorem With probability at least 1 δ { ( )} Tni L(ˆf ) min inf L(f ) + 4γ i i {1,...,K} f F i s + O s log K + log 1/δ Tn K
26 Problem formulation Complexity regularization on a coarse grid Corollary For a coarse grid of logarithmic size, that is, S c log K, With probability at least 1 δ { ( )} Tni L(ˆf ) min inf L(f ) + 4γ i i {1,...,K} f F i c log K log + O 2 K + log 1 δ Tn K Both the statistical and computational costs scale logarithmically with K.
27 Problem formulation 1 Computation is precious, not sample size Model selection Oracle inequalities 2 Problem formulation 3 4
28 Heterogeneous Models In general, the F i can be heterogeneous, not ordered by inclusion. Different kernels. Graphs in directed graphical models. Subsets of features. Key idea: Successively allocate computational quanta online.
29 Multi-Armed Bandits for Model Selection Want class i that minimizes inf L(f ) + γ i (Tn i ). f F i Use idea of optimism in the face of uncertainty: neatly trade off exploration and exploitation by choosing the class with the smallest lower bound on the criterion. i t
30 Multi-Armed Bandits for Model Selection Want class i that minimizes inf L(f ) + γ i (Tn i ). f F i We know it suffices to choose a class i to minimize L Tni (f i Tn i ) + γ i (Tn i ). Use the lower confidence bound: log K L n (fn) i γ i (n) + γ i (Tn i ), n where n is the size of the sample that we have allocated already to class i.
31 Multi-Armed Bandits for Model Selection picks class i t with smallest lower confidence bound. Allocate additional sample of size n it to class i t. Regret analysis of upper-confidence-bound algorithm (Auer et al., 2002) extends to give oracle inequalities. i t
32 Oracle inequality under separation assumption ( ) Define i = arg min inf L(f ) + γ i (Tn i ), i f F i ( Assume γ i (n) = c i n. i = inf f F i L(f ) + γ i (Tn i ) inf L(f ) + γ i (Tn i ) f F i ). Theorem Let T i (T ) be the number of times class i is queried. There are constants C, κ 1, κ 2 such that with probability at least 1 κ 1 TK 4, T i (T ) C n i ( ci + κ 2 log T i ) 2.
33 Oracle inequality under separation assumption ( ) Define i = arg min inf L(f ) + γ i (Tn i ), i f F i i = inf L(f ) + γ i (Tn i ) inf L(f ) + γ i (Tn i ). f F i f F i If we can incrementally update the choice f i n, then the fraction of budget that is assigned to a suboptimal class i is no more than log T /(n i T 2 i ). This is essentially optimal (Lai and Robbins, 1985).
34 Oracle inequality without separation Assume that functions in {F i } map to a vector space, and the loss l(, z) is convex. Define ˆf T = 1 T T t=1 f t, where algorithm produces f t F it at time t. Theorem There is a constant κ such that with probability at least 1 2κ TK 3 ( ) L(ˆf T ) = inf inf L(f ) + γ i (Tn i ) i {1,...,K} f F i ( ) K max{log T, log K} + O. T Linear dependence on K.
35 Summary For large-scale problems, data is cheap but computation is precious. Computational oracle inequalities for model selection: select a near-optimal model without wasting computation on other models. A nested complexity hierarchy ensures cost logarithmic in size of hierarchy. If not nested, cost of model selection is linear in size of hierarchy.
36 Open problems Fast rates: If excess risk and excess empirical risk are close, then penalized empirical risk gives oracle inequalities with fast rates. Can this be exploited to give computational oracle inequalities in these cases? For nested hierarchies, the analysis relied on a coarse multiplicative cover of the penalty values. If the penalties are data-dependent, when is this approach possible? What other structures on function classes lead to good computational oracle inequalities?
37 Fast Rates Theorem For F 1 F 2 and γ 1 (n) γ 2 (n), if sup k sup k sup (L(f ) L(fi ) 2 (L n (f ) L n (fi )) γ i (n)) 0, f F i sup (L n (f ) L n (fi ) 2 (L(f ) L(fi )) γ i (n)) 0, f F i then L(f n ) inf i (L(f i ) + 9γ i (n)), where f n minimizes L n (f i n) + 7γ i (n)/2 and f i = arg min f Fi L(f ). Examples: Convex losses, classification with low noise. Large increase in expected minimal empirical risk with sample size makes it challenging to avoid sometimes choosing a class that is too large.
Fast Rates for Estimation Error and Oracle Inequalities for Model Selection
Fast Rates for Estimation Error and Oracle Inequalities for Model Selection Peter L. Bartlett Computer Science Division and Department of Statistics University of California, Berkeley bartlett@cs.berkeley.edu
More informationThe Multi-Armed Bandit Problem
The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationTwo optimization problems in a stochastic bandit model
Two optimization problems in a stochastic bandit model Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishnan Journées MAS 204, Toulouse Outline From stochastic optimization
More informationIntroduction to Bandit Algorithms. Introduction to Bandit Algorithms
Stochastic K-Arm Bandit Problem Formulation Consider K arms (actions) each correspond to an unknown distribution {ν k } K k=1 with values bounded in [0, 1]. At each time t, the agent pulls an arm I t {1,...,
More informationOrdinal Optimization and Multi Armed Bandit Techniques
Ordinal Optimization and Multi Armed Bandit Techniques Sandeep Juneja. with Peter Glynn September 10, 2014 The ordinal optimization problem Determining the best of d alternative designs for a system, on
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationLearning with Rejection
Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,
More informationMulti-Armed Bandits. Credit: David Silver. Google DeepMind. Presenter: Tianlu Wang
Multi-Armed Bandits Credit: David Silver Google DeepMind Presenter: Tianlu Wang Credit: David Silver (DeepMind) Multi-Armed Bandits Presenter: Tianlu Wang 1 / 27 Outline 1 Introduction Exploration vs.
More informationMulti-armed Bandits in the Presence of Side Observations in Social Networks
52nd IEEE Conference on Decision and Control December 0-3, 203. Florence, Italy Multi-armed Bandits in the Presence of Side Observations in Social Networks Swapna Buccapatnam, Atilla Eryilmaz, and Ness
More informationOrdinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods
Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods Sandeep Juneja Tata Institute of Fundamental Research Mumbai, India joint work with Peter Glynn Applied
More informationHybrid Machine Learning Algorithms
Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationOn the Complexity of Best Arm Identification in Multi-Armed Bandit Models
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationAlireza Shafaei. Machine Learning Reading Group The University of British Columbia Summer 2017
s s Machine Learning Reading Group The University of British Columbia Summer 2017 (OCO) Convex 1/29 Outline (OCO) Convex Stochastic Bernoulli s (OCO) Convex 2/29 At each iteration t, the player chooses
More informationDistirbutional robustness, regularizing variance, and adversaries
Distirbutional robustness, regularizing variance, and adversaries John Duchi Based on joint work with Hongseok Namkoong and Aman Sinha Stanford University November 2017 Motivation We do not want machine-learned
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More informationStatistical Ranking Problem
Statistical Ranking Problem Tong Zhang Statistics Department, Rutgers University Ranking Problems Rank a set of items and display to users in corresponding order. Two issues: performance on top and dealing
More informationReinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More informationCS281B/Stat241B. Statistical Learning Theory. Lecture 1.
CS281B/Stat241B. Statistical Learning Theory. Lecture 1. Peter Bartlett 1. Organizational issues. 2. Overview. 3. Probabilistic formulation of prediction problems. 4. Game theoretic formulation of prediction
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More informationOn Bayesian bandit algorithms
On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms
More informationA talk on Oracle inequalities and regularization. by Sara van de Geer
A talk on Oracle inequalities and regularization by Sara van de Geer Workshop Regularization in Statistics Banff International Regularization Station September 6-11, 2003 Aim: to compare l 1 and other
More informationKernels to detect abrupt changes in time series
1 UMR 8524 CNRS - Université Lille 1 2 Modal INRIA team-project 3 SSB group Paris joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot Computational and statistical trade-offs in learning IHES
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationCsaba Szepesvári 1. University of Alberta. Machine Learning Summer School, Ile de Re, France, 2008
LEARNING THEORY OF OPTIMAL DECISION MAKING PART I: ON-LINE LEARNING IN STOCHASTIC ENVIRONMENTS Csaba Szepesvári 1 1 Department of Computing Science University of Alberta Machine Learning Summer School,
More informationStochastic Contextual Bandits with Known. Reward Functions
Stochastic Contextual Bandits with nown 1 Reward Functions Pranav Sakulkar and Bhaskar rishnamachari Ming Hsieh Department of Electrical Engineering Viterbi School of Engineering University of Southern
More informationAn Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes
An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationStructured Prediction Theory and Algorithms
Structured Prediction Theory and Algorithms Joint work with Corinna Cortes (Google Research) Vitaly Kuznetsov (Google Research) Scott Yang (Courant Institute) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationProbably Approximately Correct (PAC) Learning
ECE91 Spring 24 Statistical Regularization and Learning Theory Lecture: 6 Probably Approximately Correct (PAC) Learning Lecturer: Rob Nowak Scribe: Badri Narayan 1 Introduction 1.1 Overview of the Learning
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Hollister, 306 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationThe multi armed-bandit problem
The multi armed-bandit problem (with covariates if we have time) Vianney Perchet & Philippe Rigollet LPMA Université Paris Diderot ORFE Princeton University Algorithms and Dynamics for Games and Optimization
More informationSparse Linear Contextual Bandits via Relevance Vector Machines
Sparse Linear Contextual Bandits via Relevance Vector Machines Davis Gilton and Rebecca Willett Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Email: gilton@wisc.edu,
More informationBandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University
Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3
More informationBudget-Optimal Task Allocation for Reliable Crowdsourcing Systems
Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems Sewoong Oh Massachusetts Institute of Technology joint work with David R. Karger and Devavrat Shah September 28, 2011 1 / 13 Crowdsourcing
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse
More informationSubsampling, Concentration and Multi-armed bandits
Subsampling, Concentration and Multi-armed bandits Odalric-Ambrym Maillard, R. Bardenet, S. Mannor, A. Baransi, N. Galichet, J. Pineau, A. Durand Toulouse, November 09, 2015 O-A. Maillard Subsampling and
More informationSample and Computationally Efficient Active Learning. Maria-Florina Balcan Carnegie Mellon University
Sample and Computationally Efficient Active Learning Maria-Florina Balcan Carnegie Mellon University Machine Learning is Shaping the World Highly successful discipline with lots of applications. Computational
More information12. Structural Risk Minimization. ECE 830 & CS 761, Spring 2016
12. Structural Risk Minimization ECE 830 & CS 761, Spring 2016 1 / 23 General setup for statistical learning theory We observe training examples {x i, y i } n i=1 x i = features X y i = labels / responses
More informationOn maxitive integration
On maxitive integration Marco E. G. V. Cattaneo Department of Mathematics, University of Hull m.cattaneo@hull.ac.uk Abstract A functional is said to be maxitive if it commutes with the (pointwise supremum
More informationNotes on AdaGrad. Joseph Perla 2014
Notes on AdaGrad Joseph Perla 2014 1 Introduction Stochastic Gradient Descent (SGD) is a common online learning algorithm for optimizing convex (and often non-convex) functions in machine learning today.
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationPerformance Measures for Ranking and Selection Procedures
Rolf Waeber Performance Measures for Ranking and Selection Procedures 1/23 Performance Measures for Ranking and Selection Procedures Rolf Waeber Peter I. Frazier Shane G. Henderson Operations Research
More informationAn Introduction to Statistical Machine Learning - Theoretical Aspects -
An Introduction to Statistical Machine Learning - Theoretical Aspects - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny,
More informationStatistical Properties of Large Margin Classifiers
Statistical Properties of Large Margin Classifiers Peter Bartlett Division of Computer Science and Department of Statistics UC Berkeley Joint work with Mike Jordan, Jon McAuliffe, Ambuj Tewari. slides
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationFull-information Online Learning
Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationIntroducing strategic measure actions in multi-armed bandits
213 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications: Workshop on Cognitive Radio Medium Access Control and Network Solutions Introducing strategic measure actions
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationClass 2 & 3 Overfitting & Regularization
Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating
More informationBayesian Contextual Multi-armed Bandits
Bayesian Contextual Multi-armed Bandits Xiaoting Zhao Joint Work with Peter I. Frazier School of Operations Research and Information Engineering Cornell University October 22, 2012 1 / 33 Outline 1 Motivating
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationNew Algorithms for Contextual Bandits
New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised
More informationAn Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement
An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement Satyanath Bhat Joint work with: Shweta Jain, Sujit Gujar, Y. Narahari Department of Computer Science and Automation, Indian
More informationLecture 6: Non-stochastic best arm identification
CSE599i: Online and Adaptive Machine Learning Winter 08 Lecturer: Kevin Jamieson Lecture 6: Non-stochastic best arm identification Scribes: Anran Wang, eibin Li, rian Chan, Shiqing Yu, Zhijin Zhou Disclaimer:
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationTwo generic principles in modern bandits: the optimistic principle and Thompson sampling
Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle
More informationAdaBoost and other Large Margin Classifiers: Convexity in Classification
AdaBoost and other Large Margin Classifiers: Convexity in Classification Peter Bartlett Division of Computer Science and Department of Statistics UC Berkeley Joint work with Mikhail Traskin. slides at
More informationBayesian and Frequentist Methods in Bandit Models
Bayesian and Frequentist Methods in Bandit Models Emilie Kaufmann, Telecom ParisTech Bayes In Paris, ENSAE, October 24th, 2013 Emilie Kaufmann (Telecom ParisTech) Bayesian and Frequentist Bandits BIP,
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationSupport Vector Machine
Support Vector Machine Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Linear Support Vector Machine Kernelized SVM Kernels 2 From ERM to RLM Empirical Risk Minimization in the binary
More informationPlug-in Approach to Active Learning
Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X
More informationConfidence Measure Estimation in Dynamical Systems Model Input Set Selection
Confidence Measure Estimation in Dynamical Systems Model Input Set Selection Paul B. Deignan, Jr. Galen B. King Peter H. Meckl School of Mechanical Engineering Purdue University West Lafayette, IN 4797-88
More informationOptimisation séquentielle et application au design
Optimisation séquentielle et application au design d expériences Nicolas Vayatis Séminaire Aristote, Ecole Polytechnique - 23 octobre 2014 Joint work with Emile Contal (computer scientist, PhD student)
More informationLearning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
JMLR: Workshop and Conference Proceedings vol:1 8, 2012 10th European Workshop on Reinforcement Learning Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning Michael
More informationDistributed Statistical Estimation and Rates of Convergence in Normal Approximation
Distributed Statistical Estimation and Rates of Convergence in Normal Approximation Stas Minsker (joint with Nate Strawn) Department of Mathematics, USC July 3, 2017 Colloquium on Concentration inequalities,
More informationLearning Algorithms for Minimizing Queue Length Regret
Learning Algorithms for Minimizing Queue Length Regret Thomas Stahlbuhk Massachusetts Institute of Technology Cambridge, MA Brooke Shrader MIT Lincoln Laboratory Lexington, MA Eytan Modiano Massachusetts
More informationStochastic Composition Optimization
Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators
More informationTHE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich
Submitted to the Annals of Applied Statistics arxiv: math.pr/0000000 THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES By Sara van de Geer and Johannes Lederer ETH Zürich We study high-dimensional
More informationYevgeny Seldin. University of Copenhagen
Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New
More informationarxiv: v2 [math.oc] 8 Oct 2011
Stochastic convex optimization with bandit feedback Alekh Agarwal Dean P. Foster Daniel Hsu Sham M. Kakade, Alexander Rakhlin Department of EECS Department of Statistics Microsoft Research University of
More informationA first model of learning
A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) We observe the data where each Suppose we are given an ensemble of possible hypotheses / classifiers
More informationReward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Reward Maximization Under Uncertainty: Leveraging Side-Observations Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks Swapna Buccapatnam AT&T Labs Research, Middletown, NJ
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationSupplemental Material for Discrete Graph Hashing
Supplemental Material for Discrete Graph Hashing Wei Liu Cun Mu Sanjiv Kumar Shih-Fu Chang IM T. J. Watson Research Center Columbia University Google Research weiliu@us.ibm.com cm52@columbia.edu sfchang@ee.columbia.edu
More informationl 1 -Regularized Linear Regression: Persistence and Oracle Inequalities
l -Regularized Linear Regression: Persistence and Oracle Inequalities Peter Bartlett EECS and Statistics UC Berkeley slides at http://www.stat.berkeley.edu/ bartlett Joint work with Shahar Mendelson and
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationReport on Preliminary Experiments with Data Grid Models in the Agnostic Learning vs. Prior Knowledge Challenge
Report on Preliminary Experiments with Data Grid Models in the Agnostic Learning vs. Prior Knowledge Challenge Marc Boullé IJCNN 2007 08/2007 research & development Outline From univariate to multivariate
More informationOn the Generalization Ability of Online Strongly Convex Programming Algorithms
On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract
More informationPLC Papers Created For:
PLC Papers Created For: Daniel Inequalities Inequalities on number lines 1 Grade 4 Objective: Represent the solution of a linear inequality on a number line. Question 1 Draw diagrams to represent these
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationPenalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
university-logo Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms Andrew Barron Cong Huang Xi Luo Department of Statistics Yale University 2008 Workshop on Sparsity in High Dimensional
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More informationRisk Bounds for CART Classifiers under a Margin Condition
arxiv:0902.3130v5 stat.ml 1 Mar 2012 Risk Bounds for CART Classifiers under a Margin Condition Servane Gey March 2, 2012 Abstract Non asymptotic risk bounds for Classification And Regression Trees (CART)
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic
More informationMulti-armed Bandit with Additional Observations
Multi-armed Bandit with Additional Observations DOGGYU YU, aver Corporation ALEXADRE PROUTIERE, KTH SUMYEOG AH, JIWOO SHI, and YUG YI, KAIST We study multi-armed bandit (MAB) problems with additional observations,
More information