Online Learning and Sequential Decision Making


Online Learning and Sequential Decision Making
Emilie Kaufmann, CNRS & CRIStAL, Inria SequeL
Research School, ENS Lyon, November 12-13th, 2018

Learning
- make predictions / (in this class) make decisions
- based on examples / history
- use observed data to adapt one's behavior on new data

Part I: Online Learning
- Learning in a sequential way
- Online Convex Optimization
- Prediction of Individual Sequences

Supervised Learning
Goal: given a database of labeled examples, learn how to automatically label new data.
Smart predictions by means of generalizing from examples.

Example: Spam Detection
Features: text of the e-mail
Label: Spam / No Spam $\in \{0, 1\}$
[UCI Data]

Example: Handwritten Digits Recognition
Features: image of the digit
Label: digit $\in \{0, 1, \dots, 9\}$
[MNIST dataset] [Demo]

Example: Prediction of Claim Severity
Features: client's history / personal data
Label: how much money the client will cost, a value in $\mathbb{R}$
[Kaggle datasets]

Mathematical Formalization
We observe a database containing features and labels
$$D_n = \{(X_i, Y_i)\}_{i=1,\dots,n} \subset \mathcal{X} \times \mathcal{Y} \quad (\text{"labeled examples"}).$$
Typically $\mathcal{X} = \mathbb{R}^d$ (features are represented by vectors) and
- $\mathcal{Y} = \{0, 1\}$: binary classification
- $3 \le |\mathcal{Y}| < \infty$: multi-class classification
- $\mathcal{Y} = \mathbb{R}$: regression
The goal is to build a predictor $\hat{g}_n : \mathcal{X} \to \mathcal{Y}$, which is a function that depends on the data, such that for a new observation $(X, Y)$, $\hat{g}_n(X) \simeq Y$.

Modelling assumption
Probabilistic framework: the training dataset $D_n = \{(X_i, Y_i)\}_{i=1,\dots,n}$ contains i.i.d. samples whose distribution is that of a random vector $(X, Y) \sim P$, where $P$ is a probability distribution on $\mathcal{X} \times \mathcal{Y}$.
We want $\hat{g}_n = \hat{g}_n(D_n)$, a function $\mathcal{X} \to \mathcal{Y}$, such that for new data $(X, Y) \sim P$ independent from $D_n$, the random variables $\hat{g}_n(X)$ and $Y$ are close.
How to measure this proximity?

Example: Classification
Definition: the risk of a classifier $g : \mathcal{X} \to \{a_1, \dots, a_K\}$ is $R(g) = \mathbb{P}(g(X) \neq Y)$.
Practical evaluation: we use an independent testing database $D^{\text{test}}_m$, i.i.d. from $P$, and compute
$$\hat{R}^{\text{test}}_m(\hat{g}_n) = \frac{1}{m} \sum_{i=1}^m \mathbb{1}\left(Y_i \neq \hat{g}_n(X_i)\right)$$
(the empirical error on new data: it measures generalization power).
Property: $\mathbb{E}\big[\hat{R}^{\text{test}}_m(\hat{g}_n) \mid D_n\big] = R(\hat{g}_n)$.
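
To make the formula concrete, here is a minimal NumPy sketch (not from the slides) of the empirical test risk, assuming a fitted classifier exposing a scikit-learn-style `predict` method and held-out arrays `X_test`, `y_test`:

```python
import numpy as np

def empirical_test_risk(classifier, X_test, y_test):
    """Empirical 0-1 risk on an independent test set:
    (1/m) * sum_i 1(y_i != g_hat(x_i))."""
    y_pred = classifier.predict(X_test)   # g_hat_n(X_i) for each test point
    return np.mean(y_pred != y_test)      # fraction of misclassified points
```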

Supervised Learning Algorithms
Many algorithms!
- linear regression (Gauss, 1795)
- logistic regression (1950s)
- k-nearest neighbors (1960s)
- Decision Trees (CART, 1984)
- Support Vector Machines (1995)
- Boosting algorithms (AdaBoost, 1997)
- Random Forests (2001)
- Neural Networks (1960s-80s, Deep Learning 2012-)
... and more :)

One example: Logistic Regression
$\mathcal{X} = \mathbb{R}^d$ and $\mathcal{Y} = \{-1, 1\}$ (binary classification).
Model-based approach: assume the generative model is
$$\mathbb{P}(Y = 1 \mid X = x) = \frac{e^{\langle x, \theta \rangle}}{1 + e^{\langle x, \theta \rangle}}$$
for some regression parameter $\theta \in \mathbb{R}^d$.
Optimal prediction: if $\theta$ is known, the optimal predictor is
$$g^\star(x) = \mathrm{sgn}\left(\langle x, \theta \rangle\right) = \begin{cases} 1 & \text{if } \langle x, \theta \rangle \ge 0 \\ -1 & \text{if } \langle x, \theta \rangle < 0 \end{cases}$$
Learning from data: compute $\hat{\theta}_n$, the maximum likelihood estimator of the unknown parameter $\theta$, and predict
$$\hat{g}_n(x) = \mathrm{sgn}\left(\langle x, \hat{\theta}_n \rangle\right).$$

One example: Logistic regression
The likelihood of the observations in $D_n = \{(X_i, Y_i)\}_{1 \le i \le n}$ is
$$\ell(D_n; \theta) = \prod_{i=1}^n \mathbb{P}(Y = Y_i \mid X = X_i)\, f_X(X_i) = \prod_{i : Y_i = 1} \frac{e^{\langle X_i, \theta \rangle}}{1 + e^{\langle X_i, \theta \rangle}} \prod_{i : Y_i = -1} \frac{1}{1 + e^{\langle X_i, \theta \rangle}} \prod_{i=1}^n f_X(X_i).$$
Hence
$$\hat{\theta}_n \in \operatorname*{argmax}_{\theta \in \mathbb{R}^d} \ell(D_n; \theta) \;\Longleftrightarrow\; \hat{\theta}_n \in \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \sum_{i : Y_i = 1} \ln\left(1 + e^{-\langle X_i, \theta \rangle}\right) + \sum_{i : Y_i = -1} \ln\left(1 + e^{\langle X_i, \theta \rangle}\right)$$
$$\Longleftrightarrow\; \hat{\theta}_n \in \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \sum_{i=1}^n \ln\left(1 + e^{-Y_i \langle X_i, \theta \rangle}\right).$$
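
As an illustration of the last display, a minimal NumPy sketch (not from the slides) that computes $\hat{\theta}_n$ by full-batch gradient descent on this convex objective; the step size and iteration count are arbitrary placeholders:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(X, y, step=0.1, n_iters=500):
    """Gradient descent on sum_i log(1 + exp(-y_i <x_i, theta>)).
    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ theta)  # y_i <x_i, theta>
        # gradient of log(1 + e^{-m_i}) w.r.t. theta is -y_i * sigmoid(-m_i) * x_i
        grad = -(X * (y * _sigmoid(-margins))[:, None]).sum(axis=0)
        theta -= (step / n) * grad
    return theta

def predict(theta, X):
    """Plug-in classifier g_hat_n(x) = sgn(<x, theta_hat_n>)."""
    return np.where(X @ theta >= 0, 1, -1)
```

Gradient descent is only one possible solver here; the slides only require the maximum likelihood estimator $\hat{\theta}_n$, however it is computed.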

$\mathcal{X} = \mathbb{R}^2$, $\mathcal{Y} = \{-1, +1\}$. A linear separator: $\hat{g}_n(x) = \mathrm{sgn}\left(\langle x, \hat{\theta}_n \rangle\right)$.

References available online [here]

Outline
- Learning in a sequential way
- Online Convex Optimization
- Prediction of Individual Sequences

Batch versus Online
Supervised Learning: based on a large database (batch), predict the label of new data.
Online Learning: data is collected sequentially, and we have to predict the labels one by one (online), after which the true label is revealed.
Examples:
- predict the value of a stock
- predict electricity consumption for the next day
- predict the behavior of a customer
- ...

Online Learning
Online Learning: general framework. At every time step $t = 1, \dots, T$:
1. observe (features) $x_t \in \mathcal{X}$
2. predict (label) $\hat{y}_t \in \mathcal{Y}$
3. $y_t$ is revealed and we suffer a loss $\ell(y_t, \hat{y}_t)$.
Goal: minimize the cumulated loss $\sum_{t=1}^T \ell(y_t, \hat{y}_t)$.
We can compare our performance to:
- that of the best predictor in a family $\mathcal{G}$
- that of ("black-box") experts that propose predictions

Outline
- Learning in a sequential way
- Online Convex Optimization
- Prediction of Individual Sequences

Online Learning: example
A particular Online Learning problem. Let $\mathcal{G}$ be a class of predictors. At each time step $t = 1, \dots, T$:
1. choose a predictor $g_t \in \mathcal{G}$
2. observe $x_t \in \mathcal{X}$ and predict $\hat{y}_t = g_t(x_t)$
3. $y_t$ is observed and we suffer a loss $\ell(y_t, \hat{y}_t)$.
Goal: minimize the regret, i.e. the difference between our cumulative loss and the cumulative loss of the best predictor in $\mathcal{G}$:
$$R_T = \sum_{t=1}^T \ell(y_t, g_t(x_t)) - \min_{g \in \mathcal{G}} \sum_{t=1}^T \ell(y_t, g(x_t)).$$

Online Learning: example
A particular Online Learning problem. Let $\mathcal{G}$ be a class of predictors. At each time step $t = 1, \dots, T$:
1. choose a predictor $g_t \in \mathcal{G}$
2. observe $x_t \in \mathcal{X}$ and predict $\hat{y}_t = g_t(x_t)$
3. $y_t$ is observed and we suffer a loss $\ell(y_t, \hat{y}_t)$.
Example: $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{-1, 1\}$.
- $\mathcal{G}$ is the set of linear separators $\mathcal{G} = \{g(x) = \mathrm{sgn}(\langle x, \theta \rangle),\ \theta \in \mathbb{R}^d\}$, thus there exists $\theta_t \in \mathbb{R}^d$ such that $g_t(x) = \mathrm{sgn}(\langle \theta_t, x \rangle)$
- $\ell$ is the logistic loss $\ell(y_t, \hat{y}_t) = \ln\left(1 + e^{-y_t \langle \theta_t, x_t \rangle}\right)$

Online Convex Optimization: example
A particular Online Learning problem. At each time step $t = 1, \dots, T$:
1. choose a vector $\theta_t \in \mathbb{R}^d$
2. the loss function $\ell_t(\theta) = \ln\left(1 + e^{-y_t \langle \theta, x_t \rangle}\right)$ is revealed
3. we suffer a loss $\ell_t(\theta_t)$.
Example: $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{-1, 1\}$.
- $\mathcal{G}$ is the set of linear separators $\mathcal{G} = \{g(x) = \mathrm{sgn}(\langle x, \theta \rangle),\ \theta \in \mathbb{R}^d\}$, thus there exists $\theta_t \in \mathbb{R}^d$ such that $g_t(x) = \mathrm{sgn}(\langle \theta_t, x \rangle)$
- $\ell$ is the logistic loss $\ell_t(\theta_t) = \ln\left(1 + e^{-y_t \langle \theta_t, x_t \rangle}\right)$

Online Convex Optimization: example
A particular Online Learning problem. At each time step $t = 1, \dots, T$:
1. choose a vector $\theta_t \in \mathbb{R}^d$
2. the loss function $\ell_t(\theta) = \ln\left(1 + e^{-y_t \langle \theta, x_t \rangle}\right)$ is revealed
3. we suffer a loss $\ell_t(\theta_t)$.
Goal: the regret that we should minimize rewrites as
$$R_T = \underbrace{\sum_{t=1}^T \ln\left(1 + e^{-y_t \langle \theta_t, x_t \rangle}\right)}_{\text{loss obtained by updating our predictor in an online fashion}} - \underbrace{\min_{\theta \in \mathbb{R}^d} \sum_{t=1}^T \ln\left(1 + e^{-y_t \langle \theta, x_t \rangle}\right)}_{\text{loss obtained by the logistic regression classifier trained with the whole dataset}}$$
We want to perform online logistic regression.

Online Convex Optimization: general framework
Online Convex Optimization. At each time step $t = 1, \dots, T$:
1. choose $\theta_t \in \mathcal{K}$, a convex set
2. a convex loss function $\ell_t(\theta)$ is revealed
3. we suffer a loss $\ell_t(\theta_t)$.
Goal: minimize the regret
$$R_T = \underbrace{\sum_{t=1}^T \ell_t(\theta_t)}_{\text{loss obtained by updating } \theta \text{ in an online fashion}} - \underbrace{\min_{\theta \in \mathcal{K}} \sum_{t=1}^T \ell_t(\theta)}_{\text{loss obtained by the best static choice of } \theta}$$

Online Gradient Descent
Online (Projected) Gradient Descent:
$$\theta_1 = 0, \qquad \theta_{t+1} = \Pi_{\mathcal{K}}\left(\theta_t - \eta \nabla \ell_t(\theta_t)\right),$$
where $\Pi_{\mathcal{K}}(x) = \operatorname*{argmin}_{u \in \mathcal{K}} \|x - u\|$ is the projection on $\mathcal{K}$.
Theorem [e.g., Theorem 3.2 in Bubeck 2015]: if $\|\nabla \ell_t(\theta)\| \le L$ and $\|\theta\| \le R$ for all $\theta \in \mathcal{K}$, then
$$R_T = \max_{\theta \in \mathcal{K}} \sum_{t=1}^T \left(\ell_t(\theta_t) - \ell_t(\theta)\right) \le \frac{R^2}{2\eta} + \frac{\eta L^2 T}{2}.$$
Corollary: for the choice $\eta_T = \frac{R}{L\sqrt{T}}$, we obtain $R_T \le RL\sqrt{T}$.
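
A minimal NumPy sketch (not from the slides) of online projected gradient descent, instantiated on the online logistic regression example and assuming $\mathcal{K}$ is the Euclidean ball of radius `radius`:

```python
import numpy as np

def project_ball(theta, radius):
    """Euclidean projection onto the ball {theta : ||theta|| <= radius}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def online_gradient_descent(stream, d, eta, radius):
    """Online (projected) gradient descent on logistic losses.
    stream yields pairs (x_t, y_t) with x_t a (d,) array and y_t in {-1, +1}."""
    theta = np.zeros(d)
    total_loss = 0.0
    for x, y in stream:
        margin = y * (x @ theta)
        total_loss += np.log1p(np.exp(-margin))           # l_t(theta_t)
        grad = -y * x / (1.0 + np.exp(margin))            # gradient of l_t at theta_t
        theta = project_ball(theta - eta * grad, radius)  # theta_{t+1}
    return theta, total_loss
```

With $\eta = R/(L\sqrt{T})$ as in the corollary, the cumulative loss of this procedure exceeds that of the best fixed $\theta$ in the ball by at most $RL\sqrt{T}$.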

Proof in 3 steps
1. Exploit the convexity of $\ell_t$:
$$\ell_t(\theta_t) - \ell_t(\theta) \le \langle \theta_t - \theta, \nabla \ell_t(\theta_t) \rangle.$$
2. Use the identity $2\langle a, b \rangle = \|a\|^2 + \|b\|^2 - \|a - b\|^2$ (together with the fact that the projection $\Pi_{\mathcal{K}}$ can only decrease the distance to $\theta \in \mathcal{K}$) to obtain
$$\langle \theta_t - \theta, \nabla \ell_t(\theta_t) \rangle \le \frac{\|\theta_t - \theta\|^2 - \|\theta_{t+1} - \theta\|^2}{2\eta} + \frac{\eta}{2} \|\nabla \ell_t(\theta_t)\|^2.$$
3. Conclude by summing: for any $\theta \in \mathcal{K}$,
$$\sum_{t=1}^T \left(\ell_t(\theta_t) - \ell_t(\theta)\right) \le \sum_{t=1}^T \langle \theta_t - \theta, \nabla \ell_t(\theta_t) \rangle \le \frac{1}{2\eta} \sum_{t=1}^T \left(\|\theta_t - \theta\|^2 - \|\theta_{t+1} - \theta\|^2\right) + \frac{\eta}{2} L^2 T \le \frac{R^2}{2\eta} + \frac{\eta}{2} L^2 T.$$

References [The OCO book] [Convex Optimization]

Outline
- Learning in a sequential way
- Online Convex Optimization
- Prediction of Individual Sequences

Prediction with expert advice
- we want to sequentially predict some phenomenon (market, weather, energy consumption, ...)
- no probabilistic hypothesis is made about this phenomenon
- we rely on experts (black boxes), which may be more or less good
- we want to be at least as good as the best expert

A prediction game
$K$ experts. Prediction space $\mathcal{Y}$. Loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$.
Prediction with Expert Advice: at each time step $t = 1, \dots, T$:
1. each expert $k$ makes a prediction $z_{k,t} \in \mathcal{Y}$ (that we observe)
2. we predict $\hat{y}_t \in \mathcal{Y}$
3. $y_t$ is revealed and we suffer a loss $\ell(\hat{y}_t, y_t)$. Expert $k$ suffers a loss $\ell(z_{k,t}, y_t)$.
Remark: experts may exploit the knowledge of some feature vector $x_t \in \mathcal{X}$ to make their predictions.

A prediction game
$K$ experts. Prediction space $\mathcal{Y}$. Loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$.
Prediction with Expert Advice: at each time step $t = 1, \dots, T$:
1. each expert $k$ makes a prediction $z_{k,t} \in \mathcal{Y}$ (that we observe)
2. we predict $\hat{y}_t \in \mathcal{Y}$
3. $y_t$ is revealed and we suffer a loss $\ell(\hat{y}_t, y_t)$. Expert $k$ suffers a loss $\ell_{k,t} := \ell(z_{k,t}, y_t)$.
Notation:
- $\hat{L}_T = \sum_{t=1}^T \ell(\hat{y}_t, y_t)$: cumulative loss of our prediction strategy
- $L_{k,T} = \sum_{t=1}^T \ell_{k,t}$: cumulative loss of expert $k$
Goal: a small regret
$$R_T = \hat{L}_T - \underbrace{\min_{1 \le k \le K} L_{k,T}}_{\text{cumulative loss of the best expert}}$$

Weighted (Average) Prediction
Principle: assign a weight $w_{k,t}$ to expert $k$ at round $t$ and predict a weighted average of the experts' predictions.
First idea:
$$\hat{y}_t = \frac{\sum_{k=1}^K w_{k,t} z_{k,t}}{\sum_{k=1}^K w_{k,t}} = \sum_{k=1}^K \left(\frac{w_{k,t}}{\sum_{i=1}^K w_{i,t}}\right) z_{k,t}.$$
- the predictions of experts with large weights matter more
- we should assign larger weights to good experts
- but $\hat{y}_t$ might not be in $\mathcal{Y}$ if $\mathcal{Y}$ is not convex...

Weighted (Average) Prediction
Principle: assign a weight $w_{k,t}$ to expert $k$ at round $t$ and predict a weighted average of the experts' predictions.
Second idea: compute the probability vector $p_t = (p_{1,t}, \dots, p_{K,t})$ where
$$p_{k,t} := \frac{w_{k,t}}{\sum_{i=1}^K w_{i,t}},$$
select an expert $k_t \sim p_t$, i.e. $\mathbb{P}(k_t = k) = p_{k,t}$, and predict $\hat{y}_t = z_{k_t,t} \in \mathcal{Y}$.
- experts with large weights are more likely to be selected
- we should assign larger weights to good experts

How to choose the weights?
The weights should depend on the quality of the expert in the past. For example, $w_{k,t} = F(L_{k,t-1})$ with $F$ some decreasing function.
Typical choice: $F(x) = \exp(-\eta x)$.

Exponentially Weighted Forecaster
EWF($\eta$) algorithm (or Hedge). Parameter: $\eta > 0$. Initialization: for all $k \in \{1, \dots, K\}$, $w_{k,1} = \frac{1}{K}$.
For $t = 1, \dots, T$:
1. Observe the experts' predictions $(z_{k,t})_{1 \le k \le K}$
2. Compute the probability vector $p_t = (p_{1,t}, \dots, p_{K,t})$ where $p_{k,t} = \frac{w_{k,t}}{\sum_{i=1}^K w_{i,t}}$ (normalize the weights)
3. Select an expert $k_t \sim p_t$, i.e. $\mathbb{P}(k_t = k) = p_{k,t}$
4. Predict $\hat{y}_t = z_{k_t,t}$ and observe the losses $\ell_{k,t}$ for all $k \in \{1, \dots, K\}$
5. Update the weights: for all $k \in \{1, \dots, K\}$, $w_{k,t+1} = w_{k,t} \exp(-\eta \ell_{k,t})$.
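
A minimal NumPy sketch (not from the slides) of EWF/Hedge; for simplicity it assumes the full matrix of expert losses is available as an array, since in the full-information setting all losses are revealed after each round anyway:

```python
import numpy as np

rng = np.random.default_rng(0)

def ewf(loss_matrix, eta):
    """Exponentially Weighted Forecaster (Hedge).
    loss_matrix: (T, K) array, loss_matrix[t, k] = loss of expert k at round t, in [0, 1].
    Returns the chosen experts and the cumulative loss incurred."""
    T, K = loss_matrix.shape
    weights = np.full(K, 1.0 / K)
    chosen, total_loss = [], 0.0
    for t in range(T):
        p = weights / weights.sum()               # normalize the weights
        k_t = rng.choice(K, p=p)                  # sample an expert ~ p_t
        chosen.append(k_t)
        total_loss += loss_matrix[t, k_t]         # suffer the chosen expert's loss
        weights *= np.exp(-eta * loss_matrix[t])  # full-information exponential update
    return chosen, total_loss
```

With the tuned value $\eta_T = \sqrt{8\ln(K)/T}$ from the next slides, the expected regret with respect to the best expert is at most $\sqrt{T\ln(K)/2}$.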

Analysis of EWF
As the algorithm is randomized, we consider the expected regret
$$\mathbb{E}[R_T] = \mathbb{E}\left[\sum_{t=1}^T \ell_{k_t,t}\right] - \min_{k \in \{1,\dots,K\}} \sum_{t=1}^T \ell_{k,t}.$$
Theorem (e.g., Cesa-Bianchi and Lugosi 06): assume that for all $k, t$, $0 \le \ell_{k,t} \le 1$. Then for all $\eta > 0$ and $T \ge 0$, EWF($\eta$) satisfies
$$\mathbb{E}[R_T] \le \frac{\ln(K)}{\eta} + \frac{\eta T}{8}.$$
Proof: on the board.

Analysis of EWF
Theorem: choosing $\eta_T = \sqrt{\frac{8 \ln(K)}{T}}$, EWF($\eta_T$) satisfies
$$\mathbb{E}[R_T] \le \sqrt{\frac{T \ln(K)}{2}}.$$
Remarks:
- $\eta$ can also be chosen without the knowledge of the horizon $T$, with similar regret guarantees (up to a constant factor): $\eta_t = \sqrt{\frac{8 \ln(K)}{t}}$
- if $\mathcal{Y}$ is convex, one can replace randomization by an actual average, with the same regret guarantees: the Exponentially Weighted Average (EWA) forecaster

EWA($\eta$) algorithm
Exponentially Weighted Average. Parameter: $\eta > 0$. Initialization: for all $k \in \{1, \dots, K\}$, $w_{k,1} = \frac{1}{K}$.
For $t = 1, \dots, T$:
1. Observe the experts' predictions $(z_{k,t})_{1 \le k \le K}$
2. Compute the probability vector $p_t = (p_{1,t}, \dots, p_{K,t})$ where $p_{k,t} = \frac{w_{k,t}}{\sum_{i=1}^K w_{i,t}}$ (normalize the weights)
3. Predict $\hat{y}_t = \sum_{k=1}^K p_{k,t} z_{k,t}$ and observe the losses $\ell_{k,t}$ for all $k \in \{1, \dots, K\}$
4. Update the weights: for all $k \in \{1, \dots, K\}$, $w_{k,t+1} = w_{k,t} \exp(-\eta \ell_{k,t})$.
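
For comparison, a minimal sketch (not from the slides) of a single EWA round, assuming real-valued expert predictions so that the weighted average is a valid prediction:

```python
import numpy as np

def ewa_predict_and_update(weights, expert_predictions, expert_losses, eta):
    """One round of EWA: deterministic weighted-average prediction,
    followed by the same exponential weight update as in EWF.
    expert_predictions, expert_losses: (K,) arrays for the current round."""
    p = weights / weights.sum()          # p_{k,t}
    y_hat = p @ expert_predictions       # sum_k p_{k,t} z_{k,t}
    new_weights = weights * np.exp(-eta * expert_losses)
    return y_hat, new_weights
```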

Possible extensions
Other notions of regret:
- compare to the best convex combination of experts
- switching regret: compare to changing experts
Partial information: so far we assumed that we observe the losses of all experts (full information).
- partial information: we only observe a subset of the $(\ell_{k,t})_k$
- bandit information: we only observe the loss of the chosen expert, $\ell_{k_t,t}$
Bandit information: choosing an expert has consequences on the loss received, but also on the information gathered.

EWF becomes EXP3
EXP3 (Explore, Exploit and Exponential weights). Parameter: $\eta > 0$. Initialization: for all $k \in \{1, \dots, K\}$, $w_{k,1} = \frac{1}{K}$.
For $t = 1, \dots, T$:
1. Observe the experts' predictions $(z_{k,t})_{1 \le k \le K}$
2. Compute the probability vector $p_t = (p_{1,t}, \dots, p_{K,t})$ where $p_{k,t} = \frac{w_{k,t}}{\sum_{i=1}^K w_{i,t}}$ (normalize the weights)
3. Select an expert $k_t \sim p_t$, i.e. $\mathbb{P}(k_t = k) = p_{k,t}$
4. Predict $\hat{y}_t = z_{k_t,t}$ and observe only $\ell_{k_t,t}$
5. Compute estimates of the unobserved losses: $\tilde{\ell}_{k,t} = \frac{\ell_{k,t}}{p_{k,t}} \mathbb{1}(k_t = k)$
6. Update the weights: for all $k$, $w_{k,t+1} = w_{k,t} \exp(-\eta \tilde{\ell}_{k,t})$.
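
A minimal NumPy sketch (not from the slides) of EXP3; it reuses the same loss-matrix convention as the EWF sketch above, but the learner only looks at the loss of the expert it actually selected:

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3(loss_matrix, eta):
    """EXP3: EWF run on importance-weighted loss estimates
    l_tilde[k] = (l[k] / p[k]) * 1(k == k_t), built from bandit feedback only."""
    T, K = loss_matrix.shape
    weights = np.full(K, 1.0 / K)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()
        k_t = rng.choice(K, p=p)
        observed = loss_matrix[t, k_t]     # only this loss is revealed
        total_loss += observed
        loss_est = np.zeros(K)
        loss_est[k_t] = observed / p[k_t]  # importance-weighted estimate of l_{k,t}
        weights *= np.exp(-eta * loss_est)
    return total_loss
```

Dividing the observed loss by $p_{k_t,t}$ makes $\tilde{\ell}_{k,t}$ an unbiased estimate of $\ell_{k,t}$, which is what allows the EWF analysis to be adapted to bandit feedback.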

Reference [Prediction, Learning and Games]
