In-Database Factorised Learning fdbresearch.github.io


1 In-Database Factorised Learning (fdbresearch.github.io)
Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich
December 2017, Logic for Data Science Seminar, Alan Turing Institute

2 Talk Outline
Current Landscape for ML over DB
Factorised Learning over Normalised Data
Learning under Functional Dependencies
General Problem Formulation

4 Brief Outlook at Current Landscape for ML over DB (1/2)
No integration
- ML and DB are distinct tools on the technology stack
- the DB exports data as one table, the ML tool imports it in its own format
- Spark/PostgreSQL + R supports virtually any ML task
- most ML-over-DB solutions operate in this space
Loose integration
- each ML task is implemented by a distinct UDF inside the DB
- same running process for DB and ML
- the DB computes one table, ML works directly on it
- MADlib supports a comprehensive library of ML UDFs

6 Brief Outlook at Current Landscape for ML over DB (2/2)
Unified programming architecture
- one framework for many ML tasks, instead of one UDF per task, with possible code reuse across UDFs
- the DB computes one table, ML works directly on it
- Bismarck supports incremental gradient descent for convex programming; up to 100% overhead over specialised UDFs
Tight integration: In-Database Analytics
- one evaluation plan for both the DB and the ML workload; opportunity to push parts of ML tasks past the DB joins
- Morpheus + Hamlet support GLMs and naïve Bayes
- our approach supports PR/FM, decision trees, ...

7 In-Database Analytics
Move the analytics, not the data:
- avoid expensive data export/import
- exploit database technologies
- exploit the relational structure (schema, query, dependencies)
- build better models, using larger datasets, faster
Cast the analytics code as join-aggregate queries:
- many similar queries that massively share computation
- fixpoint computation needed for model convergence

8 In-database vs. Out-of-database Analytics
[Diagram] Out-of-database: the feature extraction query materialises its output from the DB, and an ML tool imports it to learn θ. In-database: model reformulation turns learning into optimised join-aggregate queries evaluated by the query engine, whose results feed a gradient-descent trainer that outputs θ.

9 Does It Pay Off in Practice?
Retailer dataset: excerpt (17M records) and full (86M records); ridge linear regression and degree-2 polynomial regression with hundreds of continuous + categorical features and up to tens of millions of aggregates.
Systems compared: MADlib (learn), R (PSQL join, export/import, learn), and our approach (join-aggregate computation followed by convergence, on 1 core of a commodity machine).
What survives of the timing table: MADlib needs more than 24 hours to learn on several of these workloads, whereas our approach computes the join-aggregates in seconds on the excerpt (and on the order of 1,000 seconds for the full polynomial workload) and then converges in seconds over a few hundred gradient-descent runs (e.g., 0.02 sec over 343 runs, 8.82 sec over 366 runs for linear regression; 3.27 sec over 321 runs for polynomial regression).
(The remaining cell values of the table are garbled in this transcription.)

10 Talk Outline
Current Landscape for ML over DB
Factorised Learning over Normalised Data
Learning under Functional Dependencies
General Problem Formulation

11 Unified In-Database Analytics for Optimisation Problems
Our target: retail planning and forecasting applications.
- typical databases: weekly sales, promotions, and products
- training dataset: the result of a feature extraction query
- task: train a model to predict the additional demand generated for a product due to promotion
- training algorithm: batch gradient descent
- ML tasks: ridge linear regression, polynomial regression, factorisation machines; logistic regression, SVM; PCA

12 Typical Retail Example
Database I = (R1, R2, R3, R4, R5):
R1(sku, store, day, unitsSold), R2(sku, color), R3(day, quarter), R4(store, city), R5(city, country).
Feature extraction query Q:
Q(sku, store, color, city, country, unitsSold) ← R1(sku, store, day, unitsSold), R2(sku, color), R3(day, quarter), R4(store, city), R5(city, country).
Free variables:
- categorical (qualitative): F = {sku, store, color, city, country}
- continuous (quantitative): unitsSold
Bound variables:
- categorical (qualitative): B = {day, quarter}

13 Typical Retail Example
We learn the ridge linear regression model ⟨θ, x⟩ = Σ_{f ∈ F} ⟨θ_f, x_f⟩.
Training dataset: D = Q(I), with feature vector x and response y = unitsSold.
The parameters θ are obtained by minimising the objective function
J(θ) = (1/(2|D|)) Σ_{(x,y) ∈ D} (⟨θ, x⟩ − y)²  [least-squares loss]  +  (λ/2) ‖θ‖₂²  [ℓ2 regulariser]

14 Side Note: One-hot Encoding of Categorical Variables
Continuous variables are mapped to scalars: y_unitsSold ∈ R.
Categorical variables are mapped to indicator vectors:
- country has categories vietnam and england
- country is then mapped to an indicator vector x_country = [x_vietnam, x_england] ∈ {0, 1}²
- x_country = [0, 1] for a tuple with country = england
This encoding leads to wide training datasets with many zeros.
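
As a concrete illustration, here is a minimal Python sketch of this encoding (the helper name one_hot and the fixed category order are assumptions for illustration, not from the slides):

    # Minimal sketch of one-hot encoding; the category order is assumed fixed.
    def one_hot(value, categories):
        """Map a categorical value to its indicator vector."""
        return [1 if c == value else 0 for c in categories]

    countries = ["vietnam", "england"]
    print(one_hot("england", countries))  # [0, 1], as on the slide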

17 From Optimisation to Sum-Product Queries
We can solve θ* := arg min_θ J(θ) by repeatedly updating θ in the direction of the gradient until convergence:
θ := θ − α · ∇J(θ).
Define the matrix Σ = (σ_ij)_{i,j ∈ [|F|]}, the vector c = (c_i)_{i ∈ [|F|]}, and the scalar s_Y:
σ_ij = (1/|D|) Σ_{(x,y) ∈ D} x_i x_j
c_i = (1/|D|) Σ_{(x,y) ∈ D} y · x_i
s_Y = (1/|D|) Σ_{(x,y) ∈ D} y².
Then
J(θ) = (1/(2|D|)) Σ_{(x,y) ∈ D} (⟨θ, x⟩ − y)² + (λ/2) ‖θ‖₂²
     = (1/2) θ^⊤ Σ θ − ⟨θ, c⟩ + s_Y/2 + (λ/2) ‖θ‖₂².
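
A minimal numpy sketch of this loop (assumed names, not the authors' code): once Σ and c are computed from the database, each descent step costs O(|F|²) and never rescans D, since ∇J(θ) = Σθ − c + λθ.

    import numpy as np

    def train(Sigma, c, lam, alpha, max_runs=10000, tol=1e-8):
        # Batch gradient descent on J(theta) = 1/2 theta' Sigma theta - <theta, c> + s_Y/2
        # + lam/2 ||theta||^2; its gradient is Sigma @ theta - c + lam * theta
        # (the constant s_Y does not affect the gradient).
        theta = np.zeros_like(c)
        for _ in range(max_runs):
            grad = Sigma @ theta - c + lam * theta
            theta -= alpha * grad
            if np.linalg.norm(grad) < tol:   # converged
                break
        return theta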

22 Σ, c, s_Y Can Be Expressed as SQL Queries
SQL queries for σ_ij = (1/|D|) Σ_{(x,y) ∈ D} x_i x_j (without the factor 1/|D|):
x_i, x_j continuous → no group-by variable:
SELECT SUM(x_i * x_j) FROM D;
x_i categorical, x_j continuous → one group-by variable:
SELECT x_i, SUM(x_j) FROM D GROUP BY x_i;
x_i, x_j categorical → two group-by variables:
SELECT x_i, x_j, SUM(1) FROM D GROUP BY x_i, x_j;
where D is the natural join of tables R1 to R5 in our example.
This query encoding avoids the drawbacks of one-hot encoding.
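
The same three query shapes as a plain-Python sketch over the joined table, with D as a list of tuple-dicts (all names here are illustrative assumptions, not the authors' code):

    from collections import defaultdict

    def sigma_cont_cont(D, xi, xj):
        # SELECT SUM(xi * xj) FROM D;
        return sum(t[xi] * t[xj] for t in D)

    def sigma_cat_cont(D, xi, xj):
        # SELECT xi, SUM(xj) FROM D GROUP BY xi;
        agg = defaultdict(float)
        for t in D:
            agg[t[xi]] += t[xj]
        return dict(agg)

    def sigma_cat_cat(D, xi, xj):
        # SELECT xi, xj, SUM(1) FROM D GROUP BY xi, xj;
        agg = defaultdict(int)
        for t in D:
            agg[(t[xi], t[xj])] += 1
        return dict(agg)

The group-by results are exactly the non-zero entries of the corresponding blocks of Σ, which is how this encoding sidesteps the wide, sparse matrices produced by one-hot encoding.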

23 How Can These Join-Aggregate Queries Be Computed Efficiently?

25 Factorised Query Computation by Example
Orders (O for short), schema (customer, day, dish): (Elise, Monday, burger), (Elise, Friday, burger), (Steve, Friday, hotdog), (Joe, Friday, hotdog)
Dish (D for short), schema (dish, item): (burger, patty), (burger, onion), (burger, bun), (hotdog, bun), (hotdog, onion), (hotdog, sausage)
Items (I for short), schema (item, price): (patty, 6), (onion, 2), (bun, 2), (sausage, 4)
Consider the natural join of the above relations, O(customer, day, dish) ⋈ D(dish, item) ⋈ I(item, price):
(Elise, Monday, burger, patty, 6), (Elise, Monday, burger, onion, 2), (Elise, Monday, burger, bun, 2),
(Elise, Friday, burger, patty, 6), (Elise, Friday, burger, onion, 2), (Elise, Friday, burger, bun, 2), ...

26 Factorised Query Computation by Example
The join O(customer, day, dish) ⋈ D(dish, item) ⋈ I(item, price) is the set of tuples listed above. An algebraic encoding uses product (×), union (∪), and values:
Elise × Monday × burger × patty × 6
∪ Elise × Monday × burger × onion × 2
∪ Elise × Monday × burger × bun × 2
∪ Elise × Friday × burger × patty × 6
∪ Elise × Friday × burger × onion × 2
∪ Elise × Friday × burger × bun × 2
∪ ...

27 Factorised Join
Variable order: dish at the top, with children day and item; customer below day; price below item.
Grounding of the variable order over the input database:
burger × (Monday × Elise ∪ Friday × Elise) × (patty × 6 ∪ bun × 2 ∪ onion × 2)
∪ hotdog × (Friday × (Joe ∪ Steve)) × (bun × 2 ∪ onion × 2 ∪ sausage × 4)
There are several algebraically equivalent factorised joins, defined by the distributivity of product over union and their commutativity.
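
A minimal Python sketch of this grounded factorised join as a tree of unions and products (the Union, Product, and Value class names are assumptions for illustration):

    class Union:
        def __init__(self, *children): self.children = children

    class Product:
        def __init__(self, *children): self.children = children

    class Value:
        def __init__(self, var, val): self.var, self.val = var, val

    # The grounding above, under the variable order dish -> {day -> customer, item -> price}:
    factorised = Union(
        Product(Value("dish", "burger"),
                Union(Product(Value("day", "Monday"), Value("customer", "Elise")),
                      Product(Value("day", "Friday"), Value("customer", "Elise"))),
                Union(Product(Value("item", "patty"), Value("price", 6)),
                      Product(Value("item", "bun"), Value("price", 2)),
                      Product(Value("item", "onion"), Value("price", 2)))),
        Product(Value("dish", "hotdog"),
                Product(Value("day", "Friday"),
                        Union(Value("customer", "Joe"), Value("customer", "Steve"))),
                Union(Product(Value("item", "bun"), Value("price", 2)),
                      Product(Value("item", "onion"), Value("price", 2)),
                      Product(Value("item", "sausage"), Value("price", 4)))))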

28 .. Now with Further Compression
Observation: price is under item, which is under dish, but price only depends on item, so the same price appears under an item regardless of the dish.
Idea: cache the price for a specific item and avoid repetition! In the grounding above, the prices of bun and onion are then stored once and referenced from both the burger and the hotdog branch.

29 Same Data, Different Factorisation
Variable order: day, then customer, then dish, then item, then price:
Monday × Elise × burger × (patty × 6 ∪ bun × 2 ∪ onion × 2)
∪ Friday × (Elise × burger × (patty × 6 ∪ bun × 2 ∪ onion × 2)
            ∪ Joe × hotdog × (bun × 2 ∪ onion × 2 ∪ sausage × 4)
            ∪ Steve × hotdog × (bun × 2 ∪ onion × 2 ∪ sausage × 4))

30 .. and Further Compressed
The same factorisation over the order (day, customer, dish, item, price), with the item-price subtrees of burger and hotdog each cached once and shared: Elise's burger orders on Monday and Friday reference one copy of (patty × 6 ∪ bun × 2 ∪ onion × 2), and Joe's and Steve's hotdog orders reference one copy of (bun × 2 ∪ onion × 2 ∪ sausage × 4).

31 Grounding Variable Orders to Factorised Joins
Our join O(customer, day, dish) ⋈ D(dish, item) ⋈ I(item, price) can be grounded to a factorised join as follows:
- at dish: union over the dish values in O(·, ·, dish) ⋈ D(dish, ·); under each dish value, the product of the day and item branches
- at day: union over the day values in O(·, day, dish); at customer: union over the customer values in O(customer, day, dish)
- at item: union over the item values in D(dish, item); at price: union over the price values in I(item, price)
This grounding follows the previous variable order.

32 Grounding Variable Orders to Factorised Joins
- The relations are sorted following a topological order of the variable order.
- The intersection of O and D on dish takes time Õ(min(|π_dish O|, |π_dish D|)).
- The remaining operations are lookups in the relations, where we first fix the dish value and then the day and item values.

33 Factorising the Computation of Aggregates (1/2)
COUNT(*) is computed in one pass over the factorisation above: every value maps to 1, ∪ maps to +, and × maps to *.

34 Factorising the Computation of Aggregates (1/2)
On the example factorisation, the burger branch contributes (1 + 1) · (1 + 1 + 1) = 6 and the hotdog branch contributes (1 · (1 + 1)) · (1 + 1 + 1) = 6, so COUNT(*) = 12, matching the 12 tuples of the flat join.

36 Factorising the Computation of Aggregates (2/2)
SUM(dish * price) is computed in one pass over the factorisation. Assume a function f that maps dish values to reals; then dish values map to f(dish), price values map to themselves, all other values map to 1, ∪ maps to + and × maps to *.
On the example factorisation the result is 20 · f(burger) + 16 · f(hotdog).
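
A minimal Python sketch of this one-pass evaluation, reusing the Union, Product, and Value classes and the factorised value from the sketch after slide 27 (the generic agg helper and the example mapping f are assumptions for illustration):

    # One-pass aggregate over a factorised join: a Union maps to +, a Product
    # maps to *, and each Value leaf v contributes value_map(v).
    def agg(node, value_map):
        if isinstance(node, Value):
            return value_map(node)
        if isinstance(node, Union):
            return sum(agg(c, value_map) for c in node.children)
        out = 1  # Product node
        for c in node.children:
            out *= agg(c, value_map)
        return out

    print(agg(factorised, lambda v: 1))      # COUNT(*) = 12

    f = {"burger": 1.0, "hotdog": 2.0}       # assumed example mapping of dish to reals
    def dish_price(v):
        if v.var == "dish":  return f[v.val]
        if v.var == "price": return v.val
        return 1
    print(agg(factorised, dish_price))       # 20*f(burger) + 16*f(hotdog) = 52.0

Each node is visited once, so the aggregate is computed in time linear in the size of the factorisation rather than in the size of the flat join.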

37 Talk Outline
Current Landscape for ML over DB
Factorised Learning over Normalised Data
Learning under Functional Dependencies
General Problem Formulation

38 Model Reparameterisation using Functional Dependencies
Consider the functional dependency city → country, with
- country categories: vietnam, england
- city categories: saigon, hanoi, oxford, leeds, bristol
The one-hot encoding enforces the following identities:
- x_vietnam = x_saigon + x_hanoi: country is vietnam exactly when city is either saigon or hanoi, i.e., x_vietnam = 1 exactly when either x_saigon = 1 or x_hanoi = 1
- x_england = x_oxford + x_leeds + x_bristol: country is england exactly when city is oxford, leeds, or bristol, i.e., x_england = 1 exactly when either x_oxford = 1, x_leeds = 1, or x_bristol = 1

39 Model Reparameterisation using Functional Dependencies
Identities due to one-hot encoding:
x_vietnam = x_saigon + x_hanoi
x_england = x_oxford + x_leeds + x_bristol
Encode x_country as x_country = R x_city, where R is indexed by countries (rows) and cities (columns):
            saigon  hanoi  oxford  leeds  bristol
vietnam   [    1      1      0       0       0   ]
england   [    0      0      1       1       1   ]
For instance, if city is saigon, i.e., x_city = [1, 0, 0, 0, 0], then country is vietnam:
x_country = R x_city = [1, 0].
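
A quick numpy sketch constructing R from the functional dependency and checking the example (variable names are assumptions for illustration):

    import numpy as np

    cities    = ["saigon", "hanoi", "oxford", "leeds", "bristol"]
    countries = ["vietnam", "england"]
    fd = {"saigon": "vietnam", "hanoi": "vietnam",
          "oxford": "england", "leeds": "england", "bristol": "england"}

    # Rows are countries, columns are cities: R[k][c] = 1 iff fd maps city c to country k.
    R = np.array([[1.0 if fd[c] == k else 0.0 for c in cities] for k in countries])

    x_city = np.array([1.0, 0, 0, 0, 0])     # city = saigon
    print(R @ x_city)                        # [1. 0.] -> country = vietnam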

41 Model Reparameterisation using Functional Dependencies
Functional dependency city → country gives x_country = R x_city. Replace all occurrences of x_country by R x_city:
Σ_{f ∈ F∖{city,country}} ⟨θ_f, x_f⟩ + ⟨θ_country, x_country⟩ + ⟨θ_city, x_city⟩
= Σ_{f ∈ F∖{city,country}} ⟨θ_f, x_f⟩ + ⟨θ_country, R x_city⟩ + ⟨θ_city, x_city⟩
= Σ_{f ∈ F∖{city,country}} ⟨θ_f, x_f⟩ + ⟨R^⊤ θ_country + θ_city, x_city⟩, where γ_city := R^⊤ θ_country + θ_city.
- We avoid the computation of the aggregates over x_country.
- We reparameterise and ignore the parameters θ_country.
- What about the penalty term in the loss function?

42 Model Reparameterisation using Functional Dependencies
Functional dependency city → country: x_country = R x_city and γ_city = R^⊤ θ_country + θ_city.
Rewrite the penalty term:
‖θ‖₂² = Σ_{j ∉ {city, country}} ‖θ_j‖₂² + ‖γ_city − R^⊤ θ_country‖₂² + ‖θ_country‖₂²
Optimise out θ_country by expressing it in terms of γ_city:
θ_country = (I_country + R R^⊤)⁻¹ R γ_city = R (I_city + R^⊤ R)⁻¹ γ_city
The penalty term becomes:
‖θ‖₂² = Σ_{j ∉ {city, country}} ‖θ_j‖₂² + ⟨(I_city + R^⊤ R)⁻¹ γ_city, γ_city⟩
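
The two expressions for θ_country agree because of the matrix identity (I + RR^⊤)⁻¹R = R(I + R^⊤R)⁻¹, which follows from R(I + R^⊤R) = (I + RR^⊤)R. A quick numpy check on the example R (a sketch, with assumed names):

    import numpy as np

    R = np.array([[1.0, 1, 0, 0, 0],     # vietnam <- saigon, hanoi
                  [0.0, 0, 1, 1, 1]])    # england <- oxford, leeds, bristol

    lhs = np.linalg.inv(np.eye(2) + R @ R.T) @ R
    rhs = R @ np.linalg.inv(np.eye(5) + R.T @ R)
    assert np.allclose(lhs, rhs)         # (I + R R')^{-1} R == R (I + R'R)^{-1}

    gamma_city = np.arange(5.0)
    theta_country = lhs @ gamma_city     # theta_country recovered from gamma_city alone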

43 Talk Outline
Current Landscape for ML over DB
Factorised Learning over Normalised Data
Learning under Functional Dependencies
General Problem Formulation

44 General Problem Formulation
We want to solve θ* := arg min_θ J(θ), where
J(θ) := Σ_{(x,y) ∈ D} L(⟨g(θ), h(x)⟩, y) + Ω(θ).
- θ = (θ_1, ..., θ_p) ∈ R^p are the parameters
- g: R^p → R^m and h: R^n → R^m are functions
- g = (g_j)_{j ∈ [m]} is a vector of multivariate polynomials
- h = (h_j)_{j ∈ [m]} is a vector of multivariate monomials
- L is a loss function, Ω is the regulariser
- D is the training dataset with features x and response y
Problems: ridge linear regression, polynomial regression, factorisation machines; logistic regression, SVM; PCA.
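
A minimal Python sketch of this general shape with pluggable L, g, h, and Ω (all names are illustrative assumptions):

    def J(theta, D, L, g, h, Omega):
        # Sum over the training dataset of L(<g(theta), h(x)>, y), plus the regulariser.
        gt = g(theta)
        inner = lambda hx: sum(a * b for a, b in zip(gt, hx))
        return sum(L(inner(h(x)), y) for x, y in D) + Omega(theta)

    # Ridge linear regression (up to the 1/|D| normalisation) as the special case g = h = identity:
    ridge = lambda theta, D, lam: J(
        theta, D,
        L=lambda gamma, y: 0.5 * (gamma - y) ** 2,
        g=lambda t: t,
        h=lambda x: x,
        Omega=lambda t: 0.5 * lam * sum(v * v for v in t))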

45 Special Case: Ridge Linear Regression
Under
- square loss L and ℓ2 regularisation,
- data points x = (x_0, x_1, ..., x_n, y),
- p = n + 1 parameters θ = (θ_0, ..., θ_n), where x_0 = 1 corresponds to the bias parameter θ_0,
- identity functions g and h,
we obtain the following formulation for ridge linear regression:
J(θ) := (1/(2|D|)) Σ_{(x,y) ∈ D} (⟨θ, x⟩ − y)² + (λ/2) ‖θ‖₂²

46 Special Case: Degree-d Polynomial Regression
Under
- square loss L and ℓ2 regularisation,
- data points x = (x_0, x_1, ..., x_n, y),
- p = m = 1 + n + n² + ... + n^d parameters θ = (θ_a), where a = (a_1, ..., a_n) ranges over tuples of non-negative integers such that ‖a‖₁ ≤ d,
- the components of h given by h_a(x) = Π_{i=1}^n x_i^{a_i}, and g(θ) = θ,
we obtain the following formulation for polynomial regression:
J(θ) := (1/(2|D|)) Σ_{(x,y) ∈ D} (⟨g(θ), h(x)⟩ − y)² + (λ/2) ‖θ‖₂²
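
A short Python sketch enumerating the exponent vectors a with ‖a‖₁ ≤ d and the corresponding monomials h_a (helper names are assumptions):

    from itertools import product as cartesian

    def exponent_vectors(n, d):
        # All a = (a_1, ..., a_n) of non-negative integers with ||a||_1 <= d.
        return [a for a in cartesian(range(d + 1), repeat=n) if sum(a) <= d]

    def h_a(x, a):
        # h_a(x) = prod_i x_i^{a_i}
        r = 1.0
        for xi, ai in zip(x, a):
            r *= xi ** ai
        return r

    print(len(exponent_vectors(2, 2)))   # 6 distinct monomials: 1, x1, x2, x1^2, x1*x2, x2^2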

47 Special Case: Factorisation Machines
Under
- square loss L and ℓ2 regularisation,
- data points x = (x_0, x_1, ..., x_n, y),
- p = 1 + n + r·n parameters and m = 1 + n + (n choose 2) features,
we obtain the following formulation for degree-2, rank-r factorisation machines:
J(θ) := (1/(2|D|)) Σ_{(x,y) ∈ D} ( Σ_{i=0}^n θ_i x_i + Σ_{{i,j} ∈ ([n] choose 2)} Σ_{l ∈ [r]} θ_i^(l) θ_j^(l) x_i x_j − y )² + (λ/2) ‖θ‖₂²
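
A Python sketch of the model's inner term (names assumed: theta holds the n + 1 linear parameters and V one rank-r row per feature x_1, ..., x_n):

    from itertools import combinations

    def fm_predict(theta, V, x):
        # x = (x_0, x_1, ..., x_n) with x_0 = 1.
        linear = sum(t * xi for t, xi in zip(theta, x))           # sum_i theta_i x_i
        pairwise = sum(
            sum(V[i][l] * V[j][l] for l in range(len(V[i])))      # <v_i, v_j>
            * x[i + 1] * x[j + 1]
            for i, j in combinations(range(len(V)), 2))           # {i, j} in ([n] choose 2)
        return linear + pairwise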

48 Special Case: Classifiers
- Typically, the regulariser is (λ/2) ‖θ‖₂².
- The response is binary: y ∈ {±1}.
- The loss function L(γ, y), where γ := ⟨g(θ), h(x)⟩, is
  L(γ, y) = max{1 − yγ, 0} for support vector machines,
  L(γ, y) = log(1 + e^(−yγ)) for logistic regression,
  L(γ, y) = e^(−yγ) for Adaboost.
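
The three losses as one-line Python functions (a direct transcription of the slide's formulas):

    import math

    hinge    = lambda gamma, y: max(1.0 - y * gamma, 0.0)          # support vector machines
    logistic = lambda gamma, y: math.log1p(math.exp(-y * gamma))   # logistic regression
    adaboost = lambda gamma, y: math.exp(-y * gamma)               # Adaboost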

49 Zoom-in: In-database vs. Out-of-database Learning
[Diagram] Out-of-database: a feature extraction query over R1, ..., Rk materialises the training dataset D = {(x, y)}, which an ML tool consumes to produce θ. In-database: model reformulation yields the queries for the entries σ_11, ..., σ_ij of Σ and c_1, ... of c; the query optimiser and factorised query evaluation compute Σ and c; a gradient-descent loop then repeatedly forms g(θ) and ∇J(θ), checks convergence, and outputs θ. The cost of factorised evaluation is governed by N^faqw, in contrast with the size |D| of the materialised join.

50 Thank you!
