Linear and Logistic Regression (the SQL way)


What is the purpose of this presentation? To show whether linear and logistic regression are possible in SQL, and to provide an implementation. To provide enough arguments to decide whether an SQL implementation of these regressions is worthwhile or not. To discuss when it is worth performing any numerical calculations on the database side.

Presentation Structure 1. Linear Regression: What is linear regression? Use cases. Solving for coefficients in SQL (with demo). 2. Logistic Regression: What is logistic regression? Use cases. Logistic regression vs. linear regression comparison. Gradient descent. Solving for coefficients in a C++ demo. 3. Discussion: is an SQL implementation of the above worth it or not?

What is Simple Linear Regression? Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables: One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Scatter Plots of Data with Various Correlation Coefficients (figure: a grid of scatter plots, each with axes X and Y, illustrating different correlation strengths)

Why do we need Linear Regression? What dose is required for a particular treatment? How many days will a full recovery take? How many cars will be parked here tomorrow? How many insurance policies will be sold next month?

Simple Linear Regression Given the training set (x₁, y₁), …, (xₙ, yₙ), we try to find the best coefficients β₀, β₁ for the model ŷ = β₀ + β₁x, where ŷ is the predicted response.

Simple linear regression (SLR) There are multiple ways in which we could obtain the coefficients: Ordinary least squares (OLS): conceptually the simplest and computationally straightforward (advantageous for SQL). Generalized least squares (GLS). Iteratively reweighted least squares (IRWLS). Percentage least squares (PLS).

SLR - OLS Minimize the sum of squared errors SSE(β₀, β₁) = Σᵢ (yᵢ − β₀ − β₁xᵢ)². Setting the partial derivatives to zero, ∂SSE/∂β₀ = 0 and ∂SSE/∂β₁ = 0, yields the closed-form solution β₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)² and β₀ = ȳ − β₁x̄.
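
The transcript does not preserve the SQL demo itself; the following is a minimal sketch of how the closed-form solution above maps onto SQL aggregates, assuming a hypothetical table points(x, y) with floating-point columns:

    -- Closed-form OLS for simple linear regression.
    -- beta1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), beta0 = ybar - beta1*xbar
    SELECT
        slope,
        mean_y - slope * mean_x AS intercept
    FROM (
        SELECT
            (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y))
              / (COUNT(*) * SUM(x * x) - SUM(x) * SUM(x)) AS slope,
            AVG(x) AS mean_x,
            AVG(y) AS mean_y
        FROM points
    ) AS s;

Several DBMSs (PostgreSQL, Oracle, DB2) also ship the standard aggregates REGR_SLOPE(y, x) and REGR_INTERCEPT(y, x), which compute these two coefficients directly.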

SLR With only 2 parameters to estimate, computationally it is not a challenge for any DBMS. But increasing the number of parameters will slow it down by a noticeable constant factor if we solve everything in this scalar fashion. However, we could use some matrix tricks to solve for any number of parameters.

Multiple Linear Regression with Least Squares Minimize ‖y − Xβ‖², where X is the n×(p+1) design matrix; the minimizer satisfies the normal equations XᵀX β = Xᵀy.

Solving with QR No need to compute (XᵀX)⁻¹; bring it to a simpler form: X = QR, where Q is orthogonal (QᵀQ = I) and R is upper triangular, so the problem reduces to Rβ = Qᵀy.

At this point it is trivial to solve for β by back-substitution, because R is an upper triangular matrix. QR factorization simplified the process, but it's still tedious (this is how the lm routine is implemented in R, with C and Fortran calls underneath).
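
For reference, the reduction these slides describe can be written out as follows; this is standard least-squares algebra (assuming a full, square orthogonal Q), not reproduced from the original slides:

    \min_{\beta} \lVert y - X\beta \rVert^2
      = \min_{\beta} \lVert Q^{\top}y - R\beta \rVert^2
    \quad\Longrightarrow\quad R\beta = Q^{\top}y

Because R is upper triangular, the last row gives the last coefficient directly, and the remaining coefficients follow by back-substitution upward.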

Problems with Multiple linear regression? Operations for linear algebra must be implemented: matrix/vector multiplication; matrix inverse/pseudo-inverse; matrix factorizations like SVD, QR, Cholesky, Gauss-Jordan. That is too much number crunching for an engine which has a different purpose; it's far away from FORTRAN! Even Boost's C++ library for basic linear algebra (uBLAS) does a poor job in comparison to MATLAB.

What is Logistic Regression? It predicts an outcome variable that is categorical from predictor variables that are continuous and/or categorical. It is used because having a categorical outcome variable violates the assumption of linearity in normal regression. The only real limitation of logistic regression is that the outcome variable must be discrete. Logistic regression deals with this problem by using a logarithmic transformation of the outcome variable, which allows us to model a nonlinear association in a linear way. It expresses the linear regression equation in logarithmic terms (called the logit).
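
To make the logit remark concrete, this is the standard textbook identity (not reproduced from the original slides): with p the probability of the positive class,

    \operatorname{logit}(p) = \ln\frac{p}{1-p} = \theta^{\top}x
    \qquad\Longleftrightarrow\qquad
    p = \frac{1}{1 + e^{-\theta^{\top}x}}

so the model is linear in the log-odds while p itself stays bounded in (0, 1).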

Logistic Regression Use Cases Google uses it to classify email as spam or not spam. Is a loan good for you or bad? Will my political candidate win the election? Will this user buy a monthly Netflix subscription? Will the prescribed drugs have the desired effect? Should this bank transaction be classified as fraud?

Why not Linear Regression instead of Logistic Regression? Maybe we could solve the problem mathematically using only Linear Regression for classification, and that would spare us a lot of complexity. We would like to classify our input into 2 categories: either a 0 or a 1 (e.g., 0 = No, 1 = Yes).

Why not Linear Regression instead of Logistic Regression? Assume we predict "malignant" if the fitted value is at least 0.5. (Figure: real data as a training set; "Malignant?" on the vertical axis from 0 = No to 1 = Yes, "Tumor Size" on the horizontal axis, with a fitted line crossing the 0.5 threshold.) It doesn't look that unreasonable until...

Why not Linear Regression instead of Logistic Regression? (Figure: the same plot with an additional far-right data point; the refitted line tilts and the 0.5 threshold shifts.) New data comes to our model and destroys it.

Why not Linear Regression instead of Logistic Regression? (Figure: the same plot; with the new point, it is unclear where along the Tumor Size axis the separation line should be.)

Why not Linear Regression instead of Logistic Regression? Currently hθ(x) = θᵀx, which can output values below 0 or above 1. For classification we need 0 ≤ hθ(x) ≤ 1. Logistic Regression: let's create a new function which will satisfy our conditions; wrap θᵀx in the sigmoid, hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)).

Logistic Regression

Gradient Descent A technique to find the minimum of a function. In other words: for which input parameters of the function will I get the minimum output? Not the best algorithm, but computationally and conceptually simple.

Gradient Descent Intuition This technique works well for convex functions. (Figure: a convex curve with a single landing point at its minimum.)

Gradient Descent Intuition This technique is not the best for non-convex functions. (Figure: a non-convex curve with several potential landing points.) It can potentially find the global minimum, but it may also land in a local one.

Gradient Descent Intuition

Gradient Descent Given some function J(θ₀, θ₁), we want min J(θ₀, θ₁). High-level steps: start with some (θ₀, θ₁); keep changing (θ₀, θ₁) to reduce J(θ₀, θ₁) until hopefully we come to the minimum, that is: we converge.

Gradient Descent Algorithm Repeat until convergence { θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ, simultaneously for every j }, where α is the learning rate.
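
The talk's actual demo is in C++ and is not preserved in the transcript; purely to show what the loop can look like in SQL itself, here is a minimal sketch using a recursive CTE to minimize the toy function f(θ) = θ² (gradient 2θ), with a hypothetical learning rate α = 0.1 and a fixed iteration budget:

    -- Gradient descent on f(theta) = theta^2, starting at theta = 5.0.
    WITH RECURSIVE gd (step, theta) AS (
        SELECT 0, 5.0
        UNION ALL
        SELECT step + 1,
               theta - 0.1 * (2 * theta)   -- theta := theta - alpha * f'(theta)
        FROM gd
        WHERE step < 50                    -- stop after 50 iterations
    )
    SELECT step, theta FROM gd ORDER BY step DESC LIMIT 1;

This runs as-is on PostgreSQL, SQLite, or MySQL 8; a real convergence test would compare successive values of θ instead of using a fixed step count.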

Gradient Descent Intuition: Positive Tangent Case repeat until convergence { θ := θ − α · dJ/dθ }. With a positive tangent (dJ/dθ > 0), θ decreases and moves toward the minimum; the learning rate α > 0 scales the step accordingly.

Gradient Descent Intuition: Negative Tangent Case repeat until convergence { θ := θ − α · dJ/dθ }. With a negative tangent (dJ/dθ < 0), θ increases and again moves toward the minimum; the learning rate α > 0 scales the step accordingly.

Some Ugly Cases for Gradient Descent It could converge slowly, taking micro steps (almost flat surfaces). It may not converge at all (a large α may overshoot). (Figure: descent paths from a starting point illustrating both cases.)

Gradient Descent notation Can be generalized to any dimension: θ := θ − α∇J(θ), where ∇J(θ) = (∂J/∂θ₀, …, ∂J/∂θₙ).

Rosenbrock function A special non-convex function used as a performance test problem for optimization algorithms. Also known as Rosenbrock's valley or Rosenbrock's banana function: f(x, y) = (a − x)² + b(y − x²)², with the global minimum at (x, y) = (a, a²); for the common choice a = 1, b = 100, the minimum is at (1, 1).

Logistic Regression with Gradient Descent Repeat until convergence { θⱼ := θⱼ − α Σᵢ (hθ(xᵢ) − yᵢ) xᵢⱼ, simultaneously for every j }.
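
The transcript again omits the demo; as a flavor of the same update expressed in SQL, here is a minimal sketch of a single gradient step for a one-feature model, assuming hypothetical tables data(x, y) with y in {0, 1} and params(b0, b1) holding the current coefficients, with α = 0.01:

    -- One gradient-descent step for one-feature logistic regression.
    -- h(x) = 1 / (1 + EXP(-(b0 + b1*x))) is the sigmoid hypothesis.
    SELECT
        p.b0 - 0.01 * SUM(1 / (1 + EXP(-(p.b0 + p.b1 * d.x))) - d.y)         AS new_b0,
        p.b1 - 0.01 * SUM((1 / (1 + EXP(-(p.b0 + p.b1 * d.x))) - d.y) * d.x) AS new_b1
    FROM data AS d
    CROSS JOIN params AS p
    GROUP BY p.b0, p.b1;

The caller would write new_b0 and new_b1 back into params and repeat until the coefficients stop changing, which is exactly the kind of iteration loop SQL makes awkward.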

Is pure SQL Regression worth it? MAYBE: simple linear regression. DEFINITELY NOT: multiple linear regression, multiple logistic regression, numerical algorithms.

Stack Overflow friendliness

References
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Logistic_regression
https://en.wikipedia.org/wiki/Gradient_descent
https://stackoverflow.com/questions/6449072/doing-calculations-in-mysql-vs-php